Monday, August 26, 2019

redCV and FFmpeg: Using pipes

As indicated in the FFmpeg documentation, FFmpeg reads from an arbitrary number of input files (which can be regular files, pipes, network streams, grabbing devices, etc.), specified by the -i option, and writes to an arbitrary number of output files, which are specified by a plain output URL.
A very interesting property of FFmpeg is that we can use pipes inside the command. A pipe is a mechanism for interprocess communication; data written to the pipe by one process can be read by another process. The data is handled in first-in, first-out (FIFO) order. The pipe has no name; it is created for one use, and both ends must be inherited from the single process that created the pipe.
You can find some very interesting examples on the Internet that use pipes to access audio and video data with FFmpeg from other programs.

Pipes with Red language

Currently, Red does not support the pipe mechanism, but the problem can be solved with the Red/System DSL, which provides low-level system programming capabilities. The pipe mechanism is defined in the standard libc, and Red/System knows how to communicate with libc. We just have to add a few functions (/lib/ffmpeg.reds):
In fact, only p-open and p-close are new. The other functions are defined by Red in red/system/runtime/libc.reds, but the idea is to leave this file unchanged. This is why p-read, p-write and p-flush are implemented in ffmpeg.reds. This also makes the code clearer.
The p-open function is closely related to the C system function: it executes the shell command as a subprocess. However, instead of waiting for the command to complete, it creates a pipe to the subprocess and returns a stream that corresponds to that pipe. If you specify the "r" mode argument, you can read data from the stream. If you specify the "w" mode argument, you can write data to the stream.
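The same mechanism exists in most environments; as a rough sketch of what p-open, p-write and p-close do, here is a hypothetical Python equivalent (the cat command simply stands in for the ffmpeg command used in the article):

```python
import subprocess

# p-open wraps libc popen(): start a subprocess and attach a pipe to it.
# "cat" echoes its input back, so we can see the FIFO behavior directly.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# p-write equivalent: push raw bytes down the pipe, FIFO order;
# communicate() also closes the pipe and waits, like p-close.
out, _ = proc.communicate(b"raw samples")
```

The key point, as in the Red version, is that the caller does not wait for the command before feeding it data: the subprocess consumes the stream as it arrives.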

Writing audio file with Red and FFmpeg

The idea is to launch FFmpeg via a pipe; FFmpeg then converts pure raw samples to the required format before writing them to the output file (see /pipe/sound.red).
This code is simple. First of all, we have to load the Red/System code to use new functions.
#system [ #include %../lib/ffmpeg.reds ]
Then, the generateSound function generates 1 second of sine wave audio data. Generated values are simply stored in a Red vector! array of 16-bit integer values. The whole job is then done by the makePipe routine, which takes 2 parameters: command, a string with all required FFmpeg commands, and buf, the array containing the generated sound values.
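The computation done by generateSound can be sketched as follows; this is a hypothetical Python transcription, and the tone frequency and amplitude are assumptions (the original Red code may use different values):

```python
import math

SAMPLE_RATE = 44100   # matches the -ar 44100 option used later
FREQ = 440.0          # assumed tone frequency (A4)
AMPLITUDE = 16000     # fits comfortably in a signed 16-bit integer

# one second of sine wave stored as 16-bit signed integers,
# the same layout the Red vector! holds before being piped to FFmpeg
samples = [int(AMPLITUDE * math.sin(2.0 * math.pi * FREQ * i / SAMPLE_RATE))
           for i in range(SAMPLE_RATE)]
```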

As usual with Red/System routines, the command string is converted to the c-string! type in order to facilitate interaction with the C library. ptr is a byte pointer which gives the starting address of the array of values, and n is the size of the buffer. Then we call the p-open function. Here, we have to write sound values, and thus we use the "w" mode:
pipeout: p-open cmd "w"
Then we just have to write the array into the stream, passing as arguments the pointer to the array of values, the size of each entry in the array (2 bytes for a 16-bit signed integer), the number of entries, and the stream:
p-write ptr 2 n pipeout
Once the job is done, we close the subprocess:
p-close pipeout
The main program is trivial, and only FFmpeg options passed to the p-open function need some explanation.
-y is used to overwrite the output file if it already exists.
-f s16le tells FFmpeg that the format of the audio data is raw, signed integer, 16-bit and little-endian. You can use s16be for big-endian, according to your platform's byte order.
-ar 44100 means that the sampling frequency of the audio data is 44.1 kHz.
-ac 1 is the number of channels in the signal. 
-i - tells FFmpeg to read its input from the standard input, that is, from the pipe; beep.wav is the output filename FFmpeg will use.
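Putting those options together, the command string handed to p-open would look something like this sketch (the exact string in sound.red may differ):

```python
# assemble the FFmpeg command used when opening the pipe in "w" mode
options = [
    "ffmpeg",
    "-y",              # overwrite the output file if it already exists
    "-f", "s16le",     # raw signed 16-bit little-endian samples
    "-ar", "44100",    # 44.1 kHz sampling frequency
    "-ac", "1",        # one channel (mono)
    "-i", "-",         # read the samples from stdin, i.e. from the pipe
    "beep.wav",        # output file
]
command = " ".join(options)
```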
Finally, the Red code calls ffplay to play the sound and display the result. Of course, since we use Red/System, the code must be compiled.

Modifying video file with Red and FFmpeg

The same technique can be used for video, as illustrated in /pipe/video1.red. In this sample, we just want to invert image colors using pipes.

The only difference with the previous example is that we use 2 subprocesses: one for reading the source data, and the other for writing the modified data.
For reading data:


For writing data:

Then, the main program is really simple. Once the video is processed, we can also process the sound channel to add audio to the output file. Lastly, we display the result.

Here is the result, first the source:
and then the transform:

Some tips

It is very important to know the size of the original movie before making transformations. This is why you'll find here (/videos/mediainfo.red) a tool which can help you retrieve this information. I am also very fond of the Red vector! datatype for this kind of programming, since we can choose exactly the size of the data we need for the pipe process. Thanks to the Red Team :)

From movie to Red image

Here (/pipe/video2.red), the idea is to get the data from FFmpeg to make a Red image! that can be displayed by a Red face. If the video has a size of 720x480 pixels, then the first 720x480x3 bytes output by FFmpeg give the RGB values of the pixels of the first frame, line by line, top to bottom. The next 720x480x3 bytes after that represent the second frame, and so on.
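The byte arithmetic is easy to check; here is a small sketch, with the sizes taken from the example above:

```python
WIDTH, HEIGHT, CHANNELS = 720, 480, 3   # rgb24 output: 3 bytes per pixel

FRAME_SIZE = WIDTH * HEIGHT * CHANNELS  # bytes per frame on the pipe

def frame_offset(n):
    """Byte offset of frame n (0-based) in the raw RGB stream."""
    return n * FRAME_SIZE
```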
Before using a routine, we need a command-line for FFmpeg:

The image2pipe format and the - at the end signal to FFmpeg that it is being used through a pipe by another program. Then, the getImages routine transforms the FFmpeg data into a Red image!

pixD: image/acquire-buffer rimage :handle creates a pointer to get the data provided by FFmpeg. Then we read all FFmpeg data as RGB integer values and update the image:
pixD/value: (255 << 24) OR (r << 16) OR (g << 8) OR b
When the whole image is processed, we release the memory for the next frame with image/release-buffer rimage handle yes, before calling 2 simple Red functions to control the delay between images and to display the result. If the movie contains an audio channel, the movie player plays the audio if required.
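The pixel packing follows the standard 32-bit ARGB layout used by the Red image! datatype, with the alpha channel in the high byte; a small Python sketch of the same operation:

```python
def pack_argb(r, g, b, a=255):
    """Pack 8-bit channels into one 32-bit ARGB pixel (alpha in the high byte)."""
    return (a << 24) | (r << 16) | (g << 8) | b
```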

With this technique, images are not stored on disk, but just processed on-the-fly in memory, giving a very fast access to video movies with Red.

Attention: this code sometimes crashes and must be improved! In that case, kill all ffplay processes and launch the program again. The origin of the problem is probably related to the use of #call.

All sources can be found here: https://github.com/ldci/ffmpeg/pipe

redCV and FFmpeg: Video reading


As previously explained, Red and FFmpeg collaborate rather well for video and audio recording. In this new post, we'll focus on video reading.

A simple approach: Use ffplay tool

A simple but efficient way is to call the ffplay tool, which is included in the FFmpeg framework. Just open a terminal session and give the name of the movie to read.
ffplay Wildlife.wmv


While the movie is running, different keys can be used for controlling the output:
f: for fullscreen
m: toggle audio mute
p, space: pause/resume the movie
/, *: decrease and increase volume respectively
During the reading of the movie, ffplay returns a lot of information about the video:


Of course, this command can be integrated in Red code as a parameter of the call function, and a GUI program can be developed to avoid command-line use (see the /video/ffplay.red code). It is important to use the /shell refinement for call, since ffplay uses the terminal to display various information:

 call/shell "ffplay Wildlife.wmv"


This simple approach is very comfortable for watching all supported video files, and it is the way I prefer to play movies without using VLC or QuickTime Player.

A second approach: Extract frames from a movie

In many applications, I don't need to process the audio channel, but I have to focus on images in order to apply redCV image processing to time-lapse videos.
To turn a video into a sequence of images, run the command below. The command generates files named image0001.png, image0002.png and so on.
ffmpeg -i filename image%04d.png
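The image%04d.png pattern used above is FFmpeg's frame-number placeholder: %04d expands to a zero-padded counter, exactly like Python's zero-padded format specifier:

```python
# FFmpeg replaces %04d with a zero-padded frame counter, 1, 2, 3, ...
names = [f"image{n:04d}.png" for n in range(1, 4)]
```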
This command is included in the video/movies.red program and associated with Red code for creating an efficient video reader with a clean GUI interface.

The code is very simple, but contains some interesting functions.
First of all, to create elegant navigation buttons, we take advantage of the fact that Red supports Unicode strings :)


The second important function concerns the generation of the command line for FFmpeg.


The FFmpeg options are very simple:
"/usr/local/bin/ffmpeg" ;location of the ffmpeg binary
" -y" ;automatically replace image files
" -f image2" ;the image file muxer writes video frames to image files
" -s 720x480" ;output size
" -q:v 1" ;use a fixed quality scale (1 to 31, 1 is highest)
" -r " frames per sec ;fps (mandatory for .wmv files)
" " dataDir ;temp destination directory
"img-%05d.jpg" ;automatic file name numbering
The rest of the code is pure Red and very easy to understand. First of all, you have to select a movie file. When done, FFmpeg is called to create, in a temporary directory, all the jpg images corresponding to the number of frames contained in the movie. You can also play with the FPS to create more or fewer images. The navigation buttons, slider and text-list faces can be used to directly access any image. When you press the Play/Stop button for a complete reading, the text-list face is disabled.

redCV and FFmpeg: Video and audio capture

FFmpeg is a fabulous command-line framework for multimedia processing. Among a variety of features, FFmpeg can capture video and audio from your computer's camera and microphone. Since the Red language supports video capture, it was really easy to connect both programs and build a nice Red video recorder. Red is used to display camera images on screen, and FFmpeg to record the movie.

With this Red code, you can select video and audio inputs, and change the quality and size of the recorded video. You can also control the frequency of recorded frames (FPS).

Supported video files for recording are mpg, mp4, mkv, avi, wmv, and mov.

Before we start, you must have Red language (http://www.red-lang.org) and FFmpeg (https://www.ffmpeg.org/) installed on your computer and you must know the path of the FFmpeg binary such as /usr/local/bin/ffmpeg for Mac or Linux.

Like Red, FFmpeg is cross-platform and can be used with various operating systems. The Red getPlateform function is called to detect the running OS and then to use the correct FFmpeg input device.


Then, the second operation is to generate the command-line that will be passed as parameter to FFmpeg binary. This is done by Red function generateCommands.


In the code above, a few of FFmpeg options and Red words are used:

-f inputDevice: to use the OS device for grabbing video (on macOS, we use the avfoundation device).
-framerate frameRate: the FPS (1..30) for recording.
-video_size videoSize: required video size (a pair WxH, on my MacBook Pro: "1280x720" or "640x480").
-i vDevice:aDevice: the video and audio devices used for recording. By default, vDevice = 0 corresponds to the first camera found on the computer (e.g. the Apple FaceTime camera on macOS), and aDevice = 0 to the computer's microphone.
-target target: this is a combination of 2 values determining the quality of the video (e.g. film-dvd).
Lastly, the fileName is provided to store the video.
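As a sketch of what the generated command might look like on macOS (the device indexes, frame rate, size, target and output name here are assumptions based on the defaults described above, not the literal output of generateCommands):

```python
options = [
    "ffmpeg",
    "-f", "avfoundation",       # macOS grabbing device
    "-framerate", "30",         # FPS (1..30)
    "-video_size", "1280x720",  # required video size
    "-i", "0:0",                # first camera : first microphone
    "-target", "film-dvd",      # quality preset for the output
    "movie.mpg",                # output file
]
command = " ".join(options)
```

Note that the input options (-f, -framerate, -video_size) must appear before -i, while -target applies to the output file.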

Once the FFmpeg command line is generated, we just need the Red call function to start or stop the movie grabbing.



In less than 150 lines of code, we get a very efficient movie grabber which records both video and audio channels.  

You'll find the code here: https://github.com/ldci/ffmpeg/video/camera.red. Enjoy:)

Monday, March 25, 2019

redCV and image denoising

RedCV can be used for image denoising, and a lot of functions are included to help with image restoration. Basically, a 3x3 kernel is used to compute a value from the neighbors of a pixel and to replace the pixel value in the image by the result. Of course, the kernel size can be changed. According to the noise included in the image, you can use different parametric filters.
When the noise is simple, such as salt-and-pepper noise, a simple median filter will be efficient: the central pixel value is replaced by the median value of the neighbors by the rcvMedianFilter function.

But when the image is really noisy, the median filter is not sufficiently efficient.
In this case the *rcvMinFilter* function can be used: the central pixel value is replaced by the minimum value of the neighbors, and the result is pretty good.
You can also use *rcvMaxFilter* (maximum value of the neighbors) or *rcvMidPointFilter* (the central pixel value is replaced by the minimum + maximum values of the neighbors divided by 2), according to the noise contained in the image.
RedCV also includes a *rcvMeanFilter* function, which can be used for image smoothing.
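As an illustration of the neighborhood logic (not the actual redCV implementation), here is a minimal Python sketch of a 3x3 median filter on a grayscale image stored as a list of rows; borders are handled by clamping:

```python
def neighborhood(img, x, y):
    """3x3 neighborhood of pixel (x, y), clamped at the image border."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def median_filter(img):
    """Replace each pixel by the median (5th of 9 sorted values) of its 3x3 neighborhood."""
    return [[sorted(neighborhood(img, x, y))[4] for x in range(len(img[0]))]
            for y in range(len(img))]

# a lone "salt" pixel in a dark image is removed by the median
noisy = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
clean = median_filter(noisy)
```

Swapping `sorted(...)[4]` for `min(...)`, `max(...)` or `(min(...) + max(...)) // 2` gives the min, max and midpoint variants described above.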

Code sample

As usual with Red and redCV, code is very clear and simple.
Red [
    Title: "Smoothing filters for image "
    Author: "Francois Jouen"
    File:    %smoothing.red
    Needs:   'View
]
#include %../../libs/redcv.red ; for redCV functions
kSize: 3x3
src: make image! 512x512
dst: make image! 512x512
isFile: false

loadImage: does [
    isFile: false
    canvas2/image: none
    tmp: request-file
    if not none? tmp [
        src: load tmp   
        dst: make image! src/size
        canvas1/image: src
        isFile: true
    ]
]
view win: layout [
    title "Red Smoothing Filters"
    origin 10x10 space 10x10
    button "Load"               [loadImage]
    text "Filter Size"
    field "3x3"                 [if error? try [kSize: to-pair face/text] [kSize: 3x3]]
    button "Median"             [if isFile [rcvMedianFilter src dst kSize canvas2/image: dst]]
    button "Min"                [if isFile [rcvMinFilter src dst kSize canvas2/image: dst]]
    button "Max"                [if isFile [rcvMaxFilter src dst kSize canvas2/image: dst]]
    button "MidPoint"           [if isFile [rcvMidPointFilter src dst kSize canvas2/image: dst]]
    button "Arithmetic Mean"    [if isFile [rcvMeanFilter src dst kSize 0 canvas2/image: dst]]
    button "Harmonic Mean"      [if isFile [rcvMeanFilter src dst kSize 1 canvas2/image: dst]]
    button "Geometric Mean"     [if isFile [rcvMeanFilter src dst kSize 2 canvas2/image: dst]]
    button "Quit"               [quit]
    return
    canvas1: base 512x512 white
    canvas2: base 512x512 white
]



Thursday, March 21, 2019

Deep Learning Text Recognition (OCR) using Tesseract and Red

I frequently use Tesseract OCR when I have to recognize text in images.
Tesseract was initially developed by Hewlett-Packard Labs. In 2005, it was open-sourced by HP, and since 2006 it has been actively developed by Google and the open source community.

In version 4, Tesseract implements a long short-term memory (LSTM) recognition engine, a kind of recurrent neural network (RNN) that is very efficient for OCR.

The Tesseract library includes a command-line tool, tesseract, which can be used to perform OCR on images and output the result in a text file.

Install Tesseract

First of all, you need to install the Tesseract library according to your OS, e.g. sudo apt install tesseract-ocr for Linux or brew install tesseract for macOS.

If you want multi-language support (about 170 languages), you have to install the *tesseract-lang* package.


Using Tesseract with Red Language

This operation is really trivial, since Red includes a call function which makes it possible to use the tesseract command-line tool. You have to use the call/wait refinement in order to wait for tesseract execution. You can use different languages according to your documents.
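The command string assembled for call/wait follows the tesseract CLI shape, tesseract input outputbase -l lang --oem mode (tesseract then writes outputbase.txt). A hypothetical Python sketch of the same assembly (the file names are illustrative):

```python
def tesseract_command(image_path, out_base, lang="eng", oem=3):
    """Build a tesseract command line; the OCR result lands in out_base + '.txt'."""
    return f"tesseract {image_path} {out_base} -l {lang} --oem {oem}"

cmd = tesseract_command("scan.png", "tempo")
```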

Code Sample

Red [
Title:   "OCR"
Author:  "Francois Jouen"
File:  %tesseract.red
Needs:  View
icon:  %red.ico
]
; Languages
tessdata: [
"afr (Afrikaans)"
"amh (Amharic)"
"ara (Arabic)"
"asm (Assamese)"
"aze (Azerbaijani)"
"aze_cyrl (Azerbaijani - Cyrillic)"
"bel (Belarusian)"
"ben (Bengali)"
"bod (Tibetan)"
"bos (Bosnian)"
"bre (Breton)"
"bul (Bulgarian)"
"cat (Catalan; Valencian)"
"ceb (Cebuano)"
"ces (Czech)"
"chi_sim (Chinese - Simplified)"
"chi_tra (Chinese - Traditional)"
"chr (Cherokee)"
"cym (Welsh)"
"dan (Danish)"
"deu (German)"
"dzo (Dzongkha)"
"ell (Greek Modern (1453-))"
"eng (English)"
"enm (English Middle (1100-1500))"
"epo (Esperanto)"
"equ (Math / equation detection module)"
"est (Estonian)"
"eus (Basque)"
"fas (Persian)"
"fin (Finnish)"
"fra (French)"
"frk (Frankish)"
"frm (French Middle (ca.1400-1600))"
"gle (Irish)"
"glg (Galician)"
"grc (Greek Ancient (to 1453))"
"guj (Gujarati)"
"hat (Haitian; Haitian Creole)"
"heb (Hebrew)"
"hin (Hindi)"
"hrv (Croatian)"
"hun (Hungarian)"
"iku (Inuktitut)"
"ind (Indonesian)"
"isl (Icelandic)"
"ita (Italian)"
"ita_old (Italian - Old)"
"jav (Javanese)"
"jpn (Japanese)"
"kan (Kannada)"
"kat (Georgian)"
"kat_old (Georgian - Old)"
"kaz (Kazakh)"
"khm (Central Khmer)"
"kir (Kirghiz; Kyrgyz)"
"kor (Korean)"
"kor_vert (Korean (vertical))"
"kur (Kurdish)"
"kur_ara (Kurdish (Arabic))"
"lao (Lao)"
"lat (Latin)"
"lav (Latvian)"
"lit (Lithuanian)"
"ltz (Luxembourgish)"
"mal (Malayalam)"
"mar (Marathi)"
"mkd (Macedonian)"
"mlt (Maltese)"
"mon (Mongolian)"
"mri (Maori)"
"msa (Malay)"
"mya (Burmese)"
"nep (Nepali)"
"nld (Dutch; Flemish)"
"nor (Norwegian)"
"oci (Occitan (post 1500))"
"ori (Oriya)"
"osd (Orientation and script detection module)"
"pan (Panjabi; Punjabi)"
"pol (Polish)"
"por (Portuguese)"
"pus (Pushto; Pashto)"
"que (Quechua)"
"ron (Romanian; Moldavian; Moldovan)"
"rus (Russian)"
"san (Sanskrit)"
"sin (Sinhala; Sinhalese)"
"slk (Slovak)"
"slv (Slovenian)"
"snd (Sindhi)"
"spa (Spanish; Castilian)"
"spa_old (Spanish; Castilian - Old)"
"sqi (Albanian)"
"srp (Serbian)"
"srp_latn (Serbian - Latin)"
"sun (Sundanese)"
"swa (Swahili)"
"swe (Swedish)"
"syr (Syriac)"
"tam (Tamil)"
"tat (Tatar)"
"tel (Telugu)"
"tgk (Tajik)"
"tgl (Tagalog)"
"tha (Thai)"
"tir (Tigrinya)"
"ton (Tonga)"
"tur (Turkish)"
"uig (Uighur; Uyghur)"
"ukr (Ukrainian)"
"urd (Urdu)"
"uzb (Uzbek)"
"uzb_cyrl (Uzbek - Cyrillic)"
"vie (Vietnamese)"
"yid (Yiddish)"
"yor (Yoruba)"
]
;OCR Engine Mode
ocr: [
"Original Tesseract only"
"Neural nets LSTM only"
"Tesseract + LSTM"
"Default, based on what is available"
]



appDir: what-dir    ; please adapt this path if needed
tFile: to-file rejoin[appDir "tempo"]
tFileExt: to-file rejoin[appDir "tempo.txt"]
change-dir to-file appDir

dSize: 512
gsize: as-pair dSize dSize
img: make image! reduce [gSize black]
lang: "eng"
ocrMode: 3
tmpf: none
tBuffer: copy []

loadImage: does [
tmpf: request-file
isFile: false
if not none? tmpf [
clear result/text
img: load tmpf
canvas/image: img
isFile: true
]
]

processFile: does [
if isFile [
if exists? tFileExt [delete tFileExt]
clear result/text 
prog: copy "/usr/local/bin/tesseract " 
append prog form tmpf 
append append prog " " form tFile
case [
ocrMode = 0 [append append prog " -l " lang]
ocrMode = 1 [append append prog " -l " lang append append prog " --oem " ocrMode]
ocrMode = 2 [append append prog " -l " lang]
ocrMode = 3 [append append prog " -l " lang append append prog " --oem " ocrMode]
]
call/wait prog
either cb/data [
clear tbuffer
clear result/data
tt: read tFileExt
tbuffer: split tt "^/"
nl: length? tbuffer 
i: 1
while [i <= nl][
ligne: tbuffer/:i
ll: length? ligne
if  ll > 1 [append result/data rejoin [ligne lf]]
i: i + 1
]
result/text: copy form result/data]
[result/text: read tFileExt]
]
]

; ***************** Test Program Interface ************************
view win: layout [
title "Tesseract OCR with Red"
button  "Load Image" [loadImage]
text 60 "Language"
dp1: drop-down 180 data tessdata
select 24
on-change [ 
s: dp1/data/(face/selected)
lang: first split s " "
]
text 80 "OCR mode" 
dp2: drop-down 230 data ocr
select 4
on-change [ocrMode: face/selected - 1]
cb: check "Lines" false
button "Process" [processFile]
button "Clear" [clear result/text]
button "Quit" [if exists? tFileExt [delete tFileExt] Quit]
return
canvas: base gsize img
result: area  gsize font [name: "Arial" size: 16 color: black] 
data []
return
f: field  512
text "Font"
drop-list 120
data  ["Arial" "Consolas" "Comic Sans MS" "Times" "Hannotate TC"]
react [result/font/name: pick face/data any [face/selected 1]]
select 1
fs: field 50 "16" 
react [result/font/size: fs/data]
button 30 "+"  [fs/data: fs/data + 1]
button 30 "-"  [fs/data: max 1 fs/data - 1]
drop-list 100
data  ["black" "blue" "green" "yellow" "red"]
react [result/font/color: reduce to-word pick face/data any [face/selected 1]]
select 1
do [f/text: copy form appDir]
]


Result

This is an example for a Simplified Chinese document.