Monday, March 25, 2019

redCV and image denoising

redCV can be used for image denoising, and it includes many functions that help with image restoration. The basic idea is that a kernel, typically 3x3, is moved over the image: for each pixel, a value is computed from the pixels in its neighborhood, and that result replaces the pixel value in the image. Of course, the kernel size can be changed. Depending on the kind of noise in the image, you can use different parametric filters.
When the noise is simple, such as salt-and-pepper noise, a simple median filter is effective: the *rcvMedianFilter* function replaces the central pixel value with the median value of its neighbors.
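
To make the median rule concrete, here is a minimal sketch in plain Red. It is not redCV's actual implementation: it only takes the nine gray levels of one hypothetical 3x3 neighborhood, in which the central pixel (255) is a "salt" outlier, and returns their median. The name median-of and the sample values are assumptions chosen for illustration.

```red
Red [Title: "Median of a 3x3 neighborhood (sketch)"]

; Hypothetical helper: median of a block holding an odd number of values
; (the middle element once the values are sorted).
median-of: func [values [block!] /local sorted mid][
    sorted: sort copy values              ; sort a copy, keep the input intact
    mid: ((length? sorted) + 1) / 2       ; 9 values -> position 5
    pick sorted mid
]

; Nine gray levels, read row by row; the central pixel is a noise spike.
neighborhood: [12 14 13 15 255 14 13 12 16]
print median-of neighborhood              ; prints 14
```

The outlier ends up at the far end of the sorted values, so it never reaches the middle position. This is why the median copes well with isolated salt-and-pepper pixels.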

But when the image is really noisy, the median filter is not effective enough.
In this case the *rcvMinFilter* function can be used: the central pixel value is replaced by the minimum value of its neighbors, and the result is pretty good.
You can also use *rcvMaxFilter* (maximum value of the neighbors) or *rcvMidPointFilter* (the central pixel value is replaced by the sum of the minimum and maximum values of the neighbors, divided by 2), depending on the noise contained in the image.
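
The three rules above can be sketched on a single neighborhood in plain Red. This is only an illustration of the arithmetic, under the assumption of integer gray levels, and not the code redCV runs internally. The helper name order-stats and the sample values are hypothetical.

```red
Red [Title: "Min, max and midpoint of a neighborhood (sketch)"]

; Hypothetical helper: returns [minimum maximum midpoint]
; for a non-empty block of integers.
order-stats: func [values [block!] /local lo hi][
    lo: first values
    hi: first values
    foreach v values [
        lo: min lo v
        hi: max hi v
    ]
    reduce [lo hi (lo + hi) / 2]          ; integer division truncates in Red
]

neighborhood: [10 20 30 40 250 60 70 80 90]
probe order-stats neighborhood            ; [10 250 130]
```

Here the minimum rule discards the bright spike (250), the maximum rule keeps it, and the midpoint lands halfway between the two extremes.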
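redCV also includes a *rcvMeanFilter* function, which can be used for image smoothing. In the code sample that follows, its last argument selects the arithmetic (0), harmonic (1) or geometric (2) mean.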
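
For readers who do not have the three means in mind (the sample further down offers arithmetic, harmonic and geometric mean buttons), here is a small sketch in plain Red. It shows the textbook definitions on one block of values and is not taken from redCV. The helper name means-of is hypothetical, and the values are assumed to be strictly positive, because the harmonic mean is undefined when a value is zero.

```red
Red [Title: "Arithmetic, harmonic and geometric means (sketch)"]

; Hypothetical helper: returns [arithmetic harmonic geometric]
; for a block of strictly positive numbers.
means-of: func [values [block!] /local n total inv prod][
    n: length? values
    total: 0.0                            ; sum of the values
    inv:   0.0                            ; sum of the reciprocals
    prod:  1.0                            ; product of the values
    foreach v values [
        total: total + v
        inv:   inv + (1.0 / v)
        prod:  prod * v
    ]
    reduce [
        total / n                         ; arithmetic mean
        n / inv                           ; harmonic mean
        power prod (1.0 / n)              ; geometric mean (n-th root of the product)
    ]
]

probe means-of [2 4 8]                    ; roughly 4.67, 3.43 and 4.0
```

For positive values the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, so the three variants respond differently to bright outliers.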

Code sample

As usual with Red and redCV, the code is very clear and simple.
Red [
    Title: "Smoothing filters for image "
    Author: "Francois Jouen"
    File:    %smoothing.red
    Needs:   'View
]
#include %../../libs/redcv.red ; for redCV functions
kSize: 3x3
src: make image! 512x512
dst: make image! 512x512
isFile: false

loadImage: does [
    isFile: false
    canvas2/image: none
    tmp: request-file
    if not none? tmp [
        src: load tmp   
        dst: make image! src/size
        canvas1/image: src
        isFile: true
    ]
]
view win: layout [
    title "Red Smoothing Filters"
    origin 10x10 space 10x10
    button "Load"               [loadImage]
    text "Filter Size"
    field "3x3"                 [if error? try [kSize: to-pair face/text] [kSize: 3x3]]
    button "Median"             [if isFile [rcvMedianFilter src dst kSize canvas2/image: dst]]
    button "Min"                [if isFile [rcvMinFilter src dst kSize canvas2/image: dst]]
    button "Max"                [if isFile [rcvMaxFilter src dst kSize canvas2/image: dst]]
    button "MidPoint"           [if isFile [rcvMidPointFilter src dst kSize canvas2/image: dst]]
    button "Arithmetic Mean"    [if isFile [rcvMeanFilter src dst kSize 0 canvas2/image: dst]]
    button "Harmonic Mean"      [if isFile [rcvMeanFilter src dst kSize 1 canvas2/image: dst]]
    button "Geometric Mean"     [if isFile [rcvMeanFilter src dst kSize 2 canvas2/image: dst]]
    button "Quit"               [quit]
    return
    canvas1: base 512x512 white
    canvas2: base 512x512 white
]



Thursday, March 21, 2019

Deep Learning Text Recognition (OCR) using Tesseract and Red

I frequently use Tesseract OCR when I have to recognize text in images.
Tesseract was initially developed by Hewlett Packard Labs. It was open sourced by HP in 2005, and since 2006 it has been actively developed by Google and the open source community.

In version 4, Tesseract added a recognition engine based on long short-term memory (LSTM), a kind of recurrent neural network (RNN) that is very effective for OCR.

The Tesseract library includes a command line tool, tesseract, which can be used to perform OCR on images and write the result to a text file.

Install Tesseract

First of all, you need to install the Tesseract library according to your OS, for example with sudo apt install tesseract-ocr on Linux or brew install tesseract on macOS.

If you want multi-language support (about 170 languages), you have to install the *tesseract-lang* package.


Using Tesseract with Red Language

This operation is really trivial, since Red includes a call function which makes it possible to use the tesseract command line tool. You have to use the call/wait refinement in order to wait for the tesseract execution to finish. You can use different languages depending on your documents.
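
Before the full program, here is a minimal sketch of the idea: build the command line as a string, then hand it to call/wait. The helper name make-ocr-command and the file names scan.png and scan-result are hypothetical. The sketch assumes that tesseract is on the PATH, and the call itself is left commented out so that the snippet runs even without Tesseract installed.

```red
Red [Title: "Minimal Tesseract call (sketch)"]

; Hypothetical helper: builds the command line
;   tesseract <image> <output base> -l <language>
; tesseract itself adds ".txt" to the output base name.
make-ocr-command: func [image [file!] out-base [file!] lang [string!]][
    rejoin ["tesseract " to-local-file image " " to-local-file out-base " -l " lang]
]

cmd: make-ocr-command %scan.png %scan-result "fra"
print cmd                                 ; tesseract scan.png scan-result -l fra

; Uncomment to run the OCR (needs tesseract and a real image):
; call/wait cmd                           ; returns once tesseract has finished
; print read %scan-result.txt
```

Without /wait, call may return before tesseract has written its output, so the text file could be missing or incomplete when it is read. Tesseract also accepts several languages joined with a plus sign, for example "eng+fra".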

Code Sample

Red [
Title:   "OCR"
Author:  "Francois Jouen"
File:  %tesseract.red
Needs:  View
icon:  %red.ico
]
; Languages
tessdata: [
"afr (Afrikaans)"
"amh (Amharic)"
"ara (Arabic)"
"asm (Assamese)"
"aze (Azerbaijani)"
"aze_cyrl (Azerbaijani - Cyrilic)"
"bel (Belarusian)"
"ben (Bengali)"
"bod (Tibetan)"
"bos (Bosnian)"
"bre (Breton)"
"bul (Bulgarian)"
"cat (Catalan; Valencian)"
"ceb (Cebuano)"
"ces (Czech)"
"chi_sim (Chinese - Simplified)"
"chi_tra (Chinese - Traditional)"
"chr (Cherokee)"
"cym (Welsh)"
"dan (Danish)"
"deu (German)"
"dzo (Dzongkha)"
"ell (Greek Modern (1453-)"
"eng (English)"
"enm (English Middle (1100-1500)"
"epo (Esperanto)"
"equ (Math / equation detection module)"
"est (Estonian)"
"eus (Basque)"
"fas (Persian)"
"fin (Finnish)"
"fra (French)"
"frk (Frankish)"
"frm (French Middle (ca.1400-1600)"
"gle (Irish)"
"glg (Galician)"
"grc (Greek Ancient (to 1453)"
"guj (Gujarati)"
"hat (Haitian; Haitian Creole)"
"heb (Hebrew)"
"hin (Hindi)"
"hrv (Croatian)"
"hun (Hungarian)"
"iku (Inuktitut)"
"ind (Indonesian)"
"isl (Icelandic)"
"ita (Italian)"
"ita_old (Italian - Old)"
"jav (Javanese)"
"jpn (Japanese)"
"kan (Kannada)"
"kat (Georgian)"
"kat_old (Georgian - Old)"
"kaz (Kazakh)"
"khm (Central Khmer)"
"kir (Kirghiz; Kyrgyz)"
"kor (Korean)"
"kor_vert (Korean (vertical)"
"kur (Kurdish)"
"kur_ara (Kurdish (Arabic)"
"lao (Lao)"
"lat (Latin)"
"lav (Latvian)"
"lit (Lithuanian)"
"ltz (Luxembourgish)"
"mal (Malayalam)"
"mar (Marathi)"
"mkd (Macedonian)"
"mlt (Maltese)"
"mon (Mongolian)"
"mri (Maori)"
"msa (Malay)"
"mya (Burmese)"
"nep (Nepali)"
"nld (Dutch; Flemish)"
"nor (Norwegian)"
"oci (Occitan (post 1500)"
"ori (Oriya)"
"osd (Orientation and script detection module)"
"pan (Panjabi; Punjabi)"
"pol (Polish)"
"por (Portuguese)"
"pus (Pushto; Pashto)"
"que (Quechua)"
"ron (Romanian; Moldavian; Moldovan)"
"rus (Russian)"
"san (Sanskrit)"
"sin (Sinhala; Sinhalese)"
"slk (Slovak)"
"slv (Slovenian)"
"snd (Sindhi)"
"spa (Spanish; Castilian)"
"spa_old (Spanish; Castilian - Old)"
"sqi (Albanian)"
"srp (Serbian)"
"srp_latn (Serbian - Latin)"
"sun (Sundanese)"
"swa (Swahili)"
"swe (Swedish)"
"syr (Syriac)"
"tam (Tamil)"
"tat (Tatar)"
"tel (Telugu)"
"tgk (Tajik)"
"tgl (Tagalog)"
"tha (Thai)"
"tir (Tigrinya)"
"ton (Tonga)"
"tur (Turkish)"
"uig (Uighur; Uyghur)"
"ukr (Ukrainian)"
"urd (Urdu)"
"uzb (Uzbek)"
"uzb_cyrl (Uzbek - Cyrilic)"
"vie (Vietnamese)"
"yid (Yiddish)"
"yor (Yoruba)"
]
;OCR Engine Mode
ocr: [
"Original Tesseract only"
"Neural nets LSTM only"
"Tesseract + LSTM"
"Default, based on what is available"
]



; Working directory: adapt appDir if needed, or keep what-dir
appDir: what-dir
tFile: to-file rejoin[appDir "tempo"]
tFileExt: to-file rejoin[appDir "tempo.txt"]
change-dir to-file appDir

dSize: 512
gsize: as-pair dSize dSize
img: make image! reduce [gSize black]
lang: "eng"
ocrMode: 3
tmpf: none
isFile: false ; no image loaded yet, so "Process" does nothing until one is loaded
tBuffer: copy []

loadImage: does [
    tmpf: request-file
    isFile: false
    if not none? tmpf [
        clear result/text
        img: load tmpf
        canvas/image: img
        isFile: true
    ]
]

processFile: does [
    if isFile [
        ; remove the output of a previous run
        if exists? tFileExt [delete tFileExt]
        clear result/text
        ; build the command line: tesseract <image> <output base> -l <lang> [--oem <mode>]
        prog: copy "/usr/local/bin/tesseract " ; adapt to where tesseract is installed
        append prog form tmpf
        append append prog " " form tFile
        case [
            ocrMode = 0 [append append prog " -l " lang]
            ocrMode = 1 [append append prog " -l " lang append append prog " --oem " ocrMode]
            ocrMode = 2 [append append prog " -l " lang]
            ocrMode = 3 [append append prog " -l " lang append append prog " --oem " ocrMode]
        ]
        call/wait prog ; wait until tesseract has written tempo.txt
        either cb/data [
            ; "Lines" is checked: skip empty lines in the recognized text
            clear tbuffer
            clear result/data
            tt: read tFileExt
            tbuffer: split tt "^/"
            nl: length? tbuffer
            i: 1
            while [i <= nl][
                ligne: tbuffer/:i
                ll: length? ligne
                if ll > 1 [append result/data rejoin [ligne lf]]
                i: i + 1
            ]
            result/text: copy form result/data
        ][
            result/text: read tFileExt
        ]
    ]
]

; ***************** Test Program Interface ************************
view win: layout [
    title "Tesseract OCR with Red"
    button "Load Image" [loadImage]
    text 60 "Language"
    dp1: drop-down 180 data tessdata
        select 24
        on-change [
            s: dp1/data/(face/selected)
            lang: first split s " "
        ]
    text 80 "OCR mode"
    dp2: drop-down 230 data ocr
        select 4
        on-change [ocrMode: face/selected - 1]
    cb: check "Lines" false
    button "Process" [processFile]
    button "Clear" [clear result/text]
    button "Quit" [if exists? tFileExt [delete tFileExt] Quit]
    return
    canvas: base gsize img
    result: area gsize font [name: "Arial" size: 16 color: black]
        data []
    return
    f: field 512
    text "Font"
    drop-list 120
        data ["Arial" "Consolas" "Comic Sans MS" "Times" "Hannotate TC"]
        react [result/font/name: pick face/data any [face/selected 1]]
        select 1
    fs: field 50 "16"
        react [result/font/size: fs/data]
    button 30 "+" [fs/data: fs/data + 1]
    button 30 "-" [fs/data: max 1 fs/data - 1]
    drop-list 100
        data ["black" "blue" "green" "yellow" "red"]
        react [result/font/color: reduce to-word pick face/data any [face/selected 1]]
        select 1
    do [f/text: copy form appDir]
]


Result

This is an example with a Simplified Chinese document.