Currency symbols not recognized


#1

Hi,

It seems the OCR API is not recognizing currency symbols on this image. Can you explain why? I tried Tesseract and that seems to do a good job of it. I have purchased the pro plan, but without this it might become unusable for me.

thanks


cheers
Kiran


#2

I confirmed the problem. We are working on it.

Out of curiosity, what version of Tesseract did you use?


#3

Version 4 with LSTM… do you have any indication of when this issue may get fixed?


#4

It is difficult to estimate the ETA for OCR improvements. But I found a solution that works right away: Use ChineseSimplified (language=chs) as OCR language. This recognizes English as well and includes the Pound/Euro currency symbols.
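For readers who want to try this from code, here is a minimal sketch of the workaround. It assumes the public OCR.space endpoint `https://api.ocr.space/parse/image`, the form fields `apikey`, `url` and `language`, and a JSON reply with a `ParsedResults` list whose items carry `ParsedText`; please check these against the current API documentation. The helper names (`build_form`, `extract_text`, `ocr_image_url`) are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the OCR.space API documentation.
OCR_ENDPOINT = "https://api.ocr.space/parse/image"


def build_form(api_key, image_url, language="chs"):
    """Build the form fields for one OCR request.

    language="chs" (Chinese Simplified) is the workaround from this thread;
    use "eng" for images without currency symbols.
    """
    return {"apikey": api_key, "url": image_url, "language": language}


def extract_text(response):
    """Join the text of all parsed results (assumed response layout)."""
    results = response.get("ParsedResults") or []
    return "\n".join(item.get("ParsedText", "") for item in results)


def ocr_image_url(api_key, image_url, language="chs"):
    """POST the form to the endpoint and return the recognized text."""
    form = build_form(api_key, image_url, language)
    data = urllib.parse.urlencode(form).encode("ascii")
    request = urllib.request.Request(OCR_ENDPOINT, data=data)
    with urllib.request.urlopen(request, timeout=60) as reply:
        return extract_text(json.load(reply))
```

Calling `ocr_image_url("YOUR_API_KEY", "https://example.com/receipt.png")` would send the image URL with `language=chs`; pass `language="eng"` to compare against the English engine on the same image.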

My OCR.space test with Chinese as OCR language:

By comparison, here are my test results for Tesseract 4.0 LSTM:

Original image: The processing time for your original image was more than a minute(!), so I ran the further tests on just the circle part.

Unmodified image (except cropped):

Reduced size by 50%:

Inverted color:


#5

Interesting… I can definitely try that… do you also suggest that before submitting to the API endpoint I also perform things like grayscaling, noise reduction etc. to improve the accuracy? Or is this something that is done on the API automatically?

Thanks for the quick response, really appreciate it!! If all goes well then I am hoping to use this for at least 100,000 requests a month :slight_smile:


#6

do you also suggest that before submitting to the API endpoint I also perform things like grayscaling, noise reduction etc. to improve the accuracy?

This is not needed. I only did it for Tesseract, because there it is needed.

Or is this something that is done on the API automatically?

Yes!


#7

Hi,

Ok cool… do you have any tech that can remove the background from an image that has text over a background image?

thanks
cheers
Kiran


#8

Usually our OCR API works fine even if there is a background image.


#9

Hi,

So I have been trying out ‘Chinese Simplified’ as recommended and am really struggling. Take a look at the attached example. It just completely mixes up the English characters. Any clue what I can do here?


#10

But on this image, there are no currency symbols? In this case English is the best OCR language to use.


#11

Sorry, wrong image… attached is an example… it's in English and has currencies… when I try Chinese Simplified the text is all over the place.