OCR did not recognize some numbers

Hey guys

I’m using the “OCRExtractRelative” command on the image BoletoBancario.

The relative image is:


For some reason, OCR is not picking up all the numbers.

Correct number is: 23791.11103 60000.000103 01000.222206 1 48622000000000

OCR number is: 23791. 1103 60000.000103 01000.222206 48622000000000

The second and the last digit “1” are missing, and I do not know why. Does anybody know the reason?
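To pin down exactly which characters were dropped, the two strings above can be compared programmatically. The following is only a small sketch using Python's standard `difflib`; the helper names `diff_segments` and `count_digits` are made up for illustration and are not part of the OCR tool.

```python
from difflib import SequenceMatcher


def diff_segments(expected: str, actual: str) -> list[tuple[str, str, str]]:
    """Return (operation, expected_part, actual_part) for every non-matching span."""
    matcher = SequenceMatcher(None, expected, actual, autojunk=False)
    return [
        (tag, expected[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def count_digits(text: str) -> int:
    """Count only the digits, ignoring dots and spaces."""
    return sum(ch.isdigit() for ch in text)


# The two strings quoted in this post.
expected = "23791.11103 60000.000103 01000.222206 1 48622000000000"
ocr_result = "23791. 1103 60000.000103 01000.222206 48622000000000"

print(count_digits(expected), count_digits(ocr_result))  # 47 45

# List each place where the OCR text deviates from the expected text.
for tag, want, got in diff_segments(expected, ocr_result):
    print(f"{tag}: expected {want!r}, got {got!r}")
```

A digit-count check like this is a cheap way to notice that an OCR result is incomplete before it is used further.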

I tested the document itself with the ocr api, and the numbers are detected OK.

To test this further, can you please send me the small image of the area inside the pink frame? You can find this image in the screenshot tab as “_lastscreenshot.png”. That is the image that is sent to the OCR API for processing.

Example from the demopdftest_with_ocr demo macro:


I’ve just run the script.

Below is the image with the command:

“_lastscreenshot.png” has the correct numbers.


But when it stores the string, the second and the last digit “1” disappear.

If you need any further details, please let me know.

“_lastscreenshot.png” has the correct numbers

Good to know. This is the image before it is sent for OCR processing.

This means the extraction of the OCR area with the pink box works correctly, and the problem is with the OCR engine. => We will fix this with the next OCR updates.

Workaround: The problem seems to occur because the image with the number is very small (low height, and almost no white space on the left and right sides). So if you make the pink box (= area to OCR) larger, this should fix the OCR issues:
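As a rough illustration of what "larger" means here, the arithmetic of adding a margin around a tight box can be sketched in Python. This is only an assumption-laden sketch: in the tool itself the pink box is drawn by hand, and the `Box` type, the `enlarge_box` helper, and all pixel values below are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Pixel coordinates of a rectangular OCR area (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def enlarge_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow the box by `margin` pixels on every side, clamped to the image size."""
    return Box(
        left=max(0, box.left - margin),
        top=max(0, box.top - margin),
        right=min(width, box.right + margin),
        bottom=min(height, box.bottom + margin),
    )


# A tight box around one line of digits (made-up values, low height).
tight = Box(left=5, top=100, right=405, bottom=118)

# Add 20 px of surrounding white space inside an 800x600 screenshot.
print(enlarge_box(tight, margin=20, width=800, height=600))
# Box(left=0, top=80, right=425, bottom=138)
```

The clamping matters when the digits sit close to the edge of the page, because the box cannot extend beyond the screenshot.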


I’ve applied the workaround and it worked. Now I’m able to get the correct numbers.