OCRExtractRelative selects wrong area

Michelp76 · September 6, 2021, 8:37am

Hello everyone, I’m pretty new to UI.Vision, although I did my homework and searched the forum thoroughly.

OCRExtractRelative always fails to scrape any values in our PDF.
Here is the green/pink mask:

(The two bold values are the ones I’m interested in.)
Here is the CSV output:

2021-09-06 10_29_37-journaux-de-paie.csv - RPA CSV Editor

As you can see, nothing is scraped.

So, of course, I tried making the green box bigger and smaller.
I also tried tweaking the size of the pink box.

But it always results in empty output, no matter what.
I also played around with the variables (!OCRScale and isTable enabled, or !OCREngine set to 2), but still no luck.

Here is the __lastscreenshot.png:

__lastscreenshot

Any recommendations?
Thanks a lot in advance!

Now that I think about it, is __lastscreenshot.png the area that OCRExtractRelative captures?
If so, it very much looks like the wrong portion of the page, somewhere in the lower part.
So is the green box the culprit?

Michelp76 · September 6, 2021, 9:17am

I also tried narrowing down both boxes like so:

(I also switched the OCR language to “French”.)

Here is the __lastscreenshot.png:

__lastscreenshot (1)

I’m pretty sure the spot it tries to capture is way below the target (the pink box).
I wonder what’s wrong.

Michelp76 · September 6, 2021, 1:22pm

It may help to add that my PDF actually contains real text, so if I understand this correctly:

I might not need OCR commands, as my PDF is already searchable.

That being said, I have no clue how to search inside my PDF.
Something like storeText with an XPath, say “xpath=(//*[text()[contains(.,'My Keyword… ')]])”?

I’ll try that.

EDIT: it doesn’t seem to work, but I forgot to mention that I’m using an embedded Acrobat Reader inside an iframe:

That obviously doesn’t work well with web scraping, so I guess OCR is the only way to go…

Provided it works!

ulrich · September 6, 2021, 2:56pm

In this case I would recommend using XType to simulate the “copy & paste” commands: CTRL+A (select all text in the PDF) and then CTRL+C. Then, inside the macro, the data is in the !clipboard variable.
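Here is a minimal sketch of that copy-and-paste approach as a UI.Vision macro in its JSON export format. It assumes the embedded PDF viewer must first be clicked to get keyboard focus. The click coordinates `500,400`, the macro name and the variable name `pdfText` are hypothetical placeholders. Adjust the coordinates to a point inside your embedded PDF.

```json
{
  "Name": "copy_pdf_text",
  "CreationDate": "2021-9-7",
  "Commands": [
    {
      "Command": "comment",
      "Target": "Hypothetical coordinates: click inside the embedded PDF so it has keyboard focus",
      "Value": ""
    },
    {
      "Command": "XClick",
      "Target": "500,400",
      "Value": ""
    },
    {
      "Command": "XType",
      "Target": "${KEY_CTRL+KEY_A}",
      "Value": ""
    },
    {
      "Command": "XType",
      "Target": "${KEY_CTRL+KEY_C}",
      "Value": ""
    },
    {
      "Command": "pause",
      "Target": "500",
      "Value": ""
    },
    {
      "Command": "store",
      "Target": "${!clipboard}",
      "Value": "pdfText"
    },
    {
      "Command": "echo",
      "Target": "${pdfText}",
      "Value": ""
    }
  ]
}
```

The short pause assumes the clipboard may not be updated instantly after the simulated CTRL+C. Copying the clipboard into a regular variable keeps the text available even if a later step overwrites the clipboard.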

You can also try a triple-click to select just the line you need:

XClick | | #tripleclick
XType | ${KEY_CTRL+KEY_C}

As mentioned below, you might need to switch to desktop automation if the coordinates are wrong in web automation mode.

PS: storeText works only on websites, not on PDFs.

PPS: This answer is a solution without OCR. The next answer is an idea for fixing the OCR capture. This way you have two suggestions to try.

ulrich · September 6, 2021, 2:59pm

Now that I think about it, is __lastscreenshot.png the area that OCRExtractRelative captures ?

Yes!

…(then) I’m pretty sure the spot it tries to capture is way below the target (pink box)

I agree. I assume the iframe and the embedded PDF confuse UI.Vision’s BROWSER coordinate calculation.

=> For OCR to find the correct area, switch to desktop automation mode; then it will work! You can either switch in the UI.Vision IDE or use this macro command: XDesktopAutomation | true
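Here is a minimal sketch of the desktop-mode OCR variant, again in UI.Vision's JSON macro format. It assumes the green/pink anchor image still matches what is on screen. It may need to be re-captured after switching modes. The image name `payroll_anchor.png` and the variable name `ocrValue` are hypothetical.

```json
{
  "Name": "ocr_desktop_mode",
  "CreationDate": "2021-9-7",
  "Commands": [
    {
      "Command": "XDesktopAutomation",
      "Target": "true",
      "Value": ""
    },
    {
      "Command": "comment",
      "Target": "payroll_anchor.png is a hypothetical name for the green/pink anchor image",
      "Value": ""
    },
    {
      "Command": "OCRExtractRelative",
      "Target": "payroll_anchor.png",
      "Value": "ocrValue"
    },
    {
      "Command": "echo",
      "Target": "${ocrValue}",
      "Value": ""
    }
  ]
}
```

In desktop mode, the screenshot covers the whole screen rather than the browser viewport. This is why the iframe offset should no longer shift the capture area.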

Michelp76 · September 7, 2021, 2:13pm

In this case I would recommend you use XTYPE and simulate the “copy & paste” commands with CTRL+A (select all text in PDF) and then CTRL+C. Then, inside the macro, the data is in the !clipboard variable.

Thanks, that did the trick!
Now I’m moving on to parsing the !clipboard content.