Get the image used for Ocr.Space?

Is there a way to receive or download the generated image that the Ocr.Space processing was based on when lifting a PDF document?

Either via a download link in the response, or included in the response as a Base64 string?

This is currently not yet possible. But may I ask what you would need this for?

Of course. We are currently building a system that automatically recognizes different types/brands of invoices. To recognize an invoice, certain texts have to be at certain locations in the document. Let's call these Document Scanner Definitions (DSD).

We are building these DSDs by hand for each type of invoice. This has to be done manually and cannot be automated: it takes human eyes and knowledge of what to look for to recognize the document structure. But once an invoice type has a DSD, we can be fairly sure that, for every invoice recognized as that type, we can automatically extract the values that matter to our business: Cost, Tax, Total Cost incl. Tax, CVR values, and so on. These are values used in ERP systems.

It’s in this manual process that we would like to be able to retrieve the source image used for the processing. To make it really user-friendly, we are currently adding a text overlay based on your calculations, but because we do not have the original source image, our display might be a little off and we have to calculate a zoom scale value (for reference, it’s ‘renderedImage.Width / 803’). Because we do not know whether you will keep using an image width of 803, our calculations may at some point be off until we update to your new image width.

So if we were able to retrieve the original source image, we’d be able to create a more future-proof product for our employees. This is a 100% internal tool, and once a DSD has been created, the source image never has to be downloaded again. The system will then have learned that invoice type, and user interaction is no longer needed.

Hope this makes sense.

Thanks for the details! This makes very good sense.

we have to calculate a zoom scale value

The good news is that we apply exactly the same zoom factor all the time :slight_smile:

=> So once you have your correction factor, it can be used for all PDF documents that you send to our OCR API.
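For illustration, the correction factor discussed above could be applied along these lines. This is only a rough sketch: the width of 803 is the value quoted in this thread and may not hold in general, and the `Box` type and both function names are hypothetical, not part of the OCR API.

```python
from dataclasses import dataclass

# Page width quoted in this thread; treat it as an assumption, not a documented constant.
OCR_PAGE_WIDTH = 803


@dataclass
class Box:
    """Hypothetical bounding box for one recognized word, in OCR pixel units."""
    left: float
    top: float
    width: float
    height: float


def zoom_scale(rendered_width: float, ocr_width: float = OCR_PAGE_WIDTH) -> float:
    """Return the factor that maps OCR coordinates onto the rendered image."""
    return rendered_width / ocr_width


def scale_box(box: Box, scale: float) -> Box:
    """Scale every coordinate of a box by the same factor."""
    return Box(box.left * scale, box.top * scale, box.width * scale, box.height * scale)


# Example: a page rendered at twice the assumed OCR width.
scale = zoom_scale(1606)
overlay_box = scale_box(Box(left=31, top=123, width=100, height=20), scale)
```

Keeping the width in a single constant means only one value has to change if the OCR image width is ever different.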

See also PDF OCR JSON coordinates

I think it would make more sense to provide percentage-based coordinates.
So instead of left = 31 and top = 1 123, it could be left = 4.32% (of the width of the doc) and so forth.
Because we don’t know whether every image of a given doc will be captured at the same resolution, especially if it’s done from a mobile app.
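As a rough sketch of that suggestion, pixel coordinates can be converted to percentages of the page size and back again on the client side. The function names here are hypothetical, and the page dimensions are assumed to be known to the caller; the API discussed in this thread does not return percentages.

```python
def to_percent(value_px: float, extent_px: float) -> float:
    """Express a pixel coordinate as a percentage of the page width or height."""
    return value_px / extent_px * 100


def from_percent(percent: float, extent_px: float) -> float:
    """Map a percentage back to pixels for an image of any resolution."""
    return percent / 100 * extent_px


# A coordinate a quarter of the way across an 803 px wide page ...
left_percent = to_percent(200.75, 803)
# ... lands at the same relative position on a 1606 px wide rendering.
left_on_render = from_percent(left_percent, 1606)
```

Stored this way, a position stays valid regardless of the resolution at which a given copy of the document was captured.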