We are playing around with your PDF/OCR service to evaluate how well it fits into our setup.
Until now we are happy about what we are seeing and we believe that the Pro PDF license will serve us well.
However, in our current software we convert PDF-files into jpg-files and display the jpg to the user - we are then generating dynamic overlays from OCR depending on the user-scenario or mouse-over interaction.
We successfully managed to use the output from ocr.space to do this - but we struggle a bit with aligning the scale of the image with the coordinates in the JSON file.
The challenge is to understand the page height/width, resolution, and unit(not pixels since it’s often .5) of your internal page - upon which you refer to by coordinates in the word-segment. We cannot simply iterate over the word segment to get the page height/width as most pages have margins.
It would help us greatly if some of this information could be returned in the JSON file - especially the page height/width of the individual page.