OCR Layout PDF vs Img

Hi There,
My software extracts the text from text-searchable PDFs.
It honours the layout of the text, so that the extracted text closely matches its physical position on the page.

I have been testing using the C# sample, with a few parameters changed.

  • When I extract the text from a PDF, the layout is spaced out using long runs of spaces.
  • When I extract the text from an image (PNG screenshot of the PDF), the text is spaced out using tabs and spaces.
    I am using engine 1, English, scale, istable

Why would I be seeing 2 different results?

My first guess would be that your PNG of the PDF is different from the PNG that we generate internally on the OCR API server.

Technically, every uploaded PDF is converted into a series of screenshots (one for every page).

Try experimenting with scale=false and scale=true. Does this make a difference?
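
To see whether the PDF and PNG results differ only in how the gaps are encoded (tabs versus runs of spaces), it may help to count the separators in each output and compare the text after normalising them. The following is a minimal sketch in Python, assuming you already have both extracted texts as strings. The function names and the sample strings are hypothetical and are not part of the OCR API or the C# sample.

```python
import re

# Matches one column gap: either any mix of spaces/tabs that contains a tab,
# or a run of two or more spaces.
GAP = re.compile(r"[ \t]*\t[ \t]*| {2,}")


def whitespace_profile(text):
    """Count tab characters and runs of 2+ spaces in an OCR result."""
    return {
        "tabs": text.count("\t"),
        "space_runs": len(re.findall(r" {2,}", text)),
    }


def normalize_layout(text):
    """Replace every column gap with a single tab, line by line."""
    return "\n".join(GAP.sub("\t", line.strip()) for line in text.splitlines())


# Hypothetical sample outputs, standing in for the real API results.
pdf_text = "Name      Qty     Price"
img_text = "Name\tQty  \tPrice"

print(whitespace_profile(pdf_text))
print(whitespace_profile(img_text))
print(normalize_layout(pdf_text) == normalize_layout(img_text))
```

If the normalised texts match, the two results contain the same words in the same columns and only the separator characters differ. Running the same comparison on the outputs for scale=false and scale=true should show whether that setting changes the spacing.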