OCR API - Only take the first page of a PDF

I really like the OCR.space API and am currently doing a lot of trial-and-error runs to extract invoice details and teach my system the local variations. Sometimes I have a multipage PDF invoice but only want to “read” the first page, as that is the only relevant one, and I also noticed that the API really hangs on multipage documents. I looked at the parameters but don't see any option for “first page only”. Of course I could preprocess the PDF, split it, and then feed it to the API, but as there is already a limit of 3 pages, I suspect there is a built-in function in the API that could also set that limit to 1 or 2 if needed.

This is a good feature suggestion. We do not have this feature yet, but it is on our “todo” list.

Technically this is not difficult, but we have just not gotten around to exposing our internal page limit feature as an OCR API parameter yet. So at the moment, pre-processing the PDF is the best option. There are several PDF splitting command-line tools, e.g. “pdfseparate”.
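
As a rough sketch of that pre-processing step, here is one way to call pdfseparate from Python and keep only page 1. It assumes pdfseparate (part of poppler-utils) is installed and on your PATH; the function names and the output file naming are made up for illustration and are not part of the OCR API.

```python
import subprocess
from pathlib import Path


def first_page_command(src, out_pattern):
    """Build a pdfseparate call that extracts only page 1 of src."""
    # -f and -l are pdfseparate's first-page and last-page options.
    return ["pdfseparate", "-f", "1", "-l", "1", src, out_pattern]


def extract_first_page(src, out_dir="."):
    """Write page 1 of src into out_dir and return the new file's path.

    Assumes the pdfseparate binary is available; the "-page%d" naming
    scheme is only an illustrative choice.
    """
    stem = Path(src).stem
    # pdfseparate replaces %d in the pattern with the page number.
    pattern = str(Path(out_dir) / f"{stem}-page%d.pdf")
    subprocess.run(first_page_command(src, pattern), check=True)
    return Path(out_dir) / f"{stem}-page1.pdf"
```

Calling extract_first_page("invoice.pdf") should leave a one-page invoice-page1.pdf next to your script, which you can then upload to the API instead of the full document.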
