OCR API - Only take the first page of a PDF

Edynas · April 25, 2023, 7:41am

I really like the OCR.space API and am currently doing a lot of trial-and-error runs to extract invoice details and teach my system the local variations. Sometimes I have a multipage PDF invoice but only want to “read” the first page, as that is the only relevant one. I also noticed that the API really hangs when processing multipage files. I looked at the parameters but don't see any option for “first page only”. Of course I could preprocess the PDF, split it, and then feed it to the API, but as there is already a limit of 3 pages, I suspect there is a built-in function in the API that could also set that limit to 1 or 2 if needed.

ocr-api-team · April 25, 2023, 12:25pm

This is a good feature suggestion. We do not have this feature yet, but it is on our “todo” list.

Technically this is not difficult, but we have just not gotten around to exposing our internal page limit feature as an OCR API parameter yet. So at the moment, pre-processing the PDF is the best option. There are several PDF splitting command-line tools, e.g. “pdfseparate”.
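For anyone who wants to script the pre-processing step, here is a minimal sketch that calls pdfseparate from Python to pull out only the first page before uploading. It assumes pdfseparate (part of poppler-utils) is installed and on the PATH. The function names, the file name "invoice.pdf", and the output folder "ocr_input" are made up for illustration and are not part of the OCR API.

```python
import subprocess
from pathlib import Path


def build_pdfseparate_cmd(src, out_dir, first=1, last=1):
    """Return the pdfseparate argument list for pages first..last (1-based)."""
    if first < 1 or last < first:
        raise ValueError("need 1 <= first <= last")
    src = Path(src)
    # pdfseparate replaces %d in the output pattern with the page number.
    pattern = Path(out_dir) / f"{src.stem}-page-%d.pdf"
    return ["pdfseparate", "-f", str(first), "-l", str(last), str(src), str(pattern)]


def extract_pages(src, out_dir, first=1, last=1):
    """Run pdfseparate and return the paths of the single-page PDFs it writes."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_pdfseparate_cmd(src, out_dir, first, last)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pdfseparate fails
    pattern = cmd[-1]
    return [Path(pattern.replace("%d", str(n))) for n in range(first, last + 1)]


if __name__ == "__main__":
    # Placeholder names: swap in your own invoice file and output folder.
    for page in extract_pages("invoice.pdf", "ocr_input"):
        print(page)
```

With the defaults this writes a single file containing page 1, which can then be sent to the API in place of the full invoice. Note that pdfseparate writes one file per page, so passing `last=2` produces two single-page PDFs rather than one two-page PDF.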

Sammer · April 25, 2025, 11:16am

A common OCR scenario is to identify and classify multipage documents taking into account only their first page. Being able to limit OCR to the first page would give a faster and smaller result and would save server resources. Please add it!