Multi page reading for PDF files in ocr.space

mask · February 28, 2023, 7:27pm

Hi, I am using ocr.space for one of my undergrad course projects, where I need to scan multi-page PDFs (3 pages maximum). This is my code:


import requests

def ocr_space_file(filename, overlay=False, api_key='my_key', language='ger'):
    payload = {'isOverlayRequired': overlay,
               'detectOrientation': True,
               'isTable': True,
               'scale': True,
               'apikey': api_key,
               'language': language}

    # Upload the file and return the raw response body as text
    with open(filename, 'rb') as f:
        r = requests.post('https://api.ocr.space/parse/image',
                          files={filename: f},
                          data=payload)
    return r.content.decode()

I save the response from ocr.space as r.content.decode(), but I only get the scan result for the first page instead of all 3. What am I doing wrong here?

ocr-api-team · March 1, 2023, 9:43am

Hello, you actually did nothing wrong. The Free OCR API plan simply has a 3-page PDF limit, so only the first 3 pages of any PDF are processed.
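To see how many pages actually came back, it can help to split the response per page. The following is only a rough sketch: it assumes the JSON response carries a `ParsedResults` list with one entry per processed page, each with a `ParsedText` field, and an `IsErroredOnProcessing` flag. The helper name `pages_from_response` is made up for this example.

```python
import json

def pages_from_response(raw):
    """Return the recognized text of each page from the raw JSON response string."""
    result = json.loads(raw)
    # Assumption: the API flags failures via 'IsErroredOnProcessing'
    if result.get('IsErroredOnProcessing'):
        raise RuntimeError(result.get('ErrorMessage'))
    # Assumption: 'ParsedResults' holds one entry per processed page
    return [page.get('ParsedText', '') for page in result.get('ParsedResults') or []]
```

Calling `len(pages_from_response(ocr_space_file('scan.pdf')))` would then show the number of pages returned (`scan.pdf` is a placeholder file name).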

Thanks for using our PDF OCR software for your work. Here is a little workaround that I can share:

For now, our new OCR Engine5 does not have this page limit check yet. It will be added eventually, but this will take some time (other work has priority). So as a solution for your project, you can just switch to Engine5 to remove the page limit. This includes the option to create a searchable PDF.

'detectOrientation': True,

Hint: I would use this flag only if you really need it. It increases the PDF OCR time significantly (3-4 times longer).
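Putting both suggestions together, the payload could look roughly like this. This is a sketch, not an official example: it assumes the engine is selected through an `OCREngine` form field and that the value `5` selects Engine5. The helper name `build_payload` and its parameters are made up here. `detectOrientation` is left out unless explicitly requested.

```python
def build_payload(api_key, language='ger', overlay=False,
                  detect_orientation=False, engine=5):
    """Build the form data for the OCR request."""
    payload = {
        'apikey': api_key,
        'language': language,
        'isOverlayRequired': overlay,
        'isTable': True,
        'scale': True,
        # Assumption: the engine is chosen via 'OCREngine', with 5 meaning Engine5
        'OCREngine': engine,
    }
    # Only send the flag when needed, since it slows down PDF OCR
    if detect_orientation:
        payload['detectOrientation'] = True
    return payload
```

The result can be passed as `data=build_payload('my_key')` in the `requests.post` call from the question.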