POST pdf document

Hi All,

I’m struggling to understand what is meant by the following;

“file : Multipart encoded image file with filename”

I am trying to POST a PDF file, but I get back a 411 “Content-Length required” error. I am unsure what is meant by an encoded image file; I am just trying to POST a PDF file. I am also unsure what is meant by “with filename”. Should I be sending the PDF document in the body of the HTTP POST request, or is it a value in a “file” request header?

Please advise; all your help is much appreciated.

Thanks,
Zabir.
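
For readers with the same question: "multipart encoded file with filename" normally means the file travels in the request body as one part of a multipart/form-data payload, and that part carries a filename attribute. It is not a request header. A 411 response means the server wants a Content-Length header, which hand-built bodies often omit. Below is a minimal sketch of such a body using only the Python standard library. The helper name build_multipart, the sample filename scan.pdf and the language field are illustrative assumptions, not something taken from this thread.

```python
import uuid


def build_multipart(fields, file_field, filename, file_bytes,
                    content_type="application/pdf"):
    """Return (body, headers) for a multipart/form-data POST (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    parts = []
    # Ordinary form fields: one part each, no filename.
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The file part: same shape, plus filename and its own Content-Type.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode("utf-8")
    )
    parts.append(file_bytes)  # raw bytes, not base64
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    body = b"".join(parts)
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        # Byte length of the whole body; a missing value here is what a 411 complains about.
        "Content-Length": str(len(body)),
    }
    return body, headers


# Assumption: "language" is only an example of an extra form field.
body, headers = build_multipart({"language": "eng"}, "file", "scan.pdf", b"%PDF-1.3\n")
print(headers)
```

Most HTTP clients (and Postman) build this body and both headers for you, so writing it by hand is usually only worth it for debugging.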

I recommend that you test it with Postman first:

https://ocr.space/ocrapi#postman

See here: https://ocr.space/Content/Images/postman-pdf-ocr.png

Thanks for the swift response ulrich, this worked in Postman. However, I wonder whether Postman is in fact encoding the document before posting it? Would you know?

Postman has a code generation feature. There you can convert the (working) Postman call into source code. Does that work for you?

Hi ulrich, thank you so much for your help. This did in fact work somewhat, though not completely. I had to build the entire body for the POST request (it’s basically just a raw string), which looks like this;

[image: screenshot of the raw request body]

This is my result;

{"ParsedResults":[{"TextOrientation":"0","FileParseExitCode":1,"ParsedText":"","ErrorMessage":"","ErrorDetails":""}],"OCRExitCode":1,"IsErroredOnProcessing":false,"ProcessingTimeInMilliseconds":"437","SearchablePDFURL":"Searchable PDF not generated as it was not requested."}

I’m wondering whether or not the API expects a JSON formatted object (containing the PDF) in the actual body of the POST request.

Thanks,
Z
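
A response like the one quoted above is easy to misread: IsErroredOnProcessing is false, yet ParsedText is empty. The sketch below shows one way to tell the two cases apart in code. The function name extract_text is hypothetical, and reading a top-level ErrorMessage on failure is an assumption; the sample string is the response from this post.

```python
import json


def extract_text(response_body):
    """Join ParsedText from every result; raise if the API flagged an error."""
    data = json.loads(response_body)
    if data.get("IsErroredOnProcessing"):
        # Assumption: a failed call carries a top-level ErrorMessage.
        raise RuntimeError(f"OCR failed: {data.get('ErrorMessage')}")
    results = data.get("ParsedResults") or []
    return "\n".join(r.get("ParsedText", "") for r in results)


sample = (
    '{"ParsedResults":[{"TextOrientation":"0","FileParseExitCode":1,'
    '"ParsedText":"","ErrorMessage":"","ErrorDetails":""}],"OCRExitCode":1,'
    '"IsErroredOnProcessing":false,"ProcessingTimeInMilliseconds":"437",'
    '"SearchablePDFURL":"Searchable PDF not generated as it was not requested."}'
)

text = extract_text(sample)
if not text.strip():
    # No error was flagged, but nothing was recognised either.
    print("Request accepted, but no text came back")
```

An empty result with no error suggests the server accepted the request but did not receive a usable document, which points at how the body was built rather than at the PDF itself.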

Ok so I’ve tried a few different things. I think everything is generally ok; however, when I POST a base64 encoded PDF, I get a response of “PDF Corrupted”. When I send the raw PDF contents, the ParsedText in the JSON is empty. I think I would need someone on the OCR.Space side to check what they’re receiving. Maybe something is being lost during the HTTP communication, but I can’t see that from my side.

Post the code you’re using for the POST call; that’s where your issue is.

There you go, mate;

I’m building the payload using javascript and sending it as the POST request body.

Ok so I found the solution. It is actually very simple, yet many people seem to have hit a similar issue without resolving it, and I can see why. It would help if the OCR API docs were updated to make this clearer: what stumped me was that a different method is documented for posting a PDF document, when in fact it’s exactly the same as posting a regular base64 image. Anyway, the solution is as follows;

  1. Instead of trying to upload an actual file (or blob, which is the file content), POST a “Data URI” instead. Note: it’s not a URL, as you’ll see below;

  2. Convert the file into a base64 string.

  3. Prepend the following to the base64 string;

“data:application/pdf;base64,”

so the result would look something like this;

“data:application/pdf;base64,JVBERi0xLjMKNiAwIG9iago8PC9M=”

  4. For testing only, you can use this website to generate the base64 Data URI for you - https://base64.guru/converter/encode/pdf

NOTE: Upload the file and select the correct output format - Data URI.

  5. Add this to the form body with a key, in this case “base64image”. Note that this isn’t literally an image as in a .gif, .png or .jpg; it just means the base64 string. See screenshot below;

  6. Once you’ve got it working in Postman, try the same within your application.

NOTE for beginners: don’t forget to add your apikey to the request header, like so;
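
For anyone who wants the steps above in code rather than Postman, here is a minimal sketch in Python using only the standard library. It is not the poster's code. Assumptions: the endpoint URL is the one I believe OCR.space documents and does not appear in this thread; the form key is spelled base64Image here, while the post writes it as “base64image”; the helper names and the form-urlencoded body (instead of Postman's form-data) are my own choices.

```python
import base64
import urllib.parse
import urllib.request

# Assumption: endpoint not given in this thread; check the OCR.space docs.
OCR_URL = "https://api.ocr.space/parse/image"


def pdf_to_data_uri(pdf_bytes):
    """Steps 2 and 3: base64-encode the PDF and prepend the Data URI prefix."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return "data:application/pdf;base64," + encoded


def build_request(pdf_bytes, api_key, field_name="base64Image"):
    """Step 5 plus the apikey note: Data URI as a form field, key in a header."""
    form = urllib.parse.urlencode({field_name: pdf_to_data_uri(pdf_bytes)})
    return urllib.request.Request(
        OCR_URL,
        data=form.encode("ascii"),    # urllib sets Content-Length when sending
        headers={"apikey": api_key},  # header name as used in the post
        method="POST",
    )


print(pdf_to_data_uri(b"%PDF-1.3\n"))

# To actually send a file (needs network access and a valid key):
# with open("document.pdf", "rb") as f:
#     req = build_request(f.read(), "YOUR_API_KEY")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The point of the sketch is that the PDF never goes out as a file upload: it is just a long string in an ordinary form field, which is why it behaves exactly like posting a base64 image.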

Hope this helps,

Thanks,
Zabir.


Thank you very much for this thread. I have been trying to solve this PDF document problem since last week, and I found my answer here. This thread is full of knowledge and information, so thank you very much.