POST pdf document

zabirhakim · February 20, 2020, 8:50pm

Hi All,

I’m struggling to understand what is meant by the following;

“file : Multipart encoded image file with filename”

I am trying to POST a PDF file, but I get a 411 “Content-Length required” error. I am unsure what is meant by an encoded image file, since I am just trying to POST a PDF. I am also unsure what is meant by “with filename”. Should I be sending the PDF document in the body of the HTTP POST request, or is it the value of a “file” request header?

Please advise and all your help is much appreciated.

Thanks,
Zabir.

ulrich · February 20, 2020, 10:00pm

I recommend that you test it with Postman first:

See here: https://ocr.space/Content/Images/postman-pdf-ocr.png

zabirhakim · February 20, 2020, 10:25pm

Thanks for the swift response ulrich. This worked in Postman; however, I wonder whether Postman is in fact encoding the document before posting it. Would you know?

ulrich · February 20, 2020, 10:38pm

Postman has a code generation feature. There you can convert the (working) Postman call into source code. Does that work for you?

zabirhakim · February 21, 2020, 10:39am

Hi ulrich, thank you so much for your help. This did in fact work somewhat, though not completely. I had to build the entire body for the POST request myself (it’s basically just a raw string).

This is my result;

{"ParsedResults":[{"TextOrientation":"0","FileParseExitCode":1,"ParsedText":"","ErrorMessage":"","ErrorDetails":""}],"OCRExitCode":1,"IsErroredOnProcessing":false,"ProcessingTimeInMilliseconds":"437","SearchablePDFURL":"Searchable PDF not generated as it was not requested."}

I’m wondering whether or not the API expects a JSON formatted object (containing the PDF) in the actual body of the POST request.

Thanks,
Z

zabirhakim · February 21, 2020, 4:06pm

Ok so I’ve tried a few different things. I think everything is generally ok; however, when I POST a base64 encoded PDF, I get a response of “PDF Corrupted”. When I send the raw PDF contents, the ParsedText in the JSON is empty. I think I would need someone on the OCR.Space side to check what they’re receiving. Maybe something is being lost during the HTTP communication, but I can’t see that from my side.
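To spot the two symptoms described here (an error flag versus an empty ParsedText) without reading the raw JSON by eye, a small check along these lines may help. It is only a sketch: the helper name `summariseOcrResponse` is hypothetical, and the field names are taken from the response quoted earlier in this thread rather than from the API documentation.

```javascript
// Hypothetical helper: summarises a response shaped like the one quoted in this thread.
function summariseOcrResponse(jsonText) {
  const data = JSON.parse(jsonText);
  // ParsedResults holds one entry per processed page; default to an empty list.
  const pages = data.ParsedResults || [];
  return {
    errored: data.IsErroredOnProcessing === true,
    text: pages.map((page) => page.ParsedText).join("\n"),
    emptyPages: pages.filter((page) => page.ParsedText === "").length,
  };
}

// Shortened copy of the response quoted above (SearchablePDFURL omitted).
const raw =
  '{"ParsedResults":[{"TextOrientation":"0","FileParseExitCode":1,' +
  '"ParsedText":"","ErrorMessage":"","ErrorDetails":""}],"OCRExitCode":1,' +
  '"IsErroredOnProcessing":false,"ProcessingTimeInMilliseconds":"437"}';

const summary = summariseOcrResponse(raw);
console.log(summary);
```

A result with `errored` false but a non-zero `emptyPages` count suggests the request was accepted while the document content did not arrive in a usable form.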

User9898 · February 21, 2020, 4:30pm

Post the code you’re using for the POST call; that’s where your issue is.

zabirhakim · February 21, 2020, 4:59pm

There you go mate;

I’m building the payload using javascript and sending it as the POST request body.

zabirhakim · February 28, 2020, 6:10pm

Ok so I found the solution. It is actually very simple, but many people have hit a similar issue without resolving it, and I can see why. It would help if the OCR API docs were updated to make this clearer: what stumped me was that a different method is documented for posting a PDF document, when in fact it’s exactly the same as posting a regular base64 image. Anyway, the solution is as follows;

Instead of trying to upload an actual file (or blob, which is the file content), POST a “Data URI”. Note: it’s not a URL, as you’ll see below;
Convert the file into a base64 string.
Prepend the following to the base64 string;

“data:application/pdf;base64,”

so the resulting Data URI would look something like this;

“data:application/pdf;base64,JVBERi0xLjMKNiAwIG9iago8PC9M=”
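The conversion described above can be sketched in JavaScript roughly as follows. This assumes Node.js (for `Buffer`), and the helper name `toPdfDataUri` is made up for illustration. In a real application the bytes would come from the PDF file itself, for example via `fs.readFileSync`.

```javascript
// Hypothetical helper: wraps raw PDF bytes in a base64 Data URI.
function toPdfDataUri(pdfBytes) {
  const base64 = Buffer.from(pdfBytes).toString("base64");
  // The prefix tells the receiver that the payload is a base64 encoded PDF.
  return "data:application/pdf;base64," + base64;
}

// Demo input: just the first line of a PDF file ("%PDF-1.3" plus a newline),
// not a complete document.
const sample = toPdfDataUri(Buffer.from("%PDF-1.3\n", "utf8"));
console.log(sample); // data:application/pdf;base64,JVBERi0xLjMK
```

The output starts with the same `JVBERi0xLjMK` characters as the example string above, because every PDF of that version begins with the same header bytes.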

For testing, you can use this website to generate the base64 Data URI for you: PDF to Base64 | Base64 Encode | Base64 Converter | Base64

NOTE: Upload the file and select the correct output format - Data URI.

Add this to the form body with a key, in this case “base64image”. Note that this isn’t literally an image as in a .gif, .png or .jpg; it actually means the base64 string. See screenshot below;

Once you’ve got it to work in Postman, try the same within your application.

NOTE: for beginners, don’t forget to add your apikey to the request header, like so;
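Outside Postman, the same request can be sketched in JavaScript roughly like this. Several things here are assumptions rather than facts from this thread: the endpoint URL, the exact spelling of the form key (written “base64image” above, while the OCR.space docs appear to spell it `base64Image`), and a runtime with global `fetch` and `FormData` (Node.js 18 or later, or a browser). The function names are hypothetical.

```javascript
// Assumed endpoint; it is not stated in this thread, so check the OCR.space docs.
const OCR_ENDPOINT = "https://api.ocr.space/parse/image";
// Assumed spelling of the form key; adjust if your tests show otherwise.
const BASE64_FIELD = "base64Image";

// Hypothetical helper: builds fetch options with the Data URI as a form field
// and the API key as a request header.
function buildOcrRequest(apiKey, dataUri) {
  const body = new FormData();
  body.append(BASE64_FIELD, dataUri);
  return {
    method: "POST",
    headers: { apikey: apiKey },
    body,
  };
}

// Hypothetical helper: sends the request and returns the parsed JSON response.
async function ocrPdf(apiKey, dataUri) {
  const response = await fetch(OCR_ENDPOINT, buildOcrRequest(apiKey, dataUri));
  if (!response.ok) {
    throw new Error("OCR request failed: HTTP " + response.status);
  }
  return response.json();
}

// Build (but do not send) a request to inspect what would be posted.
const request = buildOcrRequest(
  "YOUR_API_KEY",
  "data:application/pdf;base64,JVBERi0xLjMK"
);
console.log(request.body.get(BASE64_FIELD));
```

Passing a `FormData` body lets the runtime encode the form as multipart data, so there is no need to assemble the raw request body as a string by hand.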

Hope this helps,

Thanks,
Zabir.