Single-Character Recognition, Labels Added to Data, LineText vs. WordText

ocr-api-team · May 26, 2023, 9:20am

Thanks, I looked at the PDF:

There are places where the label (e.g. “Shipper”) and the actual data (the shipper name) are combined into a single piece of text.

When I test with your PDF, this works OK:

"LineText": "Shipper/Expéditeur: GRAINGER 003 INT",

(If it fails, you could search for the “:” and split the string at that point.)
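If the label and value do come back as one string, the split can be done client-side. A minimal sketch, assuming the label always ends at the first colon; `split_label` is a hypothetical helper name, not part of the OCR API:

```python
from typing import Tuple


def split_label(line_text: str) -> Tuple[str, str]:
    """Split 'Label: value' at the FIRST colon.

    Returns (label, value). If there is no colon, the label is empty
    and the whole stripped text is returned as the value.
    """
    label, sep, value = line_text.partition(":")
    if not sep:
        # No colon found: treat the whole line as data.
        return "", line_text.strip()
    return label.strip(), value.strip()


if __name__ == "__main__":
    print(split_label("Shipper/Expéditeur: GRAINGER 003 INT"))
    # prints ('Shipper/Expéditeur', 'GRAINGER 003 INT')
```

Splitting only at the first colon keeps values that themselves contain colons (times, for example) intact.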

It misses a single character in a field (e.g. the “Packaging” field contains a single “1” that it often misses).

OCR of single-digit numbers is indeed a challenge, and it sometimes fails.

And finally, what’s the difference between LineText and WordText in the JSON files I’m receiving back?

For Engine2, both are currently the same. In future updates, LineText will still contain the complete line of text (as now), while WordText will contain the individual words and their bounding boxes (as is already the case for Engine1).
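To see the difference in practice, the sketch below walks the overlay part of a response and prints each LineText with its WordText entries and boxes. It assumes the response shape ParsedResults → TextOverlay → Lines → Words with `Left`/`Top`/`Width`/`Height` fields (as returned when an overlay is requested); the sample JSON is hypothetical and shows the per-word split described above for Engine1. For Engine2, as noted, the word entries would currently just repeat the line text. `iter_lines` is a hypothetical helper name.

```python
import json

# Hypothetical, trimmed response. The nesting (ParsedResults -> TextOverlay
# -> Lines -> Words) is an assumption based on the overlay output format.
SAMPLE_RESPONSE = """
{
  "ParsedResults": [
    {
      "TextOverlay": {
        "Lines": [
          {
            "LineText": "Shipper/Expéditeur: GRAINGER 003 INT",
            "Words": [
              {"WordText": "Shipper/Expéditeur:", "Left": 10, "Top": 20, "Height": 12, "Width": 120},
              {"WordText": "GRAINGER", "Left": 135, "Top": 20, "Height": 12, "Width": 70},
              {"WordText": "003", "Left": 210, "Top": 20, "Height": 12, "Width": 25},
              {"WordText": "INT", "Left": 240, "Top": 20, "Height": 12, "Width": 25}
            ]
          }
        ]
      }
    }
  ]
}
"""


def iter_lines(response: dict):
    """Yield (line_text, words) pairs; words is a list of (text, box) tuples.

    box is (left, top, width, height), copied from each Words entry.
    """
    for result in response.get("ParsedResults", []):
        overlay = result.get("TextOverlay") or {}
        for line in overlay.get("Lines", []):
            words = [
                (w["WordText"], (w["Left"], w["Top"], w["Width"], w["Height"]))
                for w in line.get("Words", [])
            ]
            yield line["LineText"], words


if __name__ == "__main__":
    data = json.loads(SAMPLE_RESPONSE)
    for line_text, words in iter_lines(data):
        print("LineText:", line_text)
        for text, box in words:
            print("  WordText:", text, "box (left, top, width, height):", box)
```

Because the bounding boxes live on the Words entries, code that needs word positions should read WordText rather than LineText.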
