Hi, I'm currently testing the OCR with pictures of tickets, but it doesn't scan the products correctly; it picks up other parts of the ticket as well. Does anyone have advice on how to filter out just the products? Thanks in advance!
Hi, can you please post a sample image for us to test?
I uploaded 3 tickets. If you need more, I can send them. Thanks!
I did a test with the first image with OCR Engine2, and the product text looks good to me - but the quantity is missing. Single digit OCR is tricky.
→ Just to clarify: Is this (the missing quantity) the issue you are seeing?
No, not exactly. I wanted to extract the data in JSON format, if possible, showing only the products, the quantity and the price, but the result contains everything on the ticket. Also, some products appear as “empty”. Is there something I can do to fix these two issues?
Our OCR API always returns all the text in a document or image. If you only need specific data, you can post-process the OCR result. For example, you could use regular expressions, or feed the text into an LLM such as ChatGPT, Gemini or Mistral.
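As a rough illustration of the regular-expression route, here is a minimal Python sketch. It assumes you already have the recognized text as a plain string, and that each product line ends with a price such as `3.50` or `1,20`, optionally preceded by a quantity. The function name `extract_items`, the line layout and the skip words are assumptions made for illustration, not something taken from your tickets, so the pattern will most likely need tuning.

```python
import json
import re

# Assumed layout of a product line: [quantity] product name ... price
# Examples this is meant to match: "2 Coca Cola 3.50", "1x Milk 0.99", "Bread 1,20"
LINE_RE = re.compile(
    r"^\s*(?:(?P<qty>\d+)\s*[xX]?\s+)?"   # optional quantity, e.g. "2 " or "1x "
    r"(?P<name>[^\d\s].*?)\s+"            # product name, must not start with a digit
    r"[$€]?(?P<price>\d+[.,]\d{2})\s*$"   # price with two decimals at end of line
)

# Hypothetical keywords for lines that end in a price but are not products.
SKIP_WORDS = ("total", "tax", "iva", "cash", "change")


def extract_items(ocr_text):
    """Return a list of {product, quantity, price} dicts found in the OCR text."""
    items = []
    for line in ocr_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # header, address, date, empty line, etc.
        name = match.group("name").strip()
        if any(word in name.lower() for word in SKIP_WORDS):
            continue  # totals and payment lines
        qty = match.group("qty")
        items.append({
            "product": name,
            # Quantity stays None when the OCR did not pick it up.
            "quantity": int(qty) if qty else None,
            "price": float(match.group("price").replace(",", ".")),
        })
    return items


if __name__ == "__main__":
    sample = "SUPERMARKET\n2 Coca Cola 3.50\nBread 1,20\nTOTAL 4.70\n"
    print(json.dumps(extract_items(sample), indent=2))
```

Lines that do not fit the assumed pattern are simply dropped, so it is worth comparing the output against the full OCR text for a few tickets before relying on it.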
For the products that appear as “empty”, can you please post an example (overlay) screenshot that shows the missing data? There might be a fix for this once we see the issue.