I’m testing OCR services for invoice scanning and would like to know the best practices for pattern lookup.
Tesseract, for example, has an option to supply patterns before scanning.
I’m interested in patterns such as dates, VAT numbers, VAT percentages and IBANs. My current solution is to run regexes over the scanned text. The results are satisfactory but far from 100% correct, so I wouldn’t automate the process on that basis.
Would it be possible to define those patterns before the scan/OCR and to have the matched values grouped in the JSON result?
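
To illustrate, here is a minimal Python sketch of this kind of regex post-processing and of a grouped JSON result. The patterns, the field names and the helpers `extract_fields` and `iban_is_valid` are illustrative assumptions, not any OCR service's API, and real invoices would need country-specific rules.

```python
import json
import re

# Illustrative patterns only (assumptions); tune them per country and layout.
PATTERNS = {
    "dates": re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{2,4}\b"),
    "vat_numbers": re.compile(r"\b[A-Z]{2}\d{8,12}\b"),
    "vat_percentages": re.compile(r"\b\d{1,2}(?:[.,]\d{1,2})?\s?%"),
    "ibans": re.compile(
        r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
    ),
}


def iban_is_valid(iban: str) -> bool:
    """Apply the standard IBAN mod-97 check to a candidate string."""
    compact = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", compact):
        return False
    # Move the country code and check digits to the end, then map A..Z to 10..35.
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def extract_fields(ocr_text: str) -> dict:
    """Group every regex match found in the OCR text by field name."""
    fields = {
        name: [m.group(0).strip() for m in rx.finditer(ocr_text)]
        for name, rx in PATTERNS.items()
    }
    # Flag IBAN candidates whose checksum fails (likely OCR misreads).
    fields["ibans"] = [
        {"value": value, "checksum_ok": iban_is_valid(value)}
        for value in fields["ibans"]
    ]
    return fields


# Hypothetical OCR output, made up for this example.
sample = (
    "Invoice date: 12.03.2024\n"
    "VAT no: DE123456789\n"
    "VAT 19%: 38.00 EUR\n"
    "IBAN: DE89 3704 0044 0532 0130 00"
)

print(json.dumps(extract_fields(sample), indent=2))
```

The checksum flag shows why regex alone is not enough for automation: an IBAN with one misread digit still matches the pattern but fails the mod-97 check. Dates, VAT numbers and percentages have no such built-in check in this sketch.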
I would be using your service as a PRO subscriber.