I feed PDFs into the free OCR API and want to extract several values.
Unfortunately, depending on the PDF layout, the E-Mail field can sit in one column or, if the address is longer, be split across two columns.
If it is in one column, everything works fine, since my regex matches it perfectly.
When it is split across two columns, the regex can no longer find it: the first part of the e-mail address appears very early in the OCR output, and only the second part comes after "E-Mail:".
Is there a way to handle this? I tried the different OCR engines, but that didn't help.
I also tried changing the regex, but could not get it to work any better.
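
To make the problem concrete, here is a minimal Python sketch of the kind of two-stage fallback I have in mind. It is only an illustration under assumptions: the function name `extract_email_candidates`, the sample texts, and the simplified e-mail pattern are all hypothetical, and it assumes the second part of the address follows "E-Mail:" on the same line. It first tries the text after the label; if that is not a complete address, it joins each earlier token with that fragment and keeps the combinations that look like a valid address.

```python
import re

# Simplified address pattern (an assumption, not RFC 5322 complete).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Captures whatever non-space text follows the label on the same line.
LABEL = re.compile(r"E-Mail:[ \t]*(\S*)")


def extract_email_candidates(ocr_text):
    """Return possible e-mail addresses found in the OCR text.

    Stage 1: the text right after "E-Mail:" is already a full address.
    Stage 2: otherwise, prepend each earlier token to that fragment and
    keep every combination that forms a full address.
    """
    m = LABEL.search(ocr_text)
    if m is None:
        return []
    fragment = m.group(1)
    if EMAIL.fullmatch(fragment):
        return [fragment]
    candidates = []
    # Only look at tokens that appear before the label.
    for token in ocr_text[: m.start()].split():
        joined = token + fragment
        if EMAIL.fullmatch(joined) and joined not in candidates:
            candidates.append(joined)
    return candidates


# Hypothetical OCR outputs for the one-column and two-column cases.
one_column = "Name: John\nE-Mail: john@example.com\n"
two_columns = "john.doe.longname@\nName: John\nE-Mail: example.com\n"
print(extract_email_candidates(one_column))
print(extract_email_candidates(two_columns))
```

The weakness is that when the fragment after the label already contains the "@", almost any earlier word joins into a plausible address, so this can return several candidates and would still need some way to pick the right one.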