How to read pdf drug lines help

Sjj · July 4, 2024, 8:41pm

I need to read the drugs on a prescription. Sample pic is part of a pdf
The number of drugs can vary , along with information. I need to read each section as 1 drug line . Been unable to use the hyphons as seperator, which means icannot seperate the drugs… using ocrspace pro engine 2

ocr-api-team · July 8, 2024, 8:56am

That is an interesting question. Yes, the “-------” are not picked up as characters. The solution here would be to look at the bounding box that is returned with each word:

For example this:

    {
            "LineText": "Bisoprolol 2.5mg tablets (56 tablet)",
            "Words": [
              {
                "WordText": "Bisoprolol",
                "Left": 182,
                "Top": 192, <---- this one
                "Height": 24,
                "Width": 90
              },

→ Now you have to look that the distance (pixel difference) (= TOP value of bounding boxes) between the difference “LineText”. If the difference is larger than a certain threshold X, then you know a drug region has started. This way you can separate the text.