I need to read the drugs on a prescription. Sample pic is part of a pdf
The number of drugs can vary , along with information. I need to read each section as 1 drug line . Been unable to use the hyphons as seperator, which means icannot seperate the drugs… using ocrspace pro engine 2
1 Like
That is an interesting question. Yes, the “-------” are not picked up as characters. The solution here would be to look at the bounding box that is returned with each word:
For example this:
{
"LineText": "Bisoprolol 2.5mg tablets (56 tablet)",
"Words": [
{
"WordText": "Bisoprolol",
"Left": 182,
"Top": 192, <---- this one
"Height": 24,
"Width": 90
},
→ Now you have to look that the distance (pixel difference) (= TOP value of bounding boxes) between the difference “LineText”. If the difference is larger than a certain threshold X, then you know a drug region has started. This way you can separate the text.