Weird result from image - where is the text coming from?

I processed this simple seal and the result was some marketing text that doesn’t appear anywhere in the image or in the original PDF. Is it possible that there is hidden text somewhere in it?

This is the extracted text:

****** Result for Image/Page 1 ******
Calgary, Alberta, Canada is a great place to live but it’s not always easy to find the right home. That’s where we come in! We are a Calgary real estate agency that specializes in helping you find your dream home. Our team of experienced agents will work tirelessly to ensure that you find the perfect property for you and your family. Whether you’re looking for a new house, a vacation rental or just want to know more about the real estate market in Calgary, we are here to help!

I guess it’s a common hallucination issue with some models. I’ll see if I can crop out those characters in the top-left corner.

Your guess is correct. We plan to publish a more detailed blog post on LLM OCR hallucinations soon, but to summarize:

  • OCR Engine 1 and OCR Engine 2 are classic OCR engines; they do not hallucinate.

  • OCR Engine 3 is LLM-based. This brings superior OCR quality for tricky documents, but also the side effect of hallucinations in some situations.
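Since the classic engines do not hallucinate, one way to catch cases like the one above is to run the same image through a classic engine as well and compare the two results. The snippet below is only a rough sketch of that idea, not an official feature: the function names, the 0.8 threshold, and the 20-word minimum are assumptions made for illustration, and it expects you to already have both extracted texts as plain strings.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unsupported_ratio(llm_text: str, classic_text: str) -> float:
    """Fraction of distinct words in the LLM result that the classic result lacks."""
    llm = tokens(llm_text)
    if not llm:
        return 0.0
    return len(llm - tokens(classic_text)) / len(llm)


def looks_hallucinated(
    llm_text: str,
    classic_text: str,
    threshold: float = 0.8,  # assumed cutoff, tune on your own documents
    min_words: int = 20,     # assumed: ignore short results, they are too noisy
) -> bool:
    """Flag a long LLM result that the classic result barely supports."""
    word_count = len(re.findall(r"[a-z0-9]+", llm_text.lower()))
    if word_count < min_words:
        return False
    return unsupported_ratio(llm_text, classic_text) > threshold
```

For example, if the LLM-based engine returns a paragraph of marketing copy while a classic engine returns only a few characters for the same seal, `looks_hallucinated` returns `True` and the result can be sent for manual review. Classic engines make their own recognition errors, so treat a flag as a hint to check the image rather than as proof.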