Parsing a Sudoku puzzle

I’ve tried to use the parser on what I thought was an extremely easy task, but it failed to parse.

It’s an image file of a Sudoku game grid. There are 81 squares, some of which contain numbers. The only things to parse are the numbers. I am guessing that since there were no words, the parser gave up.

I’m brand new, so there’s a great chance that I’m doing something wrong. But it seems like the parser should be able to handle something this straightforward.

This is the parser from the OCR.space site.

What I’m ultimately trying to do is use an API to handle image files such as this one.

Thoughts?

This is a very interesting test case. As you correctly guessed, single-digit numbers are difficult to OCR. Indeed, our current engine does not detect any of them.

The good news is that our upcoming new OCR engine2 detects all of them:

To get this result, I reduced the image size by 50%. This makes the number font size more normal/standard. With the original size, it missed two numbers.

The new engine will be available for public beta either this week or next.
Update: the new engine is now available for public beta.

Thanks. Can you post the JSON file that was produced in this case using OCR engine2? And how can I sign up for the public beta?
Thanks again.

The format for OCR engine2 will be exactly the same as the current one produced by engine1.

Yes but engine1 can’t produce the file - that’s why I’d like to see the file produced by engine2. If I had engine2 I could produce it myself, but engine2 is not available yet - that’s the point!

For the public beta, we will post an update on our OCR blog early next week.

Here is the API response:

{
  "ParsedResults": [
    {
      "Overlay": {
        "Lines": [
          {
            "LineText": "1 2 4",
            "Words": [
              {
                "WordText": "1",
                "Left": 30,
                "Top": 8,
                "Height": 38,
                "Width": 49
              },
              {
                "WordText": "2",
                "Left": 91,
                "Top": 8,
                "Height": 38,
                "Width": 27
              },
              {
                "WordText": "4",
                "Left": 125,
                "Top": 8,
                "Height": 39,
                "Width": 25
              }
            ],
            "MaxHeight": 39,
            "MinTop": 8
          },
          {
            "LineText": "5",
            "Words": [
              {
                "WordText": "5",
                "Left": 10,
                "Top": 50,
                "Height": 32,
                "Width": 29
              }
            ],
            "MaxHeight": 32,
            "MinTop": 50
          },
          {
            "LineText": "9",
            "Words": [
              {
                "WordText": "9",
                "Left": 7,
                "Top": 91,
                "Height": 30,
                "Width": 30
              }
            ],
            "MaxHeight": 30,
            "MinTop": 91
          },
          {
            "LineText": "2",
            "Words": [
              {
                "WordText": "2",
                "Left": 6,
                "Top": 131,
                "Height": 31,
                "Width": 28
              }
            ],
            "MaxHeight": 31,
            "MinTop": 131
          },
          {
            "LineText": "8",
            "Words": [
              {
                "WordText": "8",
                "Left": 163,
                "Top": 51,
                "Height": 30,
                "Width": 28
              }
            ],
            "MaxHeight": 30,
            "MinTop": 51
          },
          {
            "LineText": "2 3",
            "Words": [
              {
                "WordText": "2",
                "Left": 104,
                "Top": 92,
                "Height": 35,
                "Width": 50
              },
              {
                "WordText": "3",
                "Left": 206,
                "Top": 92,
                "Height": 35,
                "Width": 24
              }
            ],
            "MaxHeight": 35,
            "MinTop": 92
          },
          {
            "LineText": "2 7",
            "Words": [
              {
                "WordText": "2",
                "Left": 240,
                "Top": 52,
                "Height": 31,
                "Width": 29
              },
              {
                "WordText": "7",
                "Left": 279,
                "Top": 53,
                "Height": 31,
                "Width": 27
              }
            ],
            "MaxHeight": 32,
            "MinTop": 52
          },
          {
            "LineText": "8 3 6",
            "Words": [
              {
                "WordText": "8",
                "Left": 81,
                "Top": 133,
                "Height": 33,
                "Width": 31
              },
              {
                "WordText": "3",
                "Left": 163,
                "Top": 131,
                "Height": 31,
                "Width": 28
              },
              {
                "WordText": "6",
                "Left": 201,
                "Top": 132,
                "Height": 31,
                "Width": 27
              }
            ],
            "MaxHeight": 37,
            "MinTop": 131
          },
          {
            "LineText": "3 1 2",
            "Words": [
              {
                "WordText": "3",
                "Left": 37,
                "Top": 171,
                "Height": 32,
                "Width": 36
              },
              {
                "WordText": "1",
                "Left": 123,
                "Top": 172,
                "Height": 32,
                "Width": 25
              },
              {
                "WordText": "2",
                "Left": 166,
                "Top": 173,
                "Height": 32,
                "Width": 28
              }
            ],
            "MaxHeight": 34,
            "MinTop": 171
          },
          {
            "LineText": "8",
            "Words": [
              {
                "WordText": "8",
                "Left": 239,
                "Top": 174,
                "Height": 32,
                "Width": 29
              }
            ],
            "MaxHeight": 32,
            "MinTop": 174
          },
          {
            "LineText": "1 8 53",
            "Words": [
              {
                "WordText": "1",
                "Left": 99,
                "Top": 213,
                "Height": 24,
                "Width": 8
              },
              {
                "WordText": "8",
                "Left": 129,
                "Top": 208,
                "Height": 35,
                "Width": 24
              },
              {
                "WordText": "53",
                "Left": 200,
                "Top": 205,
                "Height": 49,
                "Width": 73
              }
            ],
            "MaxHeight": 52,
            "MinTop": 204
          },
          {
            "LineText": "6 8 9 2",
            "Words": [
              {
                "WordText": "6",
                "Left": 156,
                "Top": 252,
                "Height": 30,
                "Width": 39
              },
              {
                "WordText": "8",
                "Left": 207,
                "Top": 252,
                "Height": 37,
                "Width": 26
              },
              {
                "WordText": "9",
                "Left": 273,
                "Top": 253,
                "Height": 33,
                "Width": 36
              },
              {
                "WordText": "2",
                "Left": 325,
                "Top": 253,
                "Height": 33,
                "Width": 17
              }
            ],
            "MaxHeight": 39,
            "MinTop": 251
          },
          {
            "LineText": "7",
            "Words": [
              {
                "WordText": "7",
                "Left": 56,
                "Top": 256,
                "Height": 23,
                "Width": 16
              }
            ],
            "MaxHeight": 23,
            "MinTop": 256
          },
          {
            "LineText": "9",
            "Words": [
              {
                "WordText": "9",
                "Left": 54,
                "Top": 297,
                "Height": 24,
                "Width": 16
              }
            ],
            "MaxHeight": 24,
            "MinTop": 297
          },
          {
            "LineText": "1 3",
            "Words": [
              {
                "WordText": "1",
                "Left": 240,
                "Top": 292,
                "Height": 34,
                "Width": 25
              },
              {
                "WordText": "3",
                "Left": 286,
                "Top": 292,
                "Height": 34,
                "Width": 23
              }
            ],
            "MaxHeight": 34,
            "MinTop": 292
          },
          {
            "LineText": "4",
            "Words": [
              {
                "WordText": "4",
                "Left": 239,
                "Top": 332,
                "Height": 35,
                "Width": 31
              }
            ],
            "MaxHeight": 35,
            "MinTop": 332
          }
        ],
        "HasOverlay": true,
        "Message": "Total lines: 16"
      },
      "FileParseExitCode": 1,
      "TextOrientation": "0",
      "ParsedText": "1 2 4\n5\n9\n2\n8\n2 3\n2 7\n8 3 6\n3 1 2\n8\n1 8 53\n6 8 9 2\n7\n9\n1 3\n4",
      "ErrorMessage": "",
      "ErrorDetails": ""
    }
  ],
  "OCRExitCode": 1,
  "IsErroredOnProcessing": false,
  "ProcessingTimeInMilliseconds": 1.515,
  "SearchablePDFURL": ""
}
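
Since the response above carries a bounding box for every word, the grid position of each digit can be recovered from the coordinates even when the text itself comes back as loose lines. Here is a minimal sketch of that idea. The function name `words_to_grid` and the image size of 360 x 369 pixels are assumptions made for illustration (the thread does not state the image dimensions), and it assumes the grid fills the whole image. A merged word such as "53" is split evenly across its box, one slice per character.

```python
from typing import List


def words_to_grid(lines: List[dict], image_width: float, image_height: float) -> List[List[int]]:
    """Place each recognized digit in a 9x9 grid using the centre of its box.

    Assumes the Sudoku grid fills the whole image, so each cell is
    image_width / 9 wide and image_height / 9 tall. Empty cells stay 0.
    """
    grid = [[0] * 9 for _ in range(9)]
    cell_w = image_width / 9
    cell_h = image_height / 9
    for line in lines:
        for word in line["Words"]:
            text = word["WordText"]
            if not text:
                continue
            # A word like "53" is two neighbouring cells merged into one box:
            # split the box into equal slices, one per character.
            slice_w = word["Width"] / len(text)
            center_y = word["Top"] + word["Height"] / 2
            row = min(8, int(center_y // cell_h))
            for i, ch in enumerate(text):
                if ch not in "123456789":
                    continue  # ignore anything that is not a Sudoku digit
                center_x = word["Left"] + (i + 0.5) * slice_w
                col = min(8, int(center_x // cell_w))
                grid[row][col] = int(ch)
    return grid


# Two lines copied from the response above ("5" and "1 8 53").
sample_lines = [
    {"Words": [{"WordText": "5", "Left": 10, "Top": 50, "Height": 32, "Width": 29}]},
    {"Words": [
        {"WordText": "1", "Left": 99, "Top": 213, "Height": 24, "Width": 8},
        {"WordText": "8", "Left": 129, "Top": 208, "Height": 35, "Width": 24},
        {"WordText": "53", "Left": 200, "Top": 205, "Height": 49, "Width": 73},
    ]},
]

# 360 x 369 is a guessed image size, not a value taken from the thread.
grid = words_to_grid(sample_lines, 360, 369)
for row in grid:
    print(row)
```

Using the box centre rather than the left edge makes the mapping more tolerant of the uneven box widths visible in the response. With real images the grid rarely fills the frame exactly, so the cell size would need to be measured or estimated first.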

I’ve now had a chance to test the new beta. I used the same file, but as you suggested I reduced the image size by 50%, and as another test reduced it by another 50%.
It gets a lot of it, but there are two big issues:

  1. In both cases it missed at least one character. Of course, the puzzle only works if all the characters are captured!
  2. Most importantly, I did check the “table recognition” box but the post-OCR text and JSON did not preserve the table nature. It’s critical that the table aspect is preserved.

Is there anything I can do to get better results?
Thank you.
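
For issue 1, a quick sanity check on the `ParsedText` field can flag a bad scan before any solving is attempted. The sketch below is only an illustration: the function name `summarize_parsed_text` and the `expected_givens` parameter are hypothetical, and it assumes you already know how many digits the puzzle shows. It counts the digits that came back and lists tokens longer than one character, which usually means two neighbouring cells were merged.

```python
from typing import Dict, List, Union


def summarize_parsed_text(parsed_text: str, expected_givens: int) -> Dict[str, Union[int, List[str]]]:
    """Count recognized digits and flag tokens that merge several cells.

    expected_givens is supplied by the caller; the OCR response itself
    cannot tell you how many digits the puzzle really contains.
    """
    tokens = parsed_text.split()
    digits = sum(1 for token in tokens for ch in token if ch in "123456789")
    # Sudoku cells hold one digit each, so a longer token is suspicious.
    merged = [token for token in tokens if len(token) > 1]
    return {
        "digits": digits,
        "merged": merged,
        "missing": expected_givens - digits,
    }
```

A positive `missing` value means at least one digit was dropped, so the image is worth rescanning at a different size.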

About issue #2: Please try again. We released an update about a week ago that adds auto-rotation and table OCR support to engine 2. It seems you just missed it.

=> I did a quick test and the numbers are now returned line by line, as they should be.
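
For anyone calling the API rather than using the web form, the same options can be set as request fields. The sketch below builds the request with the Python standard library but does not send it. The field names (`OCREngine`, `isTable`, `isOverlayRequired`, `url`) and the `apikey` header follow the OCR.space API documentation as I understand it; the helper name `build_ocr_request`, the placeholder image URL and key, and the use of URL-encoded form fields are assumptions to check against the current docs.

```python
import urllib.parse
import urllib.request

OCR_ENDPOINT = "https://api.ocr.space/parse/image"


def build_ocr_request(image_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for engine 2 with table mode on."""
    fields = {
        "url": image_url,             # publicly reachable image to parse
        "OCREngine": "2",             # select the newer engine
        "isTable": "true",            # ask for line-by-line table output
        "isOverlayRequired": "true",  # include word coordinates in the JSON
    }
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        OCR_ENDPOINT,
        data=data,
        headers={"apikey": api_key},
        method="POST",
    )


# Placeholder values, not real credentials or a real image.
request = build_ocr_request("https://example.com/sudoku.png", "YOUR_API_KEY")
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and decoding the JSON body, which should have the same shape as the response posted earlier in this thread.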