Feature Request - support jbig2 image format

Alistair_Oldfield · November 16, 2020, 1:02pm

Hello!

I would like to request support for the JBIG2 image format, which can provide lossless compression and is specifically optimized for 2-color (black and white) images. It is well suited to scanned docs, particularly ones that have already been preprocessed.

A 250 KB PNG of a black and white image can be compressed to about 20 KB, which I believe would be of interest to the API provider and consumers alike.

I understand that this format is not natively supported by Tesseract, but even so, jb2 files can easily be decoded into PNG (losslessly) as an extra step injected into the API workflow, at very high speeds (fractions of a second):

Example command which would convert a jb2 file into a PNG:
jbig2dec img_01.png.jb2 -o img_01.png.jb2.png
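
To illustrate how that decode step might be wrapped in a workflow, here is a minimal Python sketch. It assumes the jbig2dec binary is installed and on PATH; the helper names build_decode_command and decode_jb2 are made up for illustration and are not part of any existing API.

```python
import subprocess
from pathlib import Path
from typing import List


def build_decode_command(jb2_path: str) -> List[str]:
    """Build the same jbig2dec call as the example command: <input> -o <input>.png."""
    out_path = jb2_path + ".png"
    return ["jbig2dec", jb2_path, "-o", out_path]


def decode_jb2(jb2_path: str) -> Path:
    """Decode a jb2 file to PNG and return the path of the PNG.

    Assumption: the jbig2dec binary is available on PATH.
    """
    cmd = build_decode_command(jb2_path)
    # check=True raises CalledProcessError if jbig2dec exits with an error
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Calling decode_jb2("img_01.png.jb2") would run the command shown above and hand back the path to the decoded PNG, which could then be passed on to the OCR step.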

More info on the Ghostscript affiliated site: https://jbig2dec.com/

Would OCR.space consider supporting this format to reduce API invocation payload sizes - or at least for the PAID version?

Currently, when uploading an image of a scanned page in jb2 format, the API responds with:

"ErrorMessage": [
    "File failed validation. File does not have a valid extension. Allowed file extensions: .pdf,.jpg,.png,.jpeg,.bmp,.gif,.tif,.tiff,.webp"
],
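
In the meantime, a client could check the extension before uploading and convert anything that would be rejected. This is only a rough Python sketch: the extension list is copied from the error message above, and the helper name needs_conversion is hypothetical.

```python
from pathlib import Path

# Extensions copied from the "Allowed file extensions" list in the error message
ALLOWED_EXTENSIONS = {
    ".pdf", ".jpg", ".png", ".jpeg", ".bmp", ".gif", ".tif", ".tiff", ".webp",
}


def needs_conversion(filename: str) -> bool:
    """Return True if the file's final extension is not in the allowed list."""
    return Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS
```

With this, a file such as img_01.png.jb2 is flagged for conversion (only the final .jb2 suffix is considered), while a .png or .tif file is uploaded unchanged.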

Thanks for your consideration!

admin · November 16, 2020, 11:19pm

Thanks a lot for this interesting feature suggestion!

Are you a PRO user?