Understanding OCR and Ways to Improve OCR Accuracy

Posted By: Hemant Chauhan | 31st March 2020

 

Artificial intelligence is reinventing traditional data and image processing capabilities so that businesses can extract valuable insights. As an experienced provider of AI development services, Oodles AI discusses the fundamentals of traditional Optical Character Recognition systems and how AI tools and applications are improving OCR accuracy.

 

What is OCR?

OCR stands for Optical Character Recognition. An OCR engine is software that extracts text from scanned images of physical documents. Several engines are available for OCR, such as Google Cloud Vision (a cloud API) and Tesseract. Tesseract is one of the most widely used open-source OCR engines.

Emerging providers of OCR systems are proactively using AI technologies such as computer vision services to optimize data extraction tasks. We, at Oodles, harness computer vision and natural language processing technologies to build dynamic OCR systems for identity verification and healthcare services.

 

 

 

 

Most OCR engines provide roughly 96% to 98% accuracy on a clean page, measured at the word level: in a page of 100 words, 96 to 98 words are recognized correctly. OCR accuracy is measured by taking the text that the OCR engine outputs for an image and comparing it to the actual text in the original image. Sometimes OCR gives poor results because the image quality is bad or the image resolution is low.
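As a rough illustration of how word-level accuracy can be measured, here is a minimal sketch using only the Python standard library. The function name `word_accuracy` and the sample strings are our own assumptions for the example; real evaluations often use character-level or edit-distance metrics instead.

```python
from difflib import SequenceMatcher


def word_accuracy(reference: str, ocr_output: str) -> float:
    """Return the fraction of reference words that the OCR output reproduced."""
    ref_words = reference.split()
    ocr_words = ocr_output.split()
    if not ref_words:
        return 1.0 if not ocr_words else 0.0
    # Align the two word sequences and count the words that match in order.
    matcher = SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# Hypothetical example: one word out of nine is misread ("brown" -> "brovvn").
reference = "the quick brown fox jumps over the lazy dog"
ocr_output = "the quick brovvn fox jumps over the lazy dog"
print(f"Word accuracy: {word_accuracy(reference, ocr_output):.2%}")
```

Here 8 of the 9 reference words match, so the score is 8/9.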

 

Here are the main ways to improve OCR accuracy by preprocessing your image:

 

Get perspective transform of an image

Using the getPerspectiveTransform and warpPerspective functions of the OpenCV library in Python, we can correct the perspective of an image: we detect the edges of the document with Canny edge detection, locate its four corners, and warp that region into a flat rectangle.

[Image: the result of the perspective transformation]

 

 

 

Use a good-quality image and format (prefer TIFF and PNG)

If the source image quality is good, we get good OCR output. Make sure the image is not hazy; it is important to use the cleanest image source available. Accuracy also depends on the image format: JPEG uses lossy compression, so it sometimes gives poorer results, whereas a lossless format such as PNG or TIFF tends to improve OCR accuracy.

 

Cropping of an image

When we run OCR on an image that contains text in only part of its area, cropping is required. We crop the image down to the part that contains text, which increases the accuracy of the extracted data compared with the uncropped image.
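A minimal sketch of cropping with NumPy array slicing follows. It assumes dark text on a light background; the helper names, the threshold of 128, and the 5-pixel margin are hypothetical choices for illustration.

```python
import numpy as np


def text_bounding_box(gray: np.ndarray, threshold: int = 128, margin: int = 5):
    """Return (x, y, width, height) around all pixels darker than `threshold`."""
    ys, xs = np.where(gray < threshold)
    if xs.size == 0:
        return None  # no dark pixels, so nothing to crop to
    img_h, img_w = gray.shape[:2]
    x0 = max(0, int(xs.min()) - margin)
    y0 = max(0, int(ys.min()) - margin)
    x1 = min(img_w, int(xs.max()) + 1 + margin)
    y1 = min(img_h, int(ys.max()) + 1 + margin)
    return x0, y0, x1 - x0, y1 - y0


def crop_region(image: np.ndarray, x: int, y: int, width: int, height: int) -> np.ndarray:
    """Return a copy of the requested region, clamped to the image bounds."""
    img_h, img_w = image.shape[:2]
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(img_w, x + width), min(img_h, y + height)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("crop region lies outside the image")
    return image[y0:y1, x0:x1].copy()


# Synthetic page: white background with one dark block standing in for text.
page = np.full((100, 200), 255, dtype=np.uint8)
page[20:40, 50:150] = 0
box = text_bounding_box(page)
cropped = crop_region(page, *box)
print(box, cropped.shape)
```

A single bounding box works for one block of text; pages with several separate text areas would need each region located and cropped on its own.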

 

Binarizing the Image

Binarization converts a colored (RGB) image to a black and white image. Using features of the OpenCV library such as adaptive thresholding, we can convert an image to pure black and white. Most OCR engines perform binarization internally.

[Image: an example of a binarized image]

 

 

 

Increase Contrast and Sharpness of the image

Increase the contrast and sharpness of the image before running OCR. Increasing the contrast between the text and its background yields more accurate output, and a sharper image makes the text clearer.

 

Increase Scanning Resolution

Scan the image at a resolution of at least 300 DPI (dots per inch). A DPI lower than 200 will give unclear results, while a DPI above 600 will increase the size of the output file without much improvement in quality.
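The arithmetic behind these numbers can be sketched in a few lines of Python. The function names are hypothetical. Note that resampling an existing low-resolution image to 300 DPI only changes its pixel dimensions; it cannot restore detail that was never captured, so rescanning is preferable when possible.

```python
def estimate_dpi(width_px: int, physical_width_in: float) -> float:
    """Effective DPI of a scan, given its pixel width and the paper width in inches."""
    return width_px / physical_width_in


def size_at_dpi(width_px: int, height_px: int, current_dpi: float,
                target_dpi: float = 300.0) -> tuple:
    """Pixel dimensions the same page would have at `target_dpi`."""
    scale = target_dpi / current_dpi
    return round(width_px * scale), round(height_px * scale)


# A US Letter page (8.5 in wide) scanned to 2550 px across is a 300 DPI scan.
print(estimate_dpi(2550, 8.5))
# A page scanned at 150 DPI needs twice the pixels per side to reach 300 DPI.
print(size_at_dpi(1240, 1754, current_dpi=150))
```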

 

Rotate pages to the correct orientation (Deskew)

An image that is not straight is called a skewed image. Deskewing means bringing the image to its correct orientation by rotating it. If the image is skewed to either side, we take the following steps:

1. Detect the text block of the image.

2. Calculate the angle of rotation.

3. Rotate the image to correct skew.

[Image: a skewed page (left) and the deskewed result (right)]

 

 

 

Open-source Tools and Libraries for Image Processing

  • Leptonica
  • TensorFlow
  • OpenCV
  • AWT Graphics2D

 


