ocr → Unstract.com

Best OCR Software in 2026 — A Tool Comparison & Evaluation Guide

Nuno Bispo — Wed, 05 Mar 2025 06:57:16 +0000

Introduction

Optical Character Recognition (OCR) technology has become indispensable in today’s digital landscape.

By converting various document types—such as scanned papers, PDF files, and images—into editable and searchable text, OCR software significantly enhances productivity and efficiency.

This is particularly crucial in industries that handle large volumes of documents, including finance, healthcare, legal, and education.

The effectiveness of OCR software can greatly impact workflows, data accuracy, and overall operational efficiency. As businesses increasingly rely on digital transformation, selecting the right OCR tool is more important than ever.

This article provides a comprehensive overview of the best OCR software available in 2026, comparing features, capabilities, and performance to help you make an informed decision. The tools we compare are:

1. Tesseract
2. PaddleOCR
3. Azure Document Intelligence
4. Amazon Textract
5. LLMWhisperer

We will test each tool on two sample documents and evaluate it on the following criteria:

1. Accuracy
2. Multi-language support
3. Complex layout handling
4. Structured data extraction
5. Deployment flexibility
6. Ease of use
7. Cost
8. Custom training
9. Integration
10. Security and compliance


Understanding OCR Software

What is OCR Software?

OCR software is a powerful tool designed to interpret and convert different types of documents—such as scanned papers, PDF files, and images—into machine-readable text.

This technology is widely used in various applications, including data entry, archiving, document management, and digital transformation initiatives.

By converting physical or static digital documents into editable and searchable text, OCR software enables businesses to streamline workflows, improve data accessibility, and enhance overall operational efficiency.

How Does OCR Software Work?

OCR software works by analyzing the shapes and patterns of characters in document images and converting them into corresponding text characters.
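To make the idea of matching character shapes concrete, here is a toy sketch. It assumes each character has already been binarized into a tiny 3x5 bitmap written as strings of "1" (ink) and "0" (background); TEMPLATES, segment(), recognize() and read_line() are hypothetical names for illustration, not part of any OCR library. Modern engines such as Tesseract 4+ use neural networks rather than this naive nearest-template search.

```python
# Toy 3x5 bitmaps: "1" is ink, "0" is background (hypothetical templates)
TEMPLATES = {
    "0": ("111", "101", "101", "101", "111"),
    "1": ("010", "110", "010", "010", "111"),
    "7": ("111", "001", "010", "010", "010"),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(glyph):
    """Pattern matching: return the template character closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))


def segment(line):
    """Segmentation: split a line bitmap into glyphs at columns that contain no ink."""
    width = len(line[0])
    glyphs, start = [], None
    for x in range(width + 1):
        has_ink = x < width and any(row[x] == "1" for row in line)
        if has_ink and start is None:
            start = x  # a glyph begins
        elif not has_ink and start is not None:
            glyphs.append(tuple(row[start:x] for row in line))  # a glyph ends
            start = None
    return glyphs


def read_line(line):
    """Segment a line bitmap and recognize each glyph in turn."""
    return "".join(recognize(glyph) for glyph in segment(line))


# Build the line "107" with a one-column gap between glyphs
line = tuple(
    TEMPLATES["1"][r] + "0" + TEMPLATES["0"][r] + "0" + TEMPLATES["7"][r]
    for r in range(5)
)
print(read_line(line))  # prints 107
```

Because recognize() picks the closest template rather than requiring an exact match, a glyph with a flipped pixel or two is still classified correctly, which is the same tolerance real engines need for noisy scans.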

The process typically involves several steps:

Image Pre-processing:

Enhancement: Improving the quality of the document image by adjusting brightness, contrast, and resolution.
Noise Reduction: Removing any unwanted artifacts or distortions that could interfere with text recognition.
Binarization: Converting the image to a binary format (black and white) to simplify the recognition process.

Character Recognition:

Segmentation: Identifying and separating individual characters, words, and lines of text within the image.
Pattern Matching: Comparing the shapes of characters in the image with a predefined set of character templates to determine the corresponding text.
Machine Learning: Utilizing advanced algorithms and neural networks to improve recognition accuracy, especially for complex or handwritten text.

Post-processing:

Error Correction: Identifying and correcting any errors in the recognized text, such as misspelled words or incorrect characters.
Formatting: Applying appropriate formatting to the text, including paragraphs, headings, and tables, to maintain the original document structure.
Validation: Verifying the accuracy of the recognized text through various checks, such as dictionary lookups and context analysis.
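The noise reduction and binarization steps above can be sketched in pure Python on a grayscale image stored as a list of rows of 0-255 values. This is a minimal illustration under that assumption: a 3x3 median filter stands in for noise reduction, and Otsu's method is used as one common way to choose a global binarization threshold. Neither is claimed to be what any specific tool in this article does internally; in practice you would rely on Pillow, OpenCV, or the engine's own pre-processing.

```python
def median_filter(image):
    """Noise reduction: replace each pixel by the median of its 3x3 neighbourhood."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clamp coordinates so border pixels reuse their nearest neighbours
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(sorted(window)[4])  # middle of the 9 values
        result.append(row)
    return result


def otsu_threshold(pixels):
    """Return the threshold that maximizes the between-class variance (Otsu's method)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    sum_low, weight_low = 0, 0
    best_threshold, best_variance = 0, -1.0
    for level in range(256):
        weight_low += histogram[level]
        if weight_low == 0:
            continue
        weight_high = total - weight_low
        if weight_high == 0:
            break
        sum_low += level * histogram[level]
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(image):
    """Binarization: pixels above the Otsu threshold become white (255), the rest black (0)."""
    threshold = otsu_threshold([value for row in image for value in row])
    return [[255 if value > threshold else 0 for value in row] for row in image]


def preprocess(image):
    """Run noise reduction first, then binarization."""
    return binarize(median_filter(image))


# A light page with a single dark speck: the median filter removes the speck
noisy = [[200, 200, 200], [200, 10, 200], [200, 200, 200]]
print(preprocess(noisy))
```

Filtering before thresholding matters: an isolated dark speck that survives into the binary image can be segmented as a spurious character.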

LLMWhisperer: The Best OCR for LLMs

If your solution uses Large Language Models (LLMs) to process documents and extract their data, the quality of what you feed them matters.

LLMs are powerful, but their output is only as good as the input you provide. Documents can be a mess: widely varying formats and encodings, scans of images, numbered sections, and complex tables.

LLMWhisperer is a technology that presents data from complex documents to LLMs in a form they can best understand.

If you want to quickly take it for a test drive, you can check out our free playground.

Key Features to Look for in OCR Software

When selecting OCR software, it’s important to consider the following features to ensure it meets your specific needs:

Accuracy: The ability to correctly recognize and convert text, even in challenging conditions such as low-quality images, handwritten text, or complex layouts.
Language Support: Compatibility with multiple languages, including support for special characters, accents, and non-Latin text.
Speed: The processing speed, especially for large documents or high-volume workflows, to ensure efficient and timely conversion.
Ease of Integration: Ability to integrate seamlessly with other software systems, such as document management systems, enterprise resource planning (ERP) systems, and customer relationship management (CRM) platforms.
Cost: Evaluating whether the software is free or paid, and assessing its cost-effectiveness based on your specific requirements and budget.
User Interface: The ease of use and intuitiveness of the software’s interface, ensuring that users can quickly learn and effectively utilize its features.
Customization: The ability to customize the software to meet specific needs, such as adding custom dictionaries, training the OCR engine with specific fonts, or configuring workflows.
Security: Ensuring that the software provides robust security measures to protect sensitive data, including encryption, access controls, and compliance with industry standards.
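Accuracy is easiest to compare when it is measured. A common metric is the character error rate (CER): the Levenshtein edit distance between the OCR output and a hand-checked reference transcription, divided by the length of the reference. The word error rate (WER) applies the same idea to whole words. The sketch below is a minimal pure-Python version; the function names are our own, and it assumes you have a reference transcription for each test document.

```python
from typing import Sequence


def levenshtein(reference: Sequence, hypothesis: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning one sequence into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_item != hyp_item),  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over characters, normalized by the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over whitespace-separated words, normalized by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


# Two wrong words ("ve", "sucn") out of nine reference words
print(word_error_rate("may be made by interpolation in such a table",
                      "may ve made by interpolation in sucn a table"))
```

The same levenshtein() function works for both metrics because it only compares sequence items for equality, whether they are characters or words.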

Comparing Locally Runnable OCR Software

For reference, these are the sample documents used to test the different libraries:

Document 1: bill-of-sale.pdf
Document 2: nasa-apollo.pdf

Test documents for comparing OCR (Optical Character Recognition) tool capabilities

All the source code for this OCR tool evaluation project can be found on GitHub.

Tesseract OCR

Tesseract OCR is an open-source tool for recognizing text in images. It was originally developed at Hewlett-Packard, which released it as open source in 2005, and Google then sponsored its development for many years. Its accuracy on clean printed text and its flexibility have made it one of the most popular OCR tools available.

Main Features and Capabilities:

Neural Network-Based Recognition: Tesseract 4 introduced a new OCR engine based on Long Short-Term Memory (LSTM) neural networks, which greatly improves accuracy, especially for recognizing lines of text. It still works with the older OCR engine from Tesseract 3.
Broad Language Support: Tesseract can recognize over 100 languages right out of the box, including support for Unicode (UTF-8), which covers a wide range of character sets.
Diverse Output Formats: It supports various output formats such as plain text, hOCR (HTML), PDF, TSV, ALTO, and PAGE, making it adaptable for different uses.
Image Format Compatibility: Tesseract can process various image formats including PNG, JPEG, and TIFF. However, to get the best OCR results, the quality of the input image often needs to be improved.
No Built-In GUI: Tesseract primarily operates via the command line, but there are several third-party graphical user interfaces (GUIs) available for those who prefer a visual interface.
Custom Training Capabilities: Users can train Tesseract to recognize new languages or to improve its accuracy on specific fonts or handwriting styles. This makes it highly customizable for specialized OCR needs.
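Because Tesseract can emit TSV (for example with the command-line call tesseract page.png out tsv, which writes out.tsv), you can work with word-level confidence scores instead of plain text. The sketch below assumes the standard TSV columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), where rows with level 5 are individual words; lines_from_tsv() is a hypothetical helper that rebuilds text lines and optionally drops low-confidence words.

```python
import csv
import io
from collections import defaultdict


def lines_from_tsv(tsv_text, min_conf=0.0):
    """Rebuild text lines from Tesseract TSV output, skipping words below min_conf."""
    # QUOTE_NONE: OCR text can contain quote characters that csv must not interpret
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    lines = defaultdict(list)
    for row in reader:
        if row["level"] != "5":  # level 5 rows are individual words
            continue
        text = (row.get("text") or "").strip()
        if not text or float(row["conf"]) < min_conf:
            continue
        key = tuple(int(row[name]) for name in ("page_num", "block_num", "par_num", "line_num"))
        lines[key].append((int(row["word_num"]), text))
    # Order lines by page/block/paragraph/line, and words by their position in the line
    return [" ".join(word for _, word in sorted(words)) for _, words in sorted(lines.items())]
```

A confidence filter like this can suppress stray fragments such as "xe" or "bie" in the bill-of-sale output, at the risk of also dropping faint but genuine text, so the threshold needs tuning per document type.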

Installation and Setup Process

Here’s how you can install and set up Tesseract OCR using Python:

First, make sure the Tesseract OCR engine itself is installed; see the official Tesseract documentation for platform-specific instructions.
Then you can install a Python wrapper. In this case we will use tesserocr (pip install tesserocr); for the wrapper's build requirements, check its official GitHub page.

Since Tesseract OCR cannot process PDF files directly, we need to first extract the contents of the sample PDF files into images.

For this, we will use the pypdfium2 library, along with Pillow (PIL) for image handling:

pip install pillow pypdfium2

Then we can write the following code to perform the OCR from the sample documents:

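import tesserocr
import pypdfium2 as pdfium

# Load the PDF document
pdf = pdfium.PdfDocument("bill-of-sale.pdf")

# Loop over the pages, render each one to an image and run OCR on it
for i in range(len(pdf)):
    page = pdf[i]
    # scale=5 renders at about 360 DPI (a scale of 1 corresponds to 72 DPI)
    image = page.render(scale=5).to_pil()
    # Run Tesseract on the PIL image and print the recognized text
    print(tesserocr.image_to_text(image))

# Release the native PDF resources
pdf.close()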

Here’s a description of what the code does:

The script loads a PDF document named “bill-of-sale.pdf” using pdfium.PdfDocument.
It then loops over each page in the PDF:
- The page is rendered as an image with a scale factor of 5 using page.render(scale=5).to_pil(). A scale of 1 corresponds to 72 DPI, so this renders at about 360 DPI, and the higher resolution helps recognition. The to_pil() method converts the rendered bitmap to a PIL (Pillow) image object.
- The tesserocr.image_to_text(image) function performs OCR on the image and returns the extracted text, which is then printed to the console.

Performance Test

We tested Tesseract against the two sample documents. Here are the results:

Document 1: Medium accuracy; Tesseract missed several of the item lines of the bill of sale.

PION OF SPALDING SALES CORPORATION

FAMOUS FOR ATHLETIC EQUIPMENT

Sold To 6/1/39

BKLYN EAGLES 8 B CLUB

EMANLEY -
101 MONTGOMERY STREET

NEWARK, N.J.

DATE

ADDRESS

TOWN STATE

TERMS:
2% CASH [0 DAYS-NET 30 DAYS-

PLEASE REMIT TO SPALDING SALES CORP,

a oi ¢

STORE NO.

INVOICE No. S 2812

CUSTOMER'S
ORDER NO.

DELD TQ DIRK LUNDY

Ship To

ADDRESS

TOWN

VIA

Class | OEALER

Inst, |GOLF PRO,| ORDER TAKEN BY

-- -- --

NO,

SALESMAN'S NAME

OF

BALE GOODWIN

TAGUER

RB ITEM

NO DESCRIPTION OF ARTICLE

125
120
200

130

Form F 71 139-M

3 | 106

AMOUNT

QUANTITY
ORDERED | SHIPPED

15| 75
175
00

80
30

xe

bie

6
2
26
/
Y

NO RETURN OF MERCHANDISE FROM THIS INVOICE WILL BE ACCEPTED UNLESS YOU HAVE OUR WRITTEN PERMISSION.

Document 2: High accuracy; Tesseract extracted almost all of the information from the document.

Using a 10 day interval, the values in the third column will be 10 times
the mean motion, n, and those in the fourth column will be 200 times the
acceleration coefficient. As outlined in vrevious I1TCP Bulletins, predictions
may ve made by interpolation in sucn a table. Its form is comparable to that
of the Daily Satellite Ephenmerides.

The prediction procedure involves the calculation of a time, JNL, at
which the satellite is expected to appear at a particular position such as
the point of local culmination. For this position, we will have calculated
a value of PRY. The observation then represents a measurement of the actual
time at which the satellite appeared at, or very near this position. As &
meticulous observer, I will have measured this time to + ofo0000001. I will
then use this actual time to recalculate the position and will correct for
differences between the calculated and actual point of observation. I will
also make corrections for effects of the pear shape and the ellipticity of
the equator. All of this is done with the objective of obtaining a measured
value of PRM that is accurate to about + 0%00001.

Another observer, Mr. X, may either be less meticulous or may lack the
means for making precise observations. l!e may simply use the predicted value
of PRM as a measured value to correspond with his measurement of the time.

Ge can assume that his accuracy is + of00001 in timing and + otoot in PRM.

Let us then assume that Mr. X and I both start tracking 1960 Nu 2, using

the above table and the Gear Ratio Elements. For convenience, we will also
assume that his location is the same as mine and that he makes each observation

at the same time that I do. After about 60 days, our records of observations
night compare as shown in Table V.

TABLE V

RECORD OF OBSERVATIONS

Comparin of a Meticulous Observer (W.P.O
with of a Casual Observer -X
Predicted Wie P.O. Kr. X
JNL PRM PRN st«éRR@SAd PRY. Resid.
58462.40113315  16699.946438  16899.946518 +80  16899.9469 +400
3848205088329 171656472894 171654473397 = 4503 1716544752 = +300
38484.04727861  17192.450126 17192450724 4598 1719264512 +1100
3848508494712 172060472112 172066472747 = +635 172064728 = +700
38486 604369964 17219427707 172194428590 4683 1721924280 +300
38494 .02957430 173272340616 175276341718 = +1102 1732723416 +1000
38495 .06723367 173410362480 = 173410565653 «+1173 --17341.5642 = +1700
38497 006365582 173684340068 173682341367 +1299 =: 1736823410 = +900

Wie can assume thet Mr. X's values of JNL will be the same as mine (in the first

column) except that they will include only five digits to the right of the

decimal.

In the above table, each "residual" column represents the difference

between the observed valuesand the predicted values, which are in the second
column. As we proceed, Mr. X and I will both plot these residuals against time,

as shown in Figure 2.

12
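Several of Tesseract's mistakes above ("vrevious", "sucn", "Ephenmerides") are one- or two-character errors that the dictionary-lookup post-processing described earlier can repair. Below is a minimal sketch using Python's standard difflib module, assuming you supply a domain vocabulary; correct_text() is a hypothetical helper, and the 0.7 similarity cutoff is an assumption. A real system would also need context, because a naive nearest-word lookup can "correct" legitimate words that are missing from the vocabulary.

```python
import difflib
import re


def correct_text(text, vocabulary, cutoff=0.7):
    """Replace words missing from the vocabulary with their closest vocabulary entry."""
    known = {word.lower() for word in vocabulary}
    candidates = sorted(known)

    def fix(match):
        word = match.group(0)
        if word.lower() in known:
            return word  # already a known word
        best = difflib.get_close_matches(word.lower(), candidates, n=1, cutoff=cutoff)
        if not best:
            return word  # nothing close enough: leave the OCR output untouched
        # Keep an initial capital from the original word
        return best[0].capitalize() if word[0].isupper() else best[0]

    # Only alphabetic runs are checked; digits and punctuation pass through unchanged
    return re.sub(r"[A-Za-z]+", fix, text)


vocabulary = ["as", "outlined", "in", "previous", "bulletins", "predictions", "may", "be",
              "made", "by", "interpolation", "such", "a", "table"]
print(correct_text("As outlined in vrevious bulletins, predictions may be made "
                   "by interpolation in sucn a table.", vocabulary))
```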

Paddle OCR

PaddleOCR is a free, open-source toolkit for OCR created by the PaddlePaddle community.

It’s designed to be practical and lightweight, supporting over 80 languages, which makes it very versatile for many different OCR uses.

Built on the advanced deep learning framework PaddlePaddle, it offers strong tools for training, deploying, and integrating OCR models on various platforms like servers, mobile devices, embedded systems, and IoT devices.

Key Features of PaddleOCR:

Multilingual Support: PaddleOCR can recognize over 80 languages, including complex ones like Chinese, Japanese, and Korean. This makes it a global solution for OCR needs.
Comprehensive Tools: The toolkit provides a full set of tools for OCR tasks, from labeling data and training models to compressing models and deploying them. This includes high-quality models like PP-OCR, PP-Structure, and PP-ChatOCR, which are optimized for different OCR tasks.
Deployment Flexibility: PaddleOCR is designed to be easily used on various platforms. Whether you’re working on a server, a mobile app, or an embedded system, PaddleOCR has the tools and support to integrate OCR capabilities efficiently.
Lightweight and Fast: One of the main goals of PaddleOCR is to stay lightweight and fast without losing accuracy. This makes it great for real-time applications where performance is important.
Community and Documentation: PaddleOCR has an active community behind it, offering lots of documentation and resources to help developers get started and solve problems. The project is licensed under Apache License 2.0, which encourages its wide use and contributions.

Installation and Setup Process

To install Paddle OCR, the following command can be used:

pip install paddlepaddle paddleocr

This will install Paddle OCR with CPU support. If we want to use GPU support, we can run:

pip install paddlepaddle-gpu paddleocr

Then we can write the following code:

# Uses the PaddleOCR 2.x API (ocr() and draw_ocr); newer major versions changed this interface
from paddleocr import PaddleOCR, draw_ocr
import pypdfium2 as pdfium
from PIL import Image

# Load the PDF document
pdf = pdfium.PdfDocument("bill-of-sale.pdf")

# Initialize PaddleOCR with the English model and the text-angle classifier
ocr = PaddleOCR(use_angle_cls=True, lang='en')

# Loop over pages, render each one at scale 5 (about 360 DPI) and save it as an image
for i in range(len(pdf)):
    page = pdf[i]
    image = page.render(scale=5).to_pil()
    image_path = f'image{i}.jpg'
    image.save(image_path)

    # OCR the image; the result holds one list of detected lines per input image
    result = ocr.ocr(image_path, cls=True)
    for res in result:
        if res is None:  # no text was detected on this image
            continue
        # Each line is [bounding_box, (text, confidence)]
        for line in res:
            print(line)

    # Optionally, draw the detected boxes and text on the image
    lines = result[0] or []
    image = Image.open(image_path).convert('RGB')
    boxes = [line[0] for line in lines]
    texts = [line[1][0] for line in lines]
    scores = [line[1][1] for line in lines]
    # draw_ocr needs a TrueType font; adjust this path for your platform
    im_show = draw_ocr(image, boxes, texts, scores, font_path=r'C:\Windows\Fonts\Arial.ttf')
    Image.fromarray(im_show).save(f'result{i}.jpg')

Performance Test

We tested PaddleOCR on the same two sample documents:

Document 1: High accuracy; PaddleOCR extracted almost all of the information from the document.

[[[361.0, 158.0], [1389.0, 178.0], [1385.0, 362.0], [357.0, 341.0]], ('HG.spalding+Bros.', 0.8403503894805908)]
[[[2094.0, 267.0], [3152.0, 267.0], [3152.0, 314.0], [2094.0, 314.0]], ('PLEASE REMIT TO SPALDING SALES CORP.', 0.9451443552970886)]
[[[412.0, 317.0], [950.0, 310.0], [950.0, 357.0], [413.0, 364.0]], ('ION OF SPAIDING SALES', 0.8258334398269653)]
[[[950.0, 321.0], [1241.0, 321.0], [1241.0, 357.0], [950.0, 357.0]], ('CORPORATION', 0.95151287317276)]
[[[2320.0, 480.0], [2406.0, 480.0], [2406.0, 555.0], [2320.0, 555.0]], ('#1', 0.9487408995628357)]
[[[2570.0, 502.0], [2827.0, 493.0], [2830.0, 573.0], [2573.0, 581.0]], ('FOLIOC', 0.916832685470581)]
[[[1968.0, 526.0], [2237.0, 526.0], [2237.0, 577.0], [1968.0, 577.0]], ('STORE NO.', 0.9377260208129883)]
[[[269.0, 653.0], [1524.0, 653.0], [1524.0, 703.0], [269.0, 703.0]], ('FAMOUSFOR ATHLETIC EQUIPMENT', 0.9647388458251953)]
[[[2370.0, 682.0], [2982.0, 674.0], [2983.0, 746.0], [2371.0, 754.0]], ('INVOICE NO.S 2812', 0.9267123341560364)]
[[[197.0, 800.0], [534.0, 800.0], [534.0, 891.0], [197.0, 891.0]], ('Sold To', 0.9875026941299438)]
[[[1760.0, 793.0], [2108.0, 793.0], [2108.0, 898.0], [1760.0, 898.0]], ('Ship To', 0.9982296228408813)]
[[[2381.0, 797.0], [2707.0, 797.0], [2707.0, 844.0], [2381.0, 844.0]], ("CUSTOMER'S", 0.9869886636734009)]
[[[1107.0, 814.0], [1342.0, 823.0], [1339.0, 906.0], [1103.0, 897.0]], ('6/1/39', 0.9966378211975098)]
[[[810.0, 840.0], [950.0, 840.0], [950.0, 891.0], [810.0, 891.0]], ('DATE', 0.9967975616455078)]
[[[2374.0, 836.0], [2661.0, 840.0], [2660.0, 891.0], [2373.0, 887.0]], ('ORDER NO.', 0.9327320456504822)]
[[[2108.0, 945.0], [2789.0, 941.0], [2790.0, 1002.0], [2108.0, 1006.0]], ('DELD TO DIRK LUNDY', 0.9053958058357239)]
[[[505.0, 956.0], [1308.0, 948.0], [1309.0, 1009.0], [506.0, 1017.0]], ('BKLYN EAGLES BB CLUB', 0.9151082038879395)]
[[[509.0, 1028.0], [817.0, 1028.0], [817.0, 1089.0], [509.0, 1089.0]], ('EMANLEY-', 0.9936221241950989)]
[[[1717.0, 1060.0], [1958.0, 1060.0], [1958.0, 1111.0], [1717.0, 1111.0]], ('ADDRESS', 0.9848356246948242)]
[[[204.0, 1071.0], [437.0, 1071.0], [437.0, 1111.0], [204.0, 1111.0]], ('ADDRESS', 0.9969754815101624)]
[[[513.0, 1096.0], [1305.0, 1096.0], [1305.0, 1147.0], [513.0, 1147.0]], ('1O1 MONTGOMERY STREET', 0.9420073628425598)]
[[[503.0, 1157.0], [1047.0, 1165.0], [1046.0, 1230.0], [502.0, 1222.0]], ('NEWARKN.J.', 0.9448561668395996)]
[[[1020.0, 1182.0], [1202.0, 1191.0], [1199.0, 1245.0], [1017.0, 1236.0]], ('STATE', 0.9939748644828796)]
[[[1712.0, 1182.0], [1873.0, 1191.0], [1870.0, 1245.0], [1709.0, 1236.0]], ('TOWN', 0.9983097314834595)]
[[[197.0, 1193.0], [355.0, 1193.0], [355.0, 1244.0], [197.0, 1244.0]], ('TOWN', 0.9977961778640747)]
[[[2553.0, 1190.0], [2718.0, 1190.0], [2718.0, 1240.0], [2553.0, 1240.0]], ('STATE', 0.9975801706314087)]
[[[201.0, 1258.0], [394.0, 1258.0], [394.0, 1309.0], [201.0, 1309.0]], ('TERMS:', 0.978973925113678)]
[[[1710.0, 1305.0], [1803.0, 1305.0], [1803.0, 1359.0], [1710.0, 1359.0]], ('VIA', 0.9877907633781433)]
[[[197.0, 1316.0], [1047.0, 1305.0], [1047.0, 1356.0], [198.0, 1367.0]], ('2%CASH 10DAYS-NET3ODAYS-', 0.9155265688896179)]
[[[437.0, 1439.0], [610.0, 1439.0], [610.0, 1478.0], [437.0, 1478.0]], ('DEALER', 0.9933333992958069)]
[[[670.0, 1435.0], [1058.0, 1435.0], [1058.0, 1482.0], [670.0, 1482.0]], ('INST.GOLF PRO', 0.9222421050071716)]
[[[1251.0, 1435.0], [1624.0, 1435.0], [1624.0, 1482.0], [1251.0, 1482.0]], ('ORDER TAKEN BY', 0.9442328810691833)]
[[[2073.0, 1435.0], [2467.0, 1439.0], [2467.0, 1478.0], [2072.0, 1475.0]], ("SALESMAN'S NAME", 0.9187303781509399)]
[[[2926.0, 1435.0], [3012.0, 1435.0], [3012.0, 1489.0], [2926.0, 1489.0]], ('NO,', 0.9488022923469543)]
[[[233.0, 1460.0], [366.0, 1460.0], [366.0, 1500.0], [233.0, 1500.0]], ('CLASS', 0.9968454241752625)]
[[[262.0, 1511.0], [337.0, 1511.0], [337.0, 1550.0], [262.0, 1550.0]], ('OF', 0.9905343055725098)]
[[[2904.0, 1511.0], [3026.0, 1511.0], [3026.0, 1583.0], [2904.0, 1583.0]], ('106', 0.9961721897125244)]
[[[1198.0, 1529.0], [1466.0, 1529.0], [1466.0, 1590.0], [1198.0, 1590.0]], ('GOODWIN', 0.9962643384933472)]
[[[1879.0, 1522.0], [2112.0, 1522.0], [2112.0, 1586.0], [1879.0, 1586.0]], ('TAGUER', 0.9955328106880188)]
[[[778.0, 1536.0], [821.0, 1536.0], [821.0, 1590.0], [778.0, 1590.0]], ('A', 0.9980504512786865)]
[[[247.0, 1554.0], [359.0, 1554.0], [359.0, 1594.0], [247.0, 1594.0]], ('SALE', 0.8504327535629272)]
[[[1897.0, 1637.0], [2101.0, 1637.0], [2101.0, 1677.0], [1897.0, 1677.0]], ('QUANTITY', 0.9966365694999695)]
[[[893.0, 1659.0], [1431.0, 1659.0], [1431.0, 1705.0], [893.0, 1705.0]], ('DESCRIPTION OF ARTICLE', 0.9516414999961853)]
[[[2237.0, 1659.0], [2348.0, 1659.0], [2348.0, 1709.0], [2237.0, 1709.0]], ('UNIT', 0.9896126985549927)]
[[[2546.0, 1659.0], [2678.0, 1659.0], [2678.0, 1709.0], [2546.0, 1709.0]], ('PRICE', 0.9961897730827332)]
[[[2890.0, 1659.0], [3073.0, 1659.0], [3073.0, 1709.0], [2890.0, 1709.0]], ('AMOUNT', 0.9908974170684814)]
[[[319.0, 1687.0], [405.0, 1687.0], [405.0, 1727.0], [319.0, 1727.0]], ('NO,', 0.849390983581543)]
[[[1786.0, 1687.0], [2201.0, 1687.0], [2201.0, 1734.0], [1786.0, 1734.0]], ('ORDERED OHIPPED', 0.8947080373764038)]
[[[2277.0, 1893.0], [2370.0, 1893.0], [2370.0, 1969.0], [2277.0, 1969.0]], ('EA', 0.9961373209953308)]
[[[2896.0, 1884.0], [2986.0, 1899.0], [2973.0, 1979.0], [2883.0, 1964.0]], ('15', 0.9982523322105408)]
[[[158.0, 1904.0], [269.0, 1904.0], [269.0, 1979.0], [158.0, 1979.0]], ('125', 0.9140155911445618)]
[[[566.0, 1904.0], [731.0, 1904.0], [731.0, 1969.0], [566.0, 1969.0]], ('BATS', 0.9960408806800842)]
[[[1864.0, 1904.0], [1915.0, 1904.0], [1915.0, 1979.0], [1864.0, 1979.0]], ('9', 0.9854689240455627)]
[[[2094.0, 1904.0], [2144.0, 1904.0], [2144.0, 1979.0], [2094.0, 1979.0]], ('9', 0.5181447863578796)]
[[[2660.0, 1897.0], [2750.0, 1897.0], [2750.0, 1976.0], [2660.0, 1976.0]], ('75', 0.9985846281051636)]
[[[3005.0, 1897.0], [3091.0, 1897.0], [3091.0, 1976.0], [3005.0, 1976.0]], ('75', 0.9977871179580688)]
[[[2592.0, 1907.0], [2642.0, 1907.0], [2642.0, 1958.0], [2592.0, 1958.0]], ('1', 0.9921799898147583)]
[[[1861.0, 2008.0], [1922.0, 2008.0], [1922.0, 2077.0], [1861.0, 2077.0]], ('1', 0.9759079217910767)]
[[[2277.0, 2005.0], [2370.0, 2005.0], [2370.0, 2077.0], [2277.0, 2077.0]], ('EA', 0.9966272115707397)]
[[[3005.0, 2001.0], [3094.0, 2001.0], [3094.0, 2084.0], [3005.0, 2084.0]], ('75', 0.9976346492767334)]
[[[154.0, 2012.0], [272.0, 2012.0], [272.0, 2080.0], [154.0, 2080.0]], ('120', 0.9948965907096863)]
[[[566.0, 2012.0], [731.0, 2012.0], [731.0, 2077.0], [566.0, 2077.0]], ('BATS', 0.9785612225532532)]
[[[2090.0, 2012.0], [2137.0, 2012.0], [2137.0, 2073.0], [2090.0, 2073.0]], ('1', 0.948710024356842)]
[[[2926.0, 2012.0], [2972.0, 2012.0], [2972.0, 2073.0], [2926.0, 2073.0]], ('1', 0.9829316735267639)]
[[[1864.0, 2127.0], [1911.0, 2127.0], [1911.0, 2196.0], [1864.0, 2196.0]], ('6', 0.958515465259552)]
[[[2094.0, 2124.0], [2144.0, 2124.0], [2144.0, 2199.0], [2094.0, 2199.0]], ('9', 0.6393207907676697)]
[[[2277.0, 2127.0], [2370.0, 2127.0], [2370.0, 2203.0], [2277.0, 2203.0]], ('EA', 0.9971687197685242)]
[[[2653.0, 2127.0], [2746.0, 2127.0], [2746.0, 2199.0], [2653.0, 2199.0]], ('00', 0.9670082926750183)]
[[[154.0, 2135.0], [269.0, 2135.0], [269.0, 2203.0], [154.0, 2203.0]], ('200', 0.9990455508232117)]
[[[2592.0, 2138.0], [2646.0, 2138.0], [2646.0, 2192.0], [2592.0, 2192.0]], ('1', 0.8707045316696167)]
[[[3005.0, 2131.0], [3091.0, 2131.0], [3091.0, 2199.0], [3005.0, 2199.0]], ('00', 0.9527323246002197)]
[[[570.0, 2142.0], [728.0, 2142.0], [728.0, 2196.0], [570.0, 2196.0]], ('BATS', 0.9951581358909607)]
[[[3005.0, 2239.0], [3094.0, 2239.0], [3094.0, 2315.0], [3005.0, 2315.0]], ('80', 0.9814289808273315)]
[[[158.0, 2254.0], [262.0, 2254.0], [262.0, 2329.0], [158.0, 2329.0]], ('130', 0.9975236058235168)]
[[[566.0, 2254.0], [728.0, 2254.0], [728.0, 2318.0], [566.0, 2318.0]], ('BATS', 0.9968507289886475)]
[[[1864.0, 2254.0], [1911.0, 2254.0], [1911.0, 2315.0], [1864.0, 2315.0]], ('2', 0.9978007674217224)]
[[[2097.0, 2254.0], [2144.0, 2254.0], [2144.0, 2315.0], [2097.0, 2315.0]], ('2', 0.9970558881759644)]
[[[2280.0, 2254.0], [2366.0, 2254.0], [2366.0, 2322.0], [2280.0, 2322.0]], ('EA', 0.9969485998153687)]
[[[2592.0, 2257.0], [2646.0, 2257.0], [2646.0, 2308.0], [2592.0, 2308.0]], ('1', 0.873385488986969)]
[[[2660.0, 2254.0], [2739.0, 2254.0], [2739.0, 2318.0], [2660.0, 2318.0]], ('40', 0.9988534450531006)]
[[[2929.0, 2250.0], [2972.0, 2250.0], [2972.0, 2311.0], [2929.0, 2311.0]], ('2', 0.9969152212142944)]
[[[2886.0, 2329.0], [2972.0, 2329.0], [2972.0, 2405.0], [2886.0, 2405.0]], ('26', 0.9959961175918579)]
[[[3005.0, 2340.0], [3091.0, 2340.0], [3091.0, 2409.0], [3005.0, 2409.0]], ('30', 0.9916164875030518)]
[[[2642.0, 2369.0], [2729.0, 2369.0], [2729.0, 2448.0], [2642.0, 2448.0]], ('er', 0.551166296005249)]
[[[2987.0, 2405.0], [3094.0, 2405.0], [3094.0, 2477.0], [2987.0, 2477.0]], ('50', 0.5353686213493347)]
[[[2919.0, 2416.0], [2962.0, 2416.0], [2962.0, 2473.0], [2919.0, 2473.0]], ('/', 0.5241718888282776)]
[[[2854.0, 2506.0], [2965.0, 2506.0], [2965.0, 2564.0], [2854.0, 2564.0]], ('24', 0.64298415184021)]
[[[1483.0, 2645.0], [1874.0, 2681.0], [1866.0, 2772.0], [1475.0, 2736.0]], ('SEP1', 0.9770590662956238)]
[[[262.0, 3050.0], [588.0, 3050.0], [588.0, 3097.0], [262.0, 3097.0]], ('FormF21139-M', 0.9208917021751404)]
[[[477.0, 3112.0], [2926.0, 3112.0], [2926.0, 3159.0], [477.0, 3159.0]], ('NO RETURN OF MERCHANDISE FROM THIS INVOICE WILL BE ACCEPTED UNLESS YOU HAVE OUR WRITTEN PERMISSION.', 0.9201523065567017)]
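PaddleOCR returns each detection as [box, (text, confidence)], as in the output above, but it does not group detections into the rows of the invoice table. Below is a minimal sketch of one way to rebuild rows and a rough layout-preserving text view, assuming the 2.x result format shown here. group_into_rows() and to_layout_text() are hypothetical helpers, and the min_score, y_tolerance and px_per_char defaults are assumptions chosen for pages rendered at scale=5, not values from PaddleOCR. The sample detections are copied from the output above, with scores rounded.

```python
def group_into_rows(ocr_lines, min_score=0.6, y_tolerance=40.0):
    """Group [box, (text, score)] detections into rows of (x_left, text) cells."""
    items = []
    for box, (text, score) in ocr_lines:
        if score < min_score:  # drop low-confidence detections such as a stray "9" or "er"
            continue
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        items.append(((min(ys) + max(ys)) / 2, min(xs), text))
    items.sort()  # top to bottom by vertical centre
    rows = []
    for y_center, x_left, text in items:
        # Stay in the current row while the vertical centre is within the tolerance
        if rows and abs(y_center - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((x_left, text))
        else:
            rows.append((y_center, [(x_left, text)]))
    return [sorted(cells) for _, cells in rows]  # left to right within each row


def to_layout_text(rows, px_per_char=50.0):
    """Render rows as plain text, placing each cell at a column proportional to its x position."""
    lines = []
    for cells in rows:
        line = ""
        for x_left, text in cells:
            column = int(x_left / px_per_char)
            if line:
                line += " " * max(1, column - len(line))  # keep at least one space between cells
            else:
                line = " " * column
            line += text
        lines.append(line)
    return "\n".join(lines)


detections = [
    [[[158.0, 1904.0], [269.0, 1904.0], [269.0, 1979.0], [158.0, 1979.0]], ('125', 0.914)],
    [[[566.0, 1904.0], [731.0, 1904.0], [731.0, 1969.0], [566.0, 1969.0]], ('BATS', 0.996)],
    [[[2277.0, 1893.0], [2370.0, 1893.0], [2370.0, 1969.0], [2277.0, 1969.0]], ('EA', 0.996)],
]
print(to_layout_text(group_into_rows(detections)))
```

Keeping the horizontal spacing, rather than joining cells with single spaces, preserves the column structure that a downstream parser or LLM needs to tell the item number, description and unit apart.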

Document 2: High accuracy; PaddleOCR extracted almost all of the information from the document.

[[[553.0, 311.0], [2408.0, 298.0], [2408.0, 359.0], [554.0, 373.0]], ('Using a 10 day interval, the values in the third colurn will be 10 times', 0.9857205152511597)]
[[[434.0, 417.0], [2430.0, 403.0], [2431.0, 460.0], [434.0, 474.0]], ('acceleration coefficient. As outlined in previous ITcp Bulletins, predictions', 0.9794966578483582)]
[[[434.0, 469.0], [2404.0, 452.0], [2404.0, 513.0], [434.0, 531.0]], ('may be made by interpolation in such a table. Its form is comparable to that', 0.9822374582290649)]
[[[438.0, 517.0], [1346.0, 517.0], [1346.0, 574.0], [438.0, 574.0]], ('of the Daily Satellite Ephenerides.', 0.9848372340202332)]
[[[562.0, 632.0], [2315.0, 623.0], [2316.0, 684.0], [562.0, 693.0]], ('The prediction procedure involves the calculation of a time, JNL, at', 0.99403977394104)]
[[[438.0, 689.0], [2337.0, 675.0], [2338.0, 732.0], [439.0, 746.0]], ('which the satellite is expected to appear at a particular position such as', 0.9948274493217468)]
[[[438.0, 895.0], [2182.0, 881.0], [2183.0, 938.0], [439.0, 952.0]], ('meticulous observer, I will have measured this time to + Odoooooo1.', 0.9829939603805542)]
[[[2183.0, 886.0], [2355.0, 886.0], [2355.0, 930.0], [2183.0, 930.0]], ('I will', 0.996565580368042)]
[[[443.0, 943.0], [2342.0, 930.0], [2342.0, 991.0], [443.0, 1004.0]], ('then use this actual time to recalculate the position and will correct for', 0.9911159873008728)]
[[[447.0, 996.0], [2165.0, 987.0], [2165.0, 1044.0], [447.0, 1053.0]], ('differences between the calculated and actual point of observation..', 0.9875534176826477)]
[[[2200.0, 991.0], [2355.0, 991.0], [2355.0, 1035.0], [2200.0, 1035.0]], ('I will', 0.9729122519493103)]
[[[438.0, 1048.0], [2346.0, 1035.0], [2347.0, 1092.0], [439.0, 1105.0]], ("also make corrections for effects of' the pear shape and the ellipticity of.", 0.9852278828620911)]
[[[757.0, 1096.0], [2390.0, 1087.0], [2391.0, 1144.0], [757.0, 1153.0]], ('All of this is done with the objective of obtaining a measured', 0.9988162517547607)]
[[[452.0, 1110.0], [775.0, 1110.0], [775.0, 1153.0], [452.0, 1153.0]], ('the equator..', 0.9692018032073975)]
[[[566.0, 1250.0], [2368.0, 1237.0], [2369.0, 1298.0], [567.0, 1311.0]], ('Another observer, Mr. X, may either be less meticulous or may lack the', 0.9877793192863464)]
[[[438.0, 1303.0], [2417.0, 1289.0], [2417.0, 1346.0], [439.0, 1360.0]], ('means for making precise observations. He may simply use the predicted value', 0.9835277199745178)]
[[[438.0, 1351.0], [2337.0, 1337.0], [2338.0, 1399.0], [439.0, 1412.0]], ('of PRt as a measured value to correspond with his measurement of the time.', 0.9732710123062134)]
[[[438.0, 1403.0], [2342.0, 1390.0], [2342.0, 1447.0], [439.0, 1460.0]], ('Fhdd ut loogo+ pue 3ugngg ut loooopo+ st Ronannn sty goyn aunsse uuo a!', 0.5953871011734009)]
[[[443.0, 1452.0], [2297.0, 1443.0], [2298.0, 1500.0], [443.0, 1509.0]], ('Let us then assune that lr. X and I both start tracking 1960 Nu 2, using', 0.97115159034729)]
[[[1447.0, 1553.0], [2470.0, 1543.0], [2471.0, 1601.0], [1448.0, 1610.0]], (' mine and that he makes each observation', 0.9914621114730835)]
[[[447.0, 1566.0], [1470.0, 1557.0], [1470.0, 1601.0], [447.0, 1610.0]], ('assume that his location is the same as.', 0.9801846742630005)]
[[[443.0, 1658.0], [1328.0, 1653.0], [1328.0, 1715.0], [443.0, 1719.0]], ('might conpare as shown in Table V.', 0.9593335390090942)]
[[[1386.0, 1833.0], [1616.0, 1833.0], [1616.0, 1881.0], [1386.0, 1881.0]], ('TABLE V', 0.9975382089614868)]
[[[1142.0, 1930.0], [1846.0, 1930.0], [1846.0, 1987.0], [1142.0, 1987.0]], ('RECORD OF OBSERVATIONS', 0.9988337159156799)]
[[[925.0, 2035.0], [2085.0, 2030.0], [2085.0, 2087.0], [925.0, 2092.0]], ('Comparing Log of a Meticulous Observer (W.P.O)', 0.9923900961875916)]
[[[1045.0, 2088.0], [1965.0, 2083.0], [1966.0, 2144.0], [1045.0, 2149.0]], ('with Log of a Casual Observer (Mr. X)', 0.9935088157653809)]
[[[956.0, 2237.0], [1195.0, 2237.0], [1195.0, 2285.0], [956.0, 2285.0]], ('Predicted', 0.9988049268722534)]
[[[1492.0, 2237.0], [1704.0, 2237.0], [1704.0, 2285.0], [1492.0, 2285.0]], ('W. P. O.', 0.8736283183097839)]
[[[2054.0, 2232.0], [2191.0, 2232.0], [2191.0, 2280.0], [2054.0, 2280.0]], ('Mr. X', 0.9575027227401733)]
[[[1032.0, 2289.0], [1124.0, 2289.0], [1124.0, 2342.0], [1032.0, 2342.0]], ('PRM', 0.9898760318756104)]
[[[1974.0, 2285.0], [2063.0, 2285.0], [2063.0, 2333.0], [1974.0, 2333.0]], ('PR', 0.9861990213394165)]
[[[2225.0, 2283.0], [2382.0, 2265.0], [2388.0, 2326.0], [2231.0, 2344.0]], ('Resid.', 0.9992003440856934)]
[[[633.0, 2294.0], [717.0, 2294.0], [717.0, 2346.0], [633.0, 2346.0]], ('JNL', 0.8906293511390686)]
[[[1426.0, 2294.0], [1505.0, 2294.0], [1505.0, 2329.0], [1426.0, 2329.0]], ('PRM', 0.9811756014823914)]
[[[1690.0, 2295.0], [1848.0, 2284.0], [1852.0, 2336.0], [1694.0, 2347.0]], ('Resid.', 0.9998622536659241)]
[[[1749.0, 2368.0], [1842.0, 2368.0], [1842.0, 2421.0], [1749.0, 2421.0]], ('+80', 0.9984487891197205)]
[[[1899.0, 2359.0], [2169.0, 2359.0], [2169.0, 2421.0], [1899.0, 2421.0]], ('16899.9469', 0.9996829032897949)]
[[[2262.0, 2359.0], [2373.0, 2359.0], [2373.0, 2412.0], [2262.0, 2412.0]], ('+400', 0.9191010594367981)]
[[[505.0, 2377.0], [868.0, 2377.0], [868.0, 2434.0], [505.0, 2434.0]], ('38462.40113315', 0.9995452165603638)]
[[[934.0, 2377.0], [1248.0, 2377.0], [1248.0, 2425.0], [934.0, 2425.0]], ('16899.946438', 0.9993425011634827)]
[[[1315.0, 2373.0], [1629.0, 2373.0], [1629.0, 2421.0], [1315.0, 2421.0]], ('16899.946518', 0.9998354911804199)]
[[[1731.0, 2473.0], [1837.0, 2473.0], [1837.0, 2526.0], [1731.0, 2526.0]], ('+503', 0.9993162155151367)]
[[[1899.0, 2465.0], [2174.0, 2465.0], [2174.0, 2526.0], [1899.0, 2526.0]], ('17165.4732', 0.9999091029167175)]
[[[2267.0, 2465.0], [2377.0, 2465.0], [2377.0, 2517.0], [2267.0, 2517.0]], ('+300', 0.9997279644012451)]
[[[509.0, 2487.0], [868.0, 2487.0], [868.0, 2530.0], [509.0, 2530.0]], ('38482.05088329', 0.9956350922584534)]
[[[920.0, 2478.0], [1252.0, 2468.0], [1254.0, 2530.0], [922.0, 2540.0]], ('17165.472894', 0.9996970295906067)]
[[[1315.0, 2478.0], [1629.0, 2478.0], [1629.0, 2526.0], [1315.0, 2526.0]], ('17165.473397', 0.9995167851448059)]
[[[1731.0, 2574.0], [1842.0, 2574.0], [1842.0, 2627.0], [1731.0, 2627.0]], ('+598', 0.9998121857643127)]
[[[1899.0, 2565.0], [2178.0, 2565.0], [2178.0, 2627.0], [1899.0, 2627.0]], ('17192.4512', 0.9999018907546997)]
[[[500.0, 2579.0], [867.0, 2574.0], [868.0, 2635.0], [501.0, 2640.0]], ('38484.04727861', 0.9995495080947876)]
[[[1319.0, 2579.0], [1634.0, 2579.0], [1634.0, 2627.0], [1319.0, 2627.0]], ('17192.450724', 0.9986034035682678)]
[[[2240.0, 2570.0], [2377.0, 2570.0], [2377.0, 2618.0], [2240.0, 2618.0]], ('+1100', 0.9997634887695312)]
[[[934.0, 2583.0], [1248.0, 2583.0], [1248.0, 2631.0], [934.0, 2631.0]], ('17192.450126', 0.9997585415840149)]
[[[1319.0, 2679.0], [1629.0, 2679.0], [1629.0, 2728.0], [1319.0, 2728.0]], ('17206.472747', 0.9998332858085632)]
[[[1735.0, 2675.0], [1842.0, 2675.0], [1842.0, 2728.0], [1735.0, 2728.0]], ('+635', 0.999264121055603)]
[[[1908.0, 2675.0], [2174.0, 2675.0], [2174.0, 2723.0], [1908.0, 2723.0]], ('17206.4728', 0.9994878768920898)]
[[[2267.0, 2671.0], [2382.0, 2671.0], [2382.0, 2723.0], [2267.0, 2723.0]], ('+700', 0.9996985197067261)]
[[[500.0, 2684.0], [872.0, 2679.0], [872.0, 2741.0], [501.0, 2746.0]], ('38485.08494712', 0.9998096823692322)]
[[[939.0, 2684.0], [1253.0, 2684.0], [1253.0, 2732.0], [939.0, 2732.0]], ('17206.472112', 0.9997091889381409)]
[[[1727.0, 2761.0], [1853.0, 2774.0], [1846.0, 2839.0], [1720.0, 2827.0]], ('+683', 0.999439537525177)]
[[[1319.0, 2780.0], [1629.0, 2780.0], [1629.0, 2829.0], [1319.0, 2829.0]], ('17219.428390', 0.9997485280036926)]
[[[1904.0, 2772.0], [2178.0, 2772.0], [2178.0, 2833.0], [1904.0, 2833.0]], ('17219.4280', 0.9998974800109863)]
[[[2271.0, 2772.0], [2382.0, 2772.0], [2382.0, 2824.0], [2271.0, 2824.0]], ('+300', 0.9997367858886719)]
[[[504.0, 2785.0], [876.0, 2780.0], [877.0, 2842.0], [505.0, 2846.0]], ('38486.04369964', 0.9973931908607483)]
[[[939.0, 2789.0], [1248.0, 2789.0], [1248.0, 2837.0], [939.0, 2837.0]], ('17219.427707', 0.9997722506523132)]
[[[1898.0, 2877.0], [2181.0, 2867.0], [2183.0, 2929.0], [1901.0, 2939.0]], ('17327.3416', 0.9986153841018677)]
[[[2240.0, 2868.0], [2386.0, 2868.0], [2386.0, 2929.0], [2240.0, 2929.0]], ('+1000', 0.9997513890266418)]
[[[509.0, 2890.0], [877.0, 2890.0], [877.0, 2947.0], [509.0, 2947.0]], ('38494.02957430', 0.999668300151825)]
[[[929.0, 2886.0], [1256.0, 2876.0], [1258.0, 2938.0], [931.0, 2948.0]], ('17327.340616', 0.9951233267784119)]
[[[1319.0, 2881.0], [1634.0, 2881.0], [1634.0, 2929.0], [1319.0, 2929.0]], ('17327.341718', 0.9993224740028381)]
[[[1704.0, 2881.0], [1846.0, 2881.0], [1846.0, 2929.0], [1704.0, 2929.0]], ('+1102', 0.9996550679206848)]
[[[929.0, 2987.0], [1260.0, 2977.0], [1262.0, 3039.0], [931.0, 3048.0]], ('17341.362480', 0.9986879825592041)]
[[[1315.0, 2982.0], [1642.0, 2982.0], [1642.0, 3043.0], [1315.0, 3043.0]], ('17341.363653', 0.9986193776130676)]
[[[1704.0, 2986.0], [1842.0, 2986.0], [1842.0, 3035.0], [1704.0, 3035.0]], ('+1173', 0.9992446899414062)]
[[[1903.0, 2978.0], [2181.0, 2968.0], [2183.0, 3030.0], [1905.0, 3040.0]], ('17341.3642', 0.9991554021835327)]
[[[2245.0, 2978.0], [2382.0, 2978.0], [2382.0, 3026.0], [2245.0, 3026.0]], ('+1700', 0.9995923042297363)]
[[[509.0, 2991.0], [877.0, 2991.0], [877.0, 3048.0], [509.0, 3048.0]], ('38495.06723367', 0.9991346001625061)]
[[[1315.0, 3087.0], [1634.0, 3087.0], [1634.0, 3136.0], [1315.0, 3136.0]], ('17368.341367', 0.999509871006012)]
[[[1704.0, 3087.0], [1842.0, 3087.0], [1842.0, 3136.0], [1704.0, 3136.0]], ('+1299', 0.9989592432975769)]
[[[1908.0, 3079.0], [2178.0, 3079.0], [2178.0, 3140.0], [1908.0, 3140.0]], ('17368.3410', 0.9998946189880371)]
[[[2271.0, 3079.0], [2382.0, 3079.0], [2382.0, 3131.0], [2271.0, 3131.0]], ('006+', 0.9992048144340515)]
[[[509.0, 3096.0], [881.0, 3096.0], [881.0, 3153.0], [509.0, 3153.0]], ('38497.06365582', 0.9980858564376831)]
[[[929.0, 3092.0], [1260.0, 3082.0], [1262.0, 3144.0], [931.0, 3154.0]], ('17368.340068', 0.9978811144828796)]
[[[451.0, 3403.0], [2372.0, 3385.0], [2373.0, 3460.0], [452.0, 3478.0]], ('column) except that they will include only five digits to the right of the', 0.9924128651618958)]
[[[465.0, 3469.0], [682.0, 3469.0], [682.0, 3517.0], [465.0, 3517.0]], ('decimal.', 0.9990857243537903)]
[[[580.0, 3561.0], [2346.0, 3539.0], [2347.0, 3609.0], [581.0, 3631.0]], ('In the above table, each "residual" column represents the difference', 0.9948480129241943)]
[[[460.0, 3666.0], [2519.0, 3648.0], [2519.0, 3710.0], [461.0, 3728.0]], ('colunn. As we proceed, Mr. X and I will both plot these residuals against time,', 0.986700713634491)]
[[[460.0, 3723.0], [1014.0, 3723.0], [1014.0, 3780.0], [460.0, 3780.0]], ('as showm in Figure 2..', 0.9618627429008484)]
[[[1501.0, 3807.0], [1558.0, 3807.0], [1558.0, 3846.0], [1501.0, 3846.0]], ('12', 0.9994889497756958)]
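PaddleOCR returns each detection as a `[box, (text, confidence)]` pair, where the box is given as four corner points. As the raw output above shows, these detections are not in reading order: `+80`, `16899.9469`, and `+400` come before `38462.40113315`, which is the first value of the same table row. A minimal sketch for rebuilding rows is to sort boxes by the vertical position of their centres, start a new row whenever that position jumps by more than a tolerance, and then sort each row left to right. The tolerance `ROW_TOLERANCE` and the helper names are hypothetical and would need tuning for other scans; the detections are copied from the output above.

```python
# Minimal sketch: regroup PaddleOCR-style detections into reading-order rows.
# Each detection is [box, (text, confidence)], where box lists four [x, y] corners.
from statistics import mean

ROW_TOLERANCE = 40.0  # hypothetical: max y-distance (pixels) between box centres in one row

detections = [
    [[[1749.0, 2368.0], [1842.0, 2368.0], [1842.0, 2421.0], [1749.0, 2421.0]], ("+80", 0.9984487891197205)],
    [[[1899.0, 2359.0], [2169.0, 2359.0], [2169.0, 2421.0], [1899.0, 2421.0]], ("16899.9469", 0.9996829032897949)],
    [[[2262.0, 2359.0], [2373.0, 2359.0], [2373.0, 2412.0], [2262.0, 2412.0]], ("+400", 0.9191010594367981)],
    [[[505.0, 2377.0], [868.0, 2377.0], [868.0, 2434.0], [505.0, 2434.0]], ("38462.40113315", 0.9995452165603638)],
    [[[934.0, 2377.0], [1248.0, 2377.0], [1248.0, 2425.0], [934.0, 2425.0]], ("16899.946438", 0.9993425011634827)],
    [[[1315.0, 2373.0], [1629.0, 2373.0], [1629.0, 2421.0], [1315.0, 2421.0]], ("16899.946518", 0.9998354911804199)],
    [[[1731.0, 2473.0], [1837.0, 2473.0], [1837.0, 2526.0], [1731.0, 2526.0]], ("+503", 0.9993162155151367)],
    [[[1899.0, 2465.0], [2174.0, 2465.0], [2174.0, 2526.0], [1899.0, 2526.0]], ("17165.4732", 0.9999091029167175)],
    [[[2267.0, 2465.0], [2377.0, 2465.0], [2377.0, 2517.0], [2267.0, 2517.0]], ("+300", 0.9997279644012451)],
    [[[509.0, 2487.0], [868.0, 2487.0], [868.0, 2530.0], [509.0, 2530.0]], ("38482.05088329", 0.9956350922584534)],
    [[[920.0, 2478.0], [1252.0, 2468.0], [1254.0, 2530.0], [922.0, 2540.0]], ("17165.472894", 0.9996970295906067)],
    [[[1315.0, 2478.0], [1629.0, 2478.0], [1629.0, 2526.0], [1315.0, 2526.0]], ("17165.473397", 0.9995167851448059)],
]


def centre(box):
    """Return the (x, y) centre of a four-point box."""
    return mean(p[0] for p in box), mean(p[1] for p in box)


def group_rows(dets, tol=ROW_TOLERANCE):
    """Sort detections top to bottom, start a new row when the y-centre moves more
    than `tol` away from the row's first box, then sort each row left to right."""
    items = sorted(dets, key=lambda d: centre(d[0])[1])
    rows, current, row_y = [], [], None
    for box, (text, _confidence) in items:
        x, y = centre(box)
        if row_y is not None and abs(y - row_y) > tol:
            rows.append([t for _, t in sorted(current)])
            current = []
        if not current:
            row_y = y  # the first box of a row anchors its vertical position
        current.append((x, text))
    if current:
        rows.append([t for _, t in sorted(current)])
    return rows


for row in group_rows(detections):
    print(" ".join(row))
```

A fixed pixel tolerance is the simplest choice and works for evenly spaced tables like this one; skewed or multi-column pages usually need a tolerance derived from the box heights instead.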

LLMWhisperer: Best OCR to Extract Data from Handwritten Forms

Let’s see how LLMWhisperer processes challenging PDFs: hand-filled forms with elements like checkboxes and radio buttons, as well as badly scanned pages in awkward orientations. LLMWhisperer is a text extraction service designed specifically to prepare documents for large language models (LLMs).

Comparing Cloud-Based OCR Services

Azure Document Intelligence

Azure Document Intelligence, which used to be called Azure Form Recognizer, is a cloud-based service from Microsoft Azure.

It offers advanced features for extracting text, key-value pairs, tables, and other structured data from documents.

This service is part of Azure AI services (formerly Azure Cognitive Services) and is aimed at automating the processing and analysis of forms and documents at scale.

Key Features of Azure Document Intelligence:

Text Extraction and OCR: Azure Document Intelligence uses Optical Character Recognition (OCR) technology to accurately extract text from various document types like PDFs, images, and handwritten forms. It can handle both printed and cursive handwriting, making it useful for different situations.
Pre-built Models: The service provides ready-made models for specific document types, such as invoices, receipts, business cards, and ID documents. These models automatically recognize and extract relevant information, reducing the need for manual data entry.
Custom Models: Users can create custom models to recognize and extract data from documents with unique structures. This is particularly helpful for organizations working with specialized forms or documents that don’t fit standard templates.
Table Extraction: Azure Document Intelligence is excellent at extracting tables from documents, keeping the structure and relationships between cells intact. This is important for cases where tabular data needs to be imported into databases or spreadsheets.
Integration and Scalability: As a cloud service, Azure Document Intelligence can be easily integrated with other Azure services. This lets users build scalable document processing workflows and create end-to-end automation solutions with tools like Azure Logic Apps and Power Automate.
Security and Compliance: Being a Microsoft Azure service, it follows Azure’s strict security and compliance standards. This makes it suitable for handling sensitive documents and data in industries like finance, healthcare, and government.

Installation and Setup Process

First, you will need an Azure account to obtain your endpoint and API key. You can sign up for free, and the free tier includes 500 pages each month.

Then you can install the Python SDK with:

pip install azure-ai-documentintelligence

Now you can write the following code:

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

# Set the `endpoint` and `key` variables with the values from the Azure portal
endpoint = "<your-endpoint>"
key = "<your-key>"

def analyze_invoice():
    # Create a Document Intelligence client
    document_intelligence_client = DocumentIntelligenceClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )

    # Open the PDF in binary mode and send its bytes to the prebuilt invoice model
    with open("bill-of-sale.pdf", "rb") as f:
        # The document is passed positionally, so the call works whether the SDK
        # version names this parameter `body` or `analyze_request`
        poller = document_intelligence_client.begin_analyze_document(
            "prebuilt-invoice", f, content_type="application/pdf"
        )
        # Wait for the long-running analysis to finish
        result = poller.result()

    # Print the full extracted text
    print(result.content)

if __name__ == "__main__":
    analyze_invoice()

Performance Test

Testing Azure Document Intelligence:

Document 1: High accuracy, recognized the majority of the invoice elements.

Al, Spalding & Bros.
SPALDING
PLEASE REMIT TO SPALDING SALES CORP.
SION OF SPALDING SALES CORPORATION
#1
STORE NO.
FOLIO C
FAMOUS FOR ATHLETIC EQUIPMENT
INVOICE NO.
S 2812
Sold To
6/1/39
Ship To
CUSTOMER'S
DATE
ORDER NO.
BKLYN EAGLES B B CLUB
DELD TO DIRK LUNDY
EMANLEY -
ADDRESS
101 MONTGOMERY STREET
ADDRESS
TOWN
NEWARK, N.J.
STATE
TOWN
STATE
TERMS:
2% CASH 10 DAYS-NET 30 DAYS-
VIA
CLASS
DEALER
INST.
GOLF PRO.
ORDER TAKEN BY
SALESMAN'S NAME
NO,
OF
BALE
A
GOODWIN
TAGUER
106
86
ITEM
QUANTITY
NO.
DESCRIPTION OF ARTICLE
ORDERED SHIPPED
UNIT
PRICE
AMOUNT
125
BATS
9
9
EA
1
75
15
75
120
BATS
1
1
EA
1
75
-
200
BATS
6
6
EA
1
00
6
00
1 30
BATS
2
2
EA
1
40
2
80
26
30
DI
/
50
-
$24.80
SEP 1 / 1933
-
Form F 21 1-39-M
NO RETURN OF MERCHANDISE FROM THIS INVOICE WILL BE ACCEPTED UNLESS YOU HAVE OUR WRITTEN PERMISSION.

Document 2: High accuracy; all of the text and table values were extracted, although some symbols were misread (for example, "+ 090000001" and "¥683").

Using a 10 day interval, the values in the third column will be 10 times
the mean motion, n, and those in the fourth column will be 200 times the
acceleration coefficient. As outlined in previous ITCP Bulletins, predictions
may be made by interpolation in such a table. Its form is comparable to that
of the Daily Satellite Ephemerides.
The prediction procedure involves the calculation of a time, JNL, at
which the satellite is expected to appear at a particular position such as
the point of local culmination. For this position, we will have calculated
a value of PRM. The observation then represents a measurement of the actual
time at which the satellite appeared at, or very near this position. As a
meticulous observer, I will have measured this time to + 090000001. I will
then use this actual time to recalculate the position and will correct for
differences between the calculated and actual point of observation. I will
also make corrections for effects of the pear shape and the ellipticity of
the equator. All of this is done with the objective of obtaining a measured
value of PRM that is accurate to about + 0500001.
Another observer, Mr. X, may either be less meticulous or may lack the
means for making precise observations. He may simply use the predicted value
of PRM as a measured value to correspond with his measurement of the time.
We can assume that his accuracy is + 0900001 in timing and + 0f001 in PRM.
Let us then assume that Mr. X and I both start tracking 1960 Nu 2, using
the above table and the Gear Ratio Elements. For convenience, we will also
assume that his location is the same as mine and that he makes each observation
at the same time that I do. After about 60 days, our records of observations
might compare as shown in Table V.
TABLE V
RECORD OF OBSERVATIONS
Comparing Log of a Meticulous Observer (W. P.O)
with Log of a Casual Observer (Mr. X)
Predicted
V. P. 0.
Mr. X
JNL
PRM
PRM
Resid.
PRM
Resid.
38462.40113315
16899.946438
16899.946518
+80
16899.9469
+400
38482.05088329
17165.472894
17165.473397
+503
17165.4732
+300
38484.04727861
17192.450126
17192.450724
+598
17192.4512
+1100
38485.08494712
17206.472112
17206.472747
+635
17206.4728
+700
38486.04369964
17219.427707
17219.428390
¥683
17219.4280
+300
38494.02957430
17327.340616
17327.341718
+1102
17327.3416
+1000
38495.06723367
17341.362480
17341.363653
+1173
17341.3642
+1700
38497.06365582
17368.340068
17368.341367
+1299
17368.3410
+900
We can assume that Mr. X's values of JNL will be the same as mine (in the first
column ) except that they will include only five digits to the right of the
decimal.
In the above table, each "residual" column represents the difference
between the observed values and the predicted values, which are in the second
column. As we proceed, Mr. X and I will both plot these residuals against time,
as shown in Figure 2.
12 :selected: :selected:
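Azure Document Intelligence writes each detected selection mark (a checkbox or radio button) into the `content` string as `:selected:` or `:unselected:`, which is where the two `:selected:` tokens after the page number above come from. If the text is headed for an LLM or a search index, a small post-processing step can count and strip these tokens. The sketch below works on a plain string; with the earlier snippet you would pass it `result.content`. The helper name `split_selection_marks` is illustrative, not part of the Azure SDK.

```python
import re

# Selection marks appear inline in Azure's `content` as ":selected:" or ":unselected:"
MARK_PATTERN = re.compile(r":(un)?selected:")


def split_selection_marks(content: str):
    """Return (text without selection-mark tokens, number selected, number unselected)."""
    selected = unselected = 0
    for match in MARK_PATTERN.finditer(content):
        if match.group(1):  # the "un" prefix was present
            unselected += 1
        else:
            selected += 1
    cleaned = MARK_PATTERN.sub("", content)
    # Collapse runs of two or more spaces/tabs (e.g. left behind by removed tokens)
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    # Drop trailing whitespace on every line
    cleaned = "\n".join(line.rstrip() for line in cleaned.splitlines())
    return cleaned, selected, unselected


text, n_selected, n_unselected = split_selection_marks(
    "as shown in Figure 2.\n12 :selected: :selected:"
)
print(repr(text), n_selected, n_unselected)
```

Keeping the counts rather than simply deleting the tokens lets a form-processing pipeline still record which boxes were ticked.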

Amazon Textract

Amazon Textract is a fully managed machine learning service offered by Amazon Web Services (AWS) that automatically extracts text, handwriting, and other data from scanned documents.

It goes beyond simple Optical Character Recognition (OCR) by not only identifying text but also extracting structured data like tables, forms, and fields.

This makes it particularly powerful for processing complex documents.

Key Features of Amazon Textract:

Text and Handwriting Extraction: Amazon Textract can extract both printed text and handwriting from scanned documents, including forms, invoices, receipts, and more. It supports a wide range of document formats, such as PDFs and images (PNG, JPEG, TIFF).
Form and Table Extraction: One of Textract’s standout features is its ability to detect and extract structured data from forms and tables. It not only extracts the text but also preserves the relationship between fields and data. This is crucial for accurately capturing information from complex documents.
Scalability and Integration: As a cloud-native service, Amazon Textract scales automatically to handle large volumes of documents. It can be easily integrated with other AWS services like Amazon S3 for storage, AWS Lambda for serverless computing, and Amazon Comprehend for further text analysis. This makes it a robust component for building comprehensive document processing workflows.
Security and Compliance: Amazon Textract adheres to AWS’s rigorous security standards, including encryption of data in transit and at rest. This makes it suitable for handling sensitive and regulated information. It is also HIPAA eligible and can be used in workloads that must comply with regulations such as GDPR.
Automatic Language Detection: Textract detects the language of a document automatically, so you do not need to specify it in advance. Note, however, that it supports a limited set of languages (English, Spanish, German, French, Italian, and Portuguese).
Machine Learning-Based Accuracy: Unlike traditional OCR, which might struggle with complex layouts or variations in text size and alignment, Textract leverages machine learning to improve accuracy in text extraction and interpretation, particularly in documents with irregular layouts.
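To make the field-to-data relationship concrete: with the `FORMS` feature, Textract returns each key and each value as a `KEY_VALUE_SET` block (marked by `EntityTypes` `KEY` or `VALUE`). A key block points to its value block through a `VALUE` relationship, and both point to their words through `CHILD` relationships. The sketch below pairs keys with values by following those links. To keep it runnable without AWS, it uses a tiny hand-written list of blocks with hypothetical Ids; it illustrates the response shape and is not actual Textract output for the invoice. The helper `key_value_pairs` is illustrative.

```python
def key_value_pairs(blocks):
    """Map each FORMS key's text to its value's text via KEY_VALUE_SET relationships."""
    by_id = {b["Id"]: b for b in blocks}

    def related(block, rel_type):
        # Yield the Ids listed under relationships of the given type
        for rel in block.get("Relationships", []):
            if rel["Type"] == rel_type:
                yield from rel["Ids"]

    def text_of(block):
        # Join the WORD children; KEY_VALUE_SET blocks carry no "Text" of their own
        return " ".join(
            by_id[i]["Text"] for i in related(block, "CHILD") if by_id[i]["BlockType"] == "WORD"
        )

    pairs = {}
    for block in blocks:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            values = [text_of(by_id[i]) for i in related(block, "VALUE")]
            pairs[text_of(block)] = " ".join(values)
    return pairs


# Hand-written blocks with hypothetical Ids (real blocks also carry Geometry, Confidence, ...)
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "INVOICE"},
    {"Id": "w2", "BlockType": "WORD", "Text": "NO."},
    {"Id": "w3", "BlockType": "WORD", "Text": "S"},
    {"Id": "w4", "BlockType": "WORD", "Text": "2812"},
]

print(key_value_pairs(sample_blocks))
```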

Installation and Setup Process

First, install boto3, the AWS SDK for Python, and configure your AWS credentials (for example, with `aws configure`):

pip install boto3

Then you can write the following code:

import boto3

# Credentials are picked up from boto3's default chain (environment variables,
# ~/.aws/credentials, or an attached IAM role); never hard-code access keys in source code
textract = boto3.client("textract", region_name="eu-central-1")

# Call Textract to analyze a document stored in S3 (replace bucket and key with your own)
response = textract.analyze_document(
    Document={
        "S3Object": {
            "Bucket": "nunobispo-text",
            "Name": "bill-of-sale.pdf",
        }
    },
    FeatureTypes=["TABLES", "FORMS"],
)

# Index blocks by Id so that relationships can be resolved directly
blocks_by_id = {block["Id"]: block for block in response["Blocks"]}


def child_text(block):
    """Join the text of a block's CHILD WORD blocks.

    KEY_VALUE_SET and CELL blocks have no "Text" field of their own.
    """
    words = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == "CHILD":
            for child_id in relationship["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


# Print the extracted text
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print("Detected text: " + block["Text"])
    elif block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block["EntityTypes"]:
        print("Key: " + child_text(block))
    elif block["BlockType"] == "TABLE":
        print("Table detected:")
        for relationship in block.get("Relationships", []):
            if relationship["Type"] == "CHILD":
                for cell_id in relationship["Ids"]:
                    cell = blocks_by_id[cell_id]
                    if cell["BlockType"] == "CELL":
                        print("Cell text: " + child_text(cell))
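Printing cells one by one loses the grid. Textract's `TABLE` blocks do not carry text themselves: they point, through `CHILD` relationships, to `CELL` blocks that record a 1-based `RowIndex` and `ColumnIndex`, and each cell in turn points to its `WORD` blocks. The sketch below rebuilds every table as a list of rows from that structure. To stay runnable without AWS, it uses a tiny hand-written block list with hypothetical Ids (`t1`, `c1`, `w1`, ...) that follows the shape of the `analyze_document` response; with a real call you would pass `response["Blocks"]`. Merged cells, titles, and footers are ignored in this sketch.

```python
def table_rows(blocks):
    """Rebuild each TABLE block as a list of rows, each row a list of cell strings."""
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        # Yield the Ids under the block's CHILD relationships (if any)
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = [by_id[i] for i in child_ids(block) if by_id[i]["BlockType"] == "CELL"]
        n_rows = max(c["RowIndex"] for c in cells)
        n_cols = max(c["ColumnIndex"] for c in cells)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            words = [by_id[i]["Text"] for i in child_ids(cell) if by_id[i]["BlockType"] == "WORD"]
            # RowIndex and ColumnIndex are 1-based
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
        tables.append(grid)
    return tables


# Hand-written response fragment with hypothetical Ids (real blocks also carry Geometry, Confidence, ...)
sample_blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},  # empty cell: no children
    {"Id": "w1", "BlockType": "WORD", "Text": "ITEM"},
    {"Id": "w2", "BlockType": "WORD", "Text": "UNIT"},
    {"Id": "w3", "BlockType": "WORD", "Text": "PRICE"},
    {"Id": "w4", "BlockType": "WORD", "Text": "125"},
]

for row in table_rows(sample_blocks)[0]:
    print(row)
```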

Performance Test

Testing Amazon Textract:

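Document 1: Extremely accurate with printed text, although it also returned some spurious fragments (for example, "mone", "PALDIN", and "BAB"), most likely from the logo and the handwritten marks.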

Detected text: Ahspalding & Bros.
Detected text: mone
Detected text: PALDIN
Detected text: PLEASE REMIT TO SPALDING SALES CORP.
Detected text: SION OF SPALDING SALES CORPORATION
Detected text: MARK
Detected text: #1
Detected text: STORE NO.
Detected text: FOLIO C
Detected text: FAMOUS FOR ATHLETIC EQUIPMENT
Detected text: INVOICE NO. S 2812
Detected text: Sold To
Detected text: Ship To
Detected text: CUSTOMER'S
Detected text: DATE
Detected text: 6/1/39
Detected text: ORDER NO.
Detected text: BKLYN EAGLES B B CLUB
Detected text: DELD TO DIRK LUNDY
Detected text: EMANLEY-
Detected text: ADDRESS
Detected text: ADDRESS
Detected text: 101 MONTGOMERY STREET
Detected text: TOWN
Detected text: NEWARK, N.J. STATE
Detected text: TOWN
Detected text: STATE
Detected text: TERMS:
Detected text: 2% CASH 10 DAYS-NET 30 DAYS-
Detected text: VIA
Detected text: DEALER
Detected text: INST.
Detected text: GOLF PRO.
Detected text: ORDER TAKEN BY
Detected text: SALESMAN'S NAME
Detected text: CLASS
Detected text: NO,
Detected text: OF
Detected text: SALE
Detected text: A
Detected text: GOODWIN
Detected text: TAGUER
Detected text: 106
Detected text: 86
Detected text: ITEM
Detected text: QUANTITY
Detected text: DESCRIPTION OF ARTICLE
Detected text: UNIT
Detected text: y
Detected text: PRICE
Detected text: AMOUNT
Detected text: NO.
Detected text: ORDERED
Detected text: SHIPPED
Detected text: -
Detected text: 125
Detected text: BATS
Detected text: 9
Detected text: 9
Detected text: EA
Detected text: 1
Detected text: 75
Detected text: 15
Detected text: 75
Detected text: 120
Detected text: BATS
Detected text: 1
Detected text: 1
Detected text: EA
Detected text: 1
Detected text: 75
Detected text: -
Detected text: -
Detected text: 200
Detected text: BATS
Detected text: 6
Detected text: 6
Detected text: EA
Detected text: 1
Detected text: 00
Detected text: 6
Detected text: 00
Detected text: -
Detected text: -
Detected text: 130
Detected text: BATS
Detected text: 2
Detected text: 2
Detected text: EA
Detected text: 1
Detected text: 40
Detected text: 2
Detected text: 80
Detected text: -
Detected text: 26
Detected text: Cr
Detected text: 30
Detected text: -
Detected text: /
Detected text: 50
Detected text: -
Detected text: BAB
Detected text: memo
Detected text: 80
Detected text: -
Detected text: 24
Detected text: -
Detected text: -
Detected text: SEP /
Detected text: -
Detected text: -
Detected text: -
Detected text: it
Detected text: -
Detected text: Form F 21 1-39-M
Detected text: NO RETURN OF MERCHANDISE FROM THIS INVOICE WILL BE ACCEPTED UNLESS YOU HAVE OUR WRITTEN PERMISSION.
Table detected:

Document 2: Excellent extraction of both the text and the table values, although the table comes back line by line rather than as rows and columns.

Detected text: Using a 10 day interval, the values in the third column will be 10 times
Detected text: the mean motion, n, and those in the fourth column will be 200 times the
Detected text: acceleration coefficient. As outlined in previous ITCP Bulletins, predictions
Detected text: may be made by interpolation in such a table. Its form is comparable to that
Detected text: of the Daily Satellite Ephemerides.
Detected text: The prediction procedure involves the calculation of a time, JNL, at
Detected text: which the satellite is expected to appear at a particular position such as
Detected text: the point of local culmination. For this position, we will have calculated
Detected text: a value of PRM. The observation then represents a measurement of the actual
Detected text: time at which the satellite appeared at, or very near this position. As a
Detected text: meticulous observer, I will have measured this time to + 0.0000001. I will
Detected text: then use this actual time to recalculate the position and will correct for
Detected text: differences between the calculated and actual point of observation. I will
Detected text: also make corrections for effects of the pear shape and the ellipticity of
Detected text: the equator. All of this is done with the objective of obtaining a measured
Detected text: value of PRM that is accurate to about 0.00001.
Detected text: Another observer, Mr. X, may either be less meticulous or may lack the
Detected text: means for making precise observations. He may simply use the predicted value
Detected text: of PRM as a measured value to correspond with his measurement of the time.
Detected text: We can assume that his accuracy is 000001 in timing and + 0.001 in PRM.
Detected text: Let us then assume that Mr. X and I both start tracking 1960 Nu 2, using
Detected text: the above table and the Gear Ratio Elements. For convenience, we will also
Detected text: assume that his location is the same as mine and that he makes each observation
Detected text: at the same time that I do. After about 60 days, our records of observations
Detected text: might compare as shown in Table V.
Detected text: TABLE V
Detected text: RECORD OF OBSERVATIONS
Detected text: Comparing Log of a Meticulous Observer (W.P.O)
Detected text: with Log of a Casual Observer (Mr. X)
Detected text: Predicted
Detected text: V. P. O.
Detected text: Mr. X
Detected text: JNL
Detected text: PRM
Detected text: PRM
Detected text: Resid.
Detected text: PRM
Detected text: Resid.
Detected text: 38462.40113315
Detected text: 16899.946438
Detected text: 16899.946518
Detected text: +80
Detected text: 16899.9469
Detected text: +400
Detected text: 38482.05088329
Detected text: 17165.472894
Detected text: 17165.473397
Detected text: +503
Detected text: 17165.4732
Detected text: +300
Detected text: 38484.04727861
Detected text: 17192.450126
Detected text: 17192.450724
Detected text: +598
Detected text: 17192.4512
Detected text: +1100
Detected text: 38485.08494712
Detected text: 17206.472112
Detected text: 17206.472747
Detected text: +635
Detected text: 17206.4728
Detected text: +700
Detected text: 38486.04369964
Detected text: 17219.427707
Detected text: 17219.428390
Detected text: +683
Detected text: 17219.4280
Detected text: +300
Detected text: 38494.02957430
Detected text: 17327.340616
Detected text: 17327.341718
Detected text: +1102
Detected text: 17327.3416
Detected text: +1000
Detected text: 38495.06723367
Detected text: 17341.362480
Detected text: 17341.363653
Detected text: +1173
Detected text: 17341.3642
Detected text: +1700
Detected text: 38497.06365582
Detected text: 17368.340068
Detected text: 17368.341367
Detected text: +1299
Detected text: 17368.3410
Detected text: +900
Detected text: We can assume that Mr. X's values of JNL will be the same as mine (in the first
Detected text: column) except that they will include only five digits to the right of the
Detected text: decimal.
Detected text: In the above table, each "residual" column represents the difference
Detected text: between the observed values and the predicted values, which are in the second
Detected text: column. As we proceed, Mr. X and I will both plot these residuals against time,
Detected text: as shown in Figure 2.
Detected text: 12

Introducing LLMWhisperer: The Best OCR for LLMs

LLMWhisperer is a tool that helps make complex documents easier for large language models (LLMs) to understand. It’s great at pulling text out of different types of documents, like scanned forms, PDFs, and images with unusual layouts.

It keeps the structure of the document so the LLM can process it accurately. It can switch between text extraction and OCR (Optical Character Recognition) modes automatically, depending on the type of document, and it can handle even poorly scanned or misaligned documents.

Key Features of LLMWhisperer:

Automatic Mode Switching: It can easily switch between extracting text and using OCR based on the type of document, making sure it gets the best results from both digital text and scanned images.
Layout Preservation: LLMWhisperer keeps the original layout of documents when it extracts text. This is important for keeping the context and accuracy when the data is used by large language models.
Checkbox and Radio Button Recognition: It accurately identifies and converts checkboxes and radio buttons from forms into a text format that language models can easily understand, making it better for processing form-based data.
Document Preprocessing: The tool has advanced options for preprocessing documents, like applying filters and adjusting image settings. This helps improve the quality of text extraction, especially from poorly scanned documents.
Structured Data Output: LLMWhisperer can produce structured data outputs, like JSON, making it easier to use the extracted information in other systems and workflows.
SaaS and On-Premise Deployment: It offers flexible deployment options, including a fully managed online service and an on-premise version for handling sensitive data securely.

PDF hell and practical RAG applications

Extracting text from PDFs for use in applications like Retrieval-Augmented Generation (RAG) and Natural Language Processing (NLP) is challenging due to their fixed layout structure and lack of semantic organization.

Learn why preserving the original PDF layout, and with it the structure of tables and other elements, is crucial for better performance in LLM and RAG applications.


LLMWhisperer: Best OCR for PDF Checkbox Extraction

PDF forms have checkboxes and radio buttons that users fill out to provide data. In this video, we show how to extract these form elements with LLMWhisperer in a format that LLMs can understand.

Installation and Setup Process

To use LLMWhisperer, you need to install the llmwhisperer-client package and obtain an API key from the LLMWhisperer website.

Follow these steps to get started:

Sign Up for an API Key: Visit the LLMWhisperer website and sign up for an account. You will receive an API key that you need to use in your code.
Install the LLMWhisperer Client: Use pip to install the llmwhisperer-client package.

pip install llmwhisperer-client

Then you can write the following code:

from unstract.llmwhisperer.client import LLMWhispererClient

# Initialize the client with your LLMWhisperer base URL and API key
client = LLMWhispererClient(base_url="", api_key="")

# Extract the text, with its layout preserved, from the PDF
result = client.whisper(file_path="bill-of-sale.pdf")
extracted_text = result["extracted_text"]
print(extracted_text)
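In the outputs shown below, each page of LLMWhisperer's text ends with a `<<<` line. Assuming that marker (the separator may differ between versions or settings, so check the options of the client you use), the extracted text can be split into pages, for example to send page-sized chunks to an LLM. The sketch below uses a short sample string in place of `extracted_text`; `split_pages` and `PAGE_SEPARATOR` are illustrative names.

```python
PAGE_SEPARATOR = "<<<"  # assumption: the page marker seen in the outputs in this article


def split_pages(extracted_text: str, separator: str = PAGE_SEPARATOR):
    """Split LLMWhisperer text into pages, dropping empty chunks."""
    pages = []
    for chunk in extracted_text.split(separator):
        # Trim only newlines and form feeds so leading spaces (the layout) survive
        page = chunk.strip("\n\f")
        if page.strip():
            pages.append(page)
    return pages


sample = (
    "  Form F 21 1-39-M\n"
    "<<<\n"
    "Using a 10 day interval, the values in the third column will be 10 times\n"
    "                                    12\n"
    "<<<\n"
)
pages = split_pages(sample)
print(len(pages))
```

Stripping only newlines and form feeds, rather than all whitespace, keeps the column alignment that makes the layout-preserving output useful. If a document could contain the literal text `<<<`, match the marker only when it sits on a line of its own.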

Performance Test

Testing LLMWhisperer on the two documents:

Document 1: Exceptional accuracy, especially with complex layouts and handwritten text.

Al, Spalding              Bros.        SPALDING           PLEASE REMIT TO SPALDING SALES CORP.
       SION OF SPALDING SALES CORPORATION     MARK

                                                                        #1
                                                            STORE NO.           FOLIO C

  FAMOUS FOR ATHLETIC EQUIPMENT
                                                                          INVOICE NO. S 2812 

                                                                          CUSTOMER'S
Sold    To           DATE      6/1/39                Ship To              ORDER NO.

           BKLYN EAGLES B B CLUB                                 DELD TO DIRK LUNDY
           EMANLEY -
ADDRESS                                            ADDRESS
           101 MONTGOMERY STREET
TOWN       NEWARK, N.J.      STATE                  TOWN                        STATE
 TERMS:
2% CASH TO DAYS-NET 30 DAYS-                       VIA

        DEALER INST. GOLF PRO.      ORDER TAKEN BY             SALESMAN'S NAME              NO,
 CLASS
   OF
  BALE              A             GOODWIN                TAGUER                             106 

    ITEM                                                 QUANTITY
86 NO.                  DESCRIPTION OF ARTICLE        ORDERED SHIPPED UNIT      PRICE      AMOUNT

125           BATS                                       9       9     EA         1 75      15 75 
-
120           BATS                                        1      1      EA                    1 75 
-
200           BATS                                       6       6      EA        1 00       6 00 

                                                                                 1 40 
1 30          BATS                                        2      2      EA                    2 80 
                                                                                           26 30 
                                                                                            150 
-
                                                                                               80-

-

                                            SEP / / 1933 

  Form F 21 1-39-M
          NO RETURN OF MERCHANDISE FROM THIS INVOICE WILL BE ACCEPTED UNLESS YOU HAVE OUR WRITTEN PERMISSION.
<<<

Document 2: Superior structured data extraction compared to the other tools; the rows and columns of Table V keep their original layout.

Using a 10 day interval, the values in the third column will be 10 times
the mean motion, n, and those in the fourth column will be 200 times the
acceleration coefficient. As outlined in previous ITCP Bulletins, predictions
may be made by interpolation in such a table. Its form is comparable to that
of the Daily Satellite Ephemerides.

    The prediction procedure involves the calculation of a time, JNL, at
which the satellite is expected to appear at a particular position such as 
the point of local culmination. For this position, we will have calculated
a value of PRM. The observation then represents a measurement of the actual
time at which the satellite appeared at, or very near this position. As a
meticulous observer, I will have measured this time to + 090000001. I will
then use this actual time to recalculate the position and will correct for 
differences between the calculated and actual point of observation. I will
also make corrections for effects of the pear shape and the ellipticity of
the equator. All of this is done with the objective of obtaining a measured
value of PRM that is accurate to about + 0500001. 

    Another observer, Mr. X, may either be less meticulous or may lack the
means for making precise observations. He may simply use the predicted value
of PRM as a measured value to correspond with his measurement of the time.
We can assume that his accuracy is + 0900001 in timing and + 0f001 in PRM.
Let us then assume that Mr. X and I both start tracking 1960 Nu 2, using
the above table and the Gear Ratio Elements. For convenience, we will also
assume that his location is the same as mine and that he makes each observation
at the same time that I do. After about 60 days, our records of observations
might compare as shown in Table V.

                                TABLE V

                        RECORD OF OBSERVATIONS

                Comparing Log of a Meticulous Observer (W.P.O)
                    with Log of a Casual Observer (Mr. [X] X)

                 Predicted          W. P. O.           Mr. X
      JNL           PRM          PRM       Resid.    PRM      Resid.

 38462.40113315 16899.946438 16899.946518    +80 16899.9469    +400 

 38482.05088329 17165.472894 17165.473397   +503 17165.4732    +300 

 38484.04727861 17192.450126 17192.450724   +598 17192.4512   +1100 

 38485.08494712 17206.472112 17206.472747   +635 17206.4728    +700 

 38486.04369964 17219.427707 17219.428390   +683 17219.4280    +300 

 38494.02957430 17327.340616 17327.341718 +1102   17327.3416 +1000 

 38495.06723367 17341.362480 17341.363653 +1173   17341.3642 +1700 

 38497.06365582 17368.340068 17368.341367 +1299   17368.3410   +900 

We can assume that Mr. X's values of JNL will be the same as mine (in the first
column ) except that they will include only five digits to the right of the
decimal.

    In the above table, each "residual" column represents the difference
between the observed values and the predicted values, which are in the second
column. As we proceed, Mr. X and I will both plot these residuals against time,
as shown in Figure 2.
                                    12
<<<
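Because Table V is internally consistent, it doubles as a check on OCR accuracy. Each W.P.O. residual is the observed PRM minus the predicted PRM, expressed in units of the sixth decimal place: for the first row, 16899.946518 - 16899.946438 = 0.000080, reported as +80. The sketch below re-derives that column from the two PRM columns with exact decimal arithmetic and flags any row whose residual cannot be parsed or does not match. The rows are typed in from the LLMWhisperer output above; swapping in Azure's misread `¥683` shows the check catching an error. The Mr. X residuals are not checked, because they do not follow from the listed values by the same simple subtraction. `check_residuals` is an illustrative helper.

```python
from decimal import Decimal, InvalidOperation

# (predicted PRM, W.P.O. PRM, W.P.O. residual), typed from the LLMWhisperer output of Table V
ROWS = [
    ("16899.946438", "16899.946518", "+80"),
    ("17165.472894", "17165.473397", "+503"),
    ("17192.450126", "17192.450724", "+598"),
    ("17206.472112", "17206.472747", "+635"),
    ("17219.427707", "17219.428390", "+683"),
    ("17327.340616", "17327.341718", "+1102"),
    ("17341.362480", "17341.363653", "+1173"),
    ("17368.340068", "17368.341367", "+1299"),
]

SCALE = Decimal("1000000")  # residuals are quoted in units of the sixth decimal place


def check_residuals(rows):
    """Return (row number, problem) for rows whose residual is unreadable or inconsistent."""
    problems = []
    for n, (predicted, observed, residual) in enumerate(rows, start=1):
        try:
            expected = (Decimal(observed) - Decimal(predicted)) * SCALE
            reported = Decimal(residual)
        except InvalidOperation:  # raised for strings such as "¥683"
            problems.append((n, f"unparseable value in {residual!r} or the PRM columns"))
            continue
        if expected != reported:
            problems.append((n, f"residual {residual} but PRM difference gives {expected}"))
    return problems


print(check_residuals(ROWS))
# Substitute Azure's reading of row 5 to see the check flag it
azure_rows = ROWS[:4] + [("17219.427707", "17219.428390", "¥683")] + ROWS[5:]
print(check_residuals(azure_rows))
```

`Decimal` is used instead of `float` so that differences such as 0.000683 are exact and compare equal to the reported integer residuals.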

Best OCR Software: Comparison of Main Features, Capabilities, and Precision

In this section, we compare the four OCR tools (Tesseract, PaddleOCR, Azure Document Intelligence, and Amazon Textract) along with LLMWhisperer, focusing on their main features, capabilities, accuracy, and overall performance.

Feature	Tesseract	PaddleOCR	Azure Document Intelligence	Amazon Textract	LLMWhisperer
Accuracy	High	Very High	Very High	Extremely High	Superior
Language Support	100+	80+	Multi-language	Multi-language	Multi-language
Complex Layouts Handling	Moderate	High	Very High	Very High	Superior
Structured Data Extraction	Low	Moderate	Very High	Extremely High	Superior
Deployment Flexibility	High (Local)	High (Local)	High (Cloud)	High (Cloud)	High (Cloud)
Ease of Use	Moderate	Easy	Easy	Moderate	Easy
Cost	Free	Free	Paid	Paid	Paid
Custom Training	Yes	Yes	Yes	No	Yes
Integration	Moderate	High	High	High	High
Security and Compliance	N/A	N/A	High	High	High

Evaluating the Best OCR Software: Analysis of Performance Tests

When evaluating these tools against the sample documents, several factors come into play, including their ability to accurately extract text, handle complex layouts, and maintain the structure of the document.

Tesseract OCR:

Strengths:

Tesseract performs well with clean, high-quality images and provides solid accuracy for straightforward text recognition tasks.
It supports a wide range of languages and allows for custom training to improve recognition.

Weaknesses:

It struggles with complex layouts and handwritten text, which can result in lower accuracy and missed data in such cases.

PaddleOCR:

Strengths:

PaddleOCR excels in recognizing text in multi-language documents and handles complex layouts better than Tesseract.
It is also lightweight and fast, making it suitable for real-time applications.

Weaknesses:

While it performs well, its structured data extraction capabilities are less advanced compared to cloud-based services like Azure Document Intelligence and Amazon Textract.

Azure Document Intelligence:

Strengths:

Azure Document Intelligence offers excellent accuracy and handles structured data extraction extremely well, especially in documents with tables and forms.
Its pre-built models simplify the extraction process for common document types.

Weaknesses:

As a cloud-based service, it requires a paid subscription, and its performance is dependent on internet connectivity.

Amazon Textract:

Strengths:

Textract is highly accurate, particularly with complex documents and forms.
It excels in structured data extraction and is well-integrated with other AWS services, making it a powerful tool for large-scale document processing.

Weaknesses:

Similar to Azure, it is a paid service and requires integration into the AWS ecosystem, which might be a barrier for some users.

LLMWhisperer:

Strengths:

LLMWhisperer stands out with its ability to handle complex layouts, extract structured data with superior accuracy, and maintain the original document structure.
It is particularly effective for documents that combine various types of content, such as text, images, and forms.

Weaknesses:

Being a new entrant, it might have limited community support compared to more established tools.
It is also a paid service.

Choosing the Right OCR Tool: Aligning with Your Project Needs

In summary, the best OCR tool depends on your specific needs:

Tesseract is ideal for those who need a free, highly customizable tool for simple OCR tasks.
PaddleOCR is a great choice for users who require a lightweight, fast, and highly accurate OCR solution with support for multiple languages.
Azure Document Intelligence and Amazon Textract are top choices for businesses that need advanced structured data extraction and can afford a cloud-based, paid solution.
LLMWhisperer is recommended for those who need superior accuracy and the ability to handle complex documents with various content types.

Ultimately, the choice of OCR tool should align with the specific requirements of your projects, whether it’s cost-effectiveness, accuracy, language support, or the ability to process complex document layouts.
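One way to make this trade-off explicit is a weighted score. The sketch below maps the comparison table's qualitative ratings onto an ordinal scale and combines them with per-criterion weights. The numeric scale, the weights, and the helper names (`RATING_SCORES`, `weighted_score`) are illustrative assumptions, not part of the evaluation above. Adjust both to your own priorities.

```python
# Illustrative ordinal scale for the comparison table's ratings (an assumption).
RATING_SCORES = {
    "Low": 1,
    "Moderate": 2,
    "High": 3,
    "Very High": 4,
    "Extremely High": 5,
    "Superior": 6,
}

# A subset of rows from the comparison table above.
TABLE = {
    "Tesseract": {"Accuracy": "High", "Complex Layouts": "Moderate", "Structured Data": "Low"},
    "PaddleOCR": {"Accuracy": "Very High", "Complex Layouts": "High", "Structured Data": "Moderate"},
    "Amazon Textract": {"Accuracy": "Extremely High", "Complex Layouts": "Very High",
                        "Structured Data": "Extremely High"},
}


def weighted_score(ratings: dict[str, str], weights: dict[str, float]) -> float:
    """Weighted average of the numeric scores for the criteria listed in `weights`."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(RATING_SCORES[ratings[c]] * w for c, w in weights.items()) / total_weight


# Hypothetical priorities: accuracy matters most, then structured data extraction.
weights = {"Accuracy": 0.5, "Complex Layouts": 0.2, "Structured Data": 0.3}
ranking = sorted(TABLE, key=lambda tool: weighted_score(TABLE[tool], weights), reverse=True)
print(ranking)  # ['Amazon Textract', 'PaddleOCR', 'Tesseract']
```

Adding criteria such as cost or deployment flexibility, with your own numeric mapping, can reverse this ordering. That is the point: the ranking follows from the weights you choose.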

For the curious: who are we, and why are we writing about OCR?

We are building Unstract. Unstract is a no-code platform to eliminate manual processes involving unstructured data using the power of LLMs. The entire process discussed above can be set up without writing a single line of code. And that’s only the beginning. The extraction you set up can be deployed in one click as an API or ETL pipeline.

With API deployments, you can expose an API to which you send a PDF or an image and get back structured data in JSON format. With an ETL deployment, you can simply put files into Google Drive, an Amazon S3 bucket, or one of a variety of other sources. The platform will then run extractions and automatically store the extracted data in a database or a warehouse like Snowflake. Unstract is open-source software and is available at https://github.com/Zipstack/unstract.

LLMWhisperer is a document-to-text converter that preps data from complex documents for use in Large Language Models (LLMs). LLMs are powerful, but their output is only as good as the input you provide. Documents can be a mess: widely varying formats and encodings, scanned images, numbered sections, and complex tables.

Extracting data from these documents and blindly feeding it to LLMs is not a good recipe for reliable results. LLMWhisperer is a technology that presents data from complex documents to LLMs in a way they can best understand.

If you want to take it for a test drive quickly, you can check out our free playground.

A Comparison of Best OCR software in 2026: Related topics for exploration

Best OCR for parsing receipts

Best PDF OCR for extracting data from invoice

Best OCR for reading bookkeeping documents

Best OCR for accounts payable documents

Best PDF OCR Tools in 2026: FAQs

What criteria should I use to identify the best OCR software for my organization?
Focus on ten factors highlighted in the article: accuracy, multi-language support, complex-layout handling, structured data extraction, deployment flexibility, ease of use, cost, custom training options, integration capability, and security/compliance. Weigh these against your document types (invoices, forms, scientific papers, etc.) and your budget.

What is the best OCR software for handling complex layouts and handwritten forms in 2026?
Based on the performance tests, LLMWhisperer is highlighted as providing superior accuracy for complex documents, including those with handwritten text and challenging layouts like the sample bill of sale, outperforming other tools in this specific category.

Which OCR software offers both on-premise and cloud-based deployment options in 2026?
Several top contenders for the best OCR software provide both deployment options. LLMWhisperer, for example, offers flexible deployment as a Software-as-a-Service (SaaS) or on-premise, making it suitable for organizations with strict security requirements.

What deployment models do the best OCR software options support in 2026?
You have three broad choices:

Local/offline (Tesseract, PaddleOCR) – full control, no recurring fees, but more maintenance.
Cloud APIs (Azure Document Intelligence, Amazon Textract) – scalable and managed, paid per page.
Hybrid/SaaS or on-prem (LLMWhisperer) – hosted service for ease of use plus an optional on-prem version when data residency or privacy demands it.

How does LLMWhisperer position itself as the best OCR software for large-language-model (AI) workflows?
The best OCR software, like LLMWhisperer, can detect and extract form elements such as checkboxes and radio buttons, converting them into formats that are easily interpreted by AI or LLM workflows. This is especially useful for automating data entry from scanned forms and PDF documents.

The post Best OCR Software in 2026 — A Tool Comparison & Evaluation Guide first appeared on Unstract.com.

Guide to LLMWhisperer Document OCR API

Tarun Singh — Thu, 19 Dec 2024 14:12:10 +0000

Introduction: What is a Document OCR API?

A Document OCR API (document scanner API) is a tool that converts text from scanned documents, PDFs, or images into machine-readable formats like JSON or text. It bridges the gap between static documents and dynamic workflows by enabling seamless text extraction across diverse document types.

Purpose and Advantages

Scalability: Handles varying workloads, from a few documents to thousands, easily.
Ease of Integration: Integrates directly into existing systems like CRMs and ERPs.
Flexibility: Processes various document types, including scanned forms, handwritten notes, and tables, ensuring compatibility with multiple layouts.

By offering OCR functionality as an API, organizations eliminate the need for standalone tools and automate text extraction, making document data instantly usable.

Why Extracting Data from Documents is a Critical Business Process

Extracting data from documents like scanned PDFs and images is crucial for businesses managing unstructured data. Many operations depend on extracting details like invoice totals, tax IDs, or customer information, which is often locked in static formats.

Challenges with Manual Extraction

Manual methods are time-intensive, error-prone, and lack scalability. Employees may spend hours transcribing scanned forms, introducing inaccuracies, and delaying workflows.
Example: A logistics company may struggle with manually extracting data from thousands of delivery receipts, leading to delays.

TL;DR

If you wish to skip directly to the solution section, click here. It shows how the LLMWhisperer OCR API handles documents of any complexity, including document scans, images, PDFs with complex tables, checkboxes, and handwriting.

Benefits of Automation

Time Savings: Automating OCR drastically reduces processing time.
Example: Retailers process invoices in minutes, ensuring prompt supplier payments.
Accuracy: APIs reduce human error by standardizing data extraction.
Example: An insurance company extracts claim details with near-perfect accuracy.
Seamless Integration: Extracted data flows into systems for real-time analysis, boosting operational efficiency.

Automation makes static data actionable, saving time, reducing costs, and enhancing decision-making.

Challenges in Document PDF Processing and How OCR Solves Them

Processing scanned PDFs and images presents unique challenges, including variability in document formats, low-quality scans, and complex layouts. OCR APIs address these hurdles with precision.

Key Challenges

Variability in Layouts: Scanned documents often lack uniformity, with each having distinct fonts, tables, or annotations.
Example: Vendor invoices with different formats require dynamic adaptation.
Low-Quality Scans: Blurry or faded text and noisy images hinder recognition.
Example: A smudged tax form still yields accurate results after pre-processing by OCR.
Complex Tables: Extracting data from multi-page tables or overlapping elements is challenging.
Example: A bank statement’s mixed columns for debits and credits are parsed correctly with OCR.

How OCR Solves These Issues

Pattern Recognition: Identifies and categorizes text elements, ensuring accurate extraction.
Error Correction: Post-processing algorithms refine OCR outputs for precision.
Layout Preservation: Maintains document structure for downstream analysis.

OCR APIs overcome these obstacles, enabling organizations to extract accurate, structured data efficiently from even the most challenging document types.
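As a rough illustration of the error-correction step, the sketch below repairs a classic OCR confusion: letters such as "O" read in place of the digit "0". It only touches tokens that already look numeric. This is a simple rule-based heuristic written for this guide, not the post-processing any particular engine uses. The confusable-character table is an assumption you would tune to your own documents.

```python
import re

# Letters that OCR engines often emit in place of digits (an assumed mapping).
CONFUSABLES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

# Runs of digits/confusable letters, optionally with "." or "," separators.
NUMERIC_LIKE = re.compile(r"\b[0-9OolISB][0-9OolISB.,]*[0-9OolISB]\b")


def fix_numeric_tokens(text: str) -> str:
    """Replace confusable letters only inside tokens that already contain a digit."""

    def repair(match: re.Match) -> str:
        token = match.group(0)
        if any(ch.isdigit() for ch in token):
            return token.translate(CONFUSABLES)
        return token  # pure-letter tokens such as "BOSS" are left untouched

    return NUMERIC_LIKE.sub(repair, text)


print(fix_numeric_tokens("Total due: 1O,25O.0O on invoice 2O17-B"))
# Total due: 10,250.00 on invoice 2017-B
```

Requiring at least one real digit keeps ordinary words intact. The trade-off is that a number misread entirely as letters ("OO") is not repaired.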

Brief Introduction to LLMWhisperer and Its Document OCR API Capabilities

Overview of LLMWhisperer

LLMWhisperer is an advanced text parser that prepares complex documents, including scanned PDFs, images, and tables, for downstream processing by large language models (LLMs). It focuses on converting unstructured or semi-structured data into organized, actionable formats like JSON. LLMWhisperer shines in handling intricate layouts, noisy scans, and handwritten text, enabling organizations to streamline document parsing workflows.

LLMWhisperer is not powered by AI or LLMs for the OCR process. Instead, it enhances OCR outputs by applying intelligent parsing mechanisms that make the data more comprehensible for LLMs and other systems.

Modern Document OCR API: Key Features and Capabilities

Structured Data Extraction

LLMWhisperer excels at extracting structured text from scanned or image-based documents. It identifies key elements such as text zones, tables, and annotations, organizing them into usable formats.

Complex Layout Handling

It preserves the structure of intricate document layouts, such as multi-column designs, nested tables, and checkboxes. This ensures accurate parsing while maintaining the context of the original document.
Example: Processing a multi-page bank statement with overlapping tables and transaction details without losing alignment or context.

Pre-Processing for Noisy or Low-Quality Scans

LLMWhisperer improves the quality of noisy scans by applying filters, de-skewing, and contrast adjustments. This pre-processing enhances the accuracy of OCR outputs.
Example: Extracting policyholder details from a faded, watermarked insurance document with precision.

Multilingual Document Parsing

It supports multiple languages, allowing global enterprises to parse documents containing diverse scripts and mixed-language content.
Example: Parsing an international invoice containing English, German, and French sections.

OCR reader API that talks to your systems

Available as an API, LLMWhisperer integrates effortlessly into existing workflows, enabling organizations to preprocess documents on the fly. This flexibility ensures scalability and adaptability across various document types.

LLMWhisperer’s Role in OCR Workflows

Not Just an OCR Tool

LLMWhisperer is not a standalone OCR engine; it acts as a bridge between raw OCR outputs and advanced LLM-powered systems. By parsing and structuring the data, it ensures the input is optimized for downstream processing by LLMs, enhancing contextual understanding and overall accuracy.

Example: Preparing OCR-extracted text from a scanned academic thesis for further semantic analysis by an LLM.
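Much of this bridging is mechanical. The sketch below assumes pages in the extracted text are separated by the `<<<` marker that appears in the outputs shown in this guide. It splits the text into pages and packs whole pages into chunks that fit an LLM's context budget. The helper names and the character-based limit are hypothetical simplifications; real pipelines often count tokens instead.

```python
def split_pages(text: str, separator: str = "<<<") -> list[str]:
    """Split extracted text into pages, dropping empty pieces.

    Only newlines and form feeds are stripped, so leading spaces that carry
    layout information are preserved.
    """
    return [page.strip("\n\f") for page in text.split(separator) if page.strip()]


def chunk_pages(pages: list[str], max_chars: int) -> list[str]:
    """Greedily pack whole pages into chunks of at most `max_chars` characters.

    A single page longer than `max_chars` becomes its own (oversized) chunk
    rather than being cut mid-page.
    """
    chunks: list[str] = []
    current = ""
    for page in pages:
        candidate = f"{current}\n\n{page}" if current else page
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = page
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


pages = split_pages("page one\n<<<\n\fpage two\n<<<\n")
print(pages)  # ['page one', 'page two']
print(chunk_pages(["aaaa", "bbbb", "cc"], max_chars=10))  # ['aaaa\n\nbbbb', 'cc']
```

Keeping page boundaries intact means a table is never split across two LLM calls unless its page alone exceeds the budget.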

Comparison with Traditional OCR

Traditional OCR engines, such as Tesseract, rely on static models and predefined patterns, which struggle with non-standard or complex layouts. In contrast, LLMWhisperer uses advanced algorithms to adapt dynamically, preserving document context while ensuring accuracy in text parsing.

Real-World Applications

Form-Based Data Parsing
LLMWhisperer identifies checkboxes, radio buttons, and handwritten notes in scanned forms, converting them into structured data formats.
Example: Digitizing a tax form by extracting selected filing statuses, amounts, and notes.
Historical Document Processing
It processes low-quality historical documents like manuals or old reports, cleaning up noisy scans and extracting text accurately.
Example: Extracting structured text from a scanned page of the Apollo Space Mission Manual.
Global Invoice Management
LLMWhisperer handles multi-language invoices with varying layouts, extracting consistent data fields for payment processing.
Example: Parsing invoices from suppliers across different countries and consolidating the data for analytics.
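To show what "structured" means for form elements, here is a minimal parser for checkbox markers in extracted text. It assumes checked boxes are rendered as `[X]` and unchecked ones as `[ ]`, the style visible in the extraction outputs in this guide. `parse_checkboxes` and the sample form line are hypothetical. Each label is taken as the text up to the next box or line break.

```python
import re

# Assumed rendering: "[X]" marks a checked box and "[ ]" an unchecked one.
CHECKBOX = re.compile(r"\[(X| )\]\s*([^\[\n]+)")


def parse_checkboxes(text: str) -> dict[str, bool]:
    """Map each checkbox label to True (checked) or False (unchecked)."""
    return {label.strip(): mark == "X" for mark, label in CHECKBOX.findall(text)}


form_text = "Filing status: [X] Single  [ ] Married filing jointly\n[ ] Head of household"
print(parse_checkboxes(form_text))
# {'Single': True, 'Married filing jointly': False, 'Head of household': False}
```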

Why choose LLMWhisperer for the document reader API?

Adaptability Across Document Types
LLMWhisperer handles printed, handwritten, and mixed-layout documents, making it ideal for businesses that process diverse data sources.
Ease of Integration
Its API-based architecture allows seamless integration into existing systems, ensuring smooth data processing pipelines without overhauling infrastructure.
Foundational Parsing for LLMs
By enhancing and structuring OCR outputs, LLMWhisperer maximizes the potential of LLMs, ensuring accurate and context-aware insights from complex document data.

With its robust capabilities, LLMWhisperer transforms how businesses process scanned documents and prepare data for AI-driven applications, delivering efficiency, accuracy, and scalability.

Pre-Processing with LLMWhisperer: How It Helps Scanned PDF/Image OCR

Pre-processing is a critical step in ensuring accurate and efficient Optical Character Recognition (OCR) for scanned PDFs and image-based documents. LLMWhisperer stands out as a powerful text parser by employing advanced pre-processing techniques to address common challenges such as noisy scans, complex layouts, and unstructured formats. These enhancements not only improve OCR accuracy but also prepare the extracted data for seamless downstream analysis.

Key Pre-Processing Techniques in LLMWhisperer

Noise Removal
Noise, such as smudges, watermarks, stains, or digital artifacts, can significantly impact OCR performance. LLMWhisperer’s noise removal techniques enhance document clarity, ensuring that irrelevant marks do not interfere with text recognition.

How It Works:
- Filters out unwanted artifacts, such as background patterns or faint text shadows.
- Enhances text contrast for better recognition of faint or degraded text.
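Noise removal usually starts with binarization: separating ink from paper so that faint backgrounds and stains drop out. The sketch below uses Otsu's method, a standard global-thresholding technique, on a grayscale image stored as rows of 0 to 255 values. It illustrates the general idea only and is not LLMWhisperer's implementation. Production pipelines typically add adaptive (local) thresholding and denoising filters.

```python
def otsu_threshold(pixels: list[list[int]]) -> int:
    """Return the gray level that best separates dark ink from light paper.

    Implements Otsu's method: choose the threshold t that maximizes the
    between-class variance (up to a constant factor) of the pixels <= t
    and the pixels > t.
    """
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    weight_bg = 0  # number of pixels at or below the candidate threshold
    sum_bg = 0     # sum of their gray levels
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels: list[list[int]]) -> list[list[int]]:
    """Map ink (<= threshold) to 0 and background (> threshold) to 255."""
    t = otsu_threshold(pixels)
    return [[255 if value > t else 0 for value in row] for row in pixels]


# A tiny grayscale "scan": dark strokes (30-40) on light, uneven paper (200-220).
scan = [[30, 35, 200], [210, 40, 220]]
print(binarize(scan))  # [[0, 0, 255], [255, 0, 255]]
```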

Layout Preservation
Many scanned PDFs and image-based documents have complex layouts, including multi-column text, tables, or overlapping sections. LLMWhisperer preserves the original structure of these documents during pre-processing, enabling accurate text parsing without distorting the context or relationships between data points.

How It Works:
- Recognizes document zones (headers, footers, tables, and text blocks) and maintains their spatial arrangement.
- Prepares tabular data and column-based layouts for structured extraction.
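A common way to preserve spatial arrangement in plain text is to place each recognized word on a character grid by its coordinates, so that table columns stay vertically aligned. The sketch assumes a hypothetical OCR result of words with left (`x`) and top (`y`) positions in points, plus fixed character-width and line-height values. It is a simplification, not LLMWhisperer's layout engine.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word with hypothetical OCR coordinates (in points)."""
    text: str
    x: float  # left edge
    y: float  # top edge


def render_layout(words: list[Word], char_width: float = 6.0, line_height: float = 12.0) -> str:
    """Render words as plain text whose columns follow their x positions."""
    rows: dict[int, list[Word]] = {}
    for word in words:
        # Words whose top edges fall in the same line-height band share a row.
        rows.setdefault(round(word.y / line_height), []).append(word)

    lines = []
    for row in sorted(rows):
        line = ""
        for word in sorted(rows[row], key=lambda w: w.x):
            column = round(word.x / char_width)
            if len(line) < column:
                line += " " * (column - len(line))  # pad out to the word's column
            elif line:
                line += " "  # keep words apart if the grid would make them touch
            line += word.text
        lines.append(line)
    return "\n".join(lines)


words = [Word("Item", 0, 0), Word("Qty", 60, 1), Word("Apple", 0, 12), Word("3", 60, 12)]
print(render_layout(words))
# Item      Qty
# Apple     3
```

Vertical gaps between rows are collapsed here. A fuller version would emit blank lines for skipped rows so that paragraph spacing survives as well.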

Format Optimization
Scanned documents often suffer from issues like skewed alignment, uneven text spacing, or non-standard formats. LLMWhisperer applies format optimization techniques to align and structure the content for maximum OCR accuracy.

How It Works:
- De-skewing: Corrects tilted or misaligned pages, ensuring text is horizontal for proper segmentation.
- Standardization: Adjusts inconsistent spacing, line breaks, and margins to create a uniform layout.
- Pre-OCR Splitting: Breaks large blocks of text into manageable segments for more precise recognition.
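De-skewing can be illustrated with the projection-profile method. It tries a range of candidate angles and projects the ink pixels onto the vertical axis for each one. It then keeps the angle that produces the sharpest row histogram, since text lines collapse into a few dense rows when the angle is right. The sketch assumes ink pixels are given as `(x, y)` coordinates with `y` increasing downward. The angle range and step are arbitrary choices, and this is a generic technique rather than LLMWhisperer's own.

```python
import math
from collections import Counter


def estimate_skew(points: list[tuple[float, float]],
                  max_angle: float = 5.0, step: float = 0.25) -> float:
    """Estimate page skew in degrees with a projection profile.

    For each candidate angle, ink pixels are projected onto the vertical axis
    after removing that slope; the angle whose row histogram is most
    concentrated (largest sum of squared counts) wins.
    """
    best_angle, best_score = 0.0, -1
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        slope = math.tan(math.radians(angle))
        rows = Counter(round(y - x * slope) for x, y in points)
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Two synthetic text lines that drift downward at 2 degrees.
slope = math.tan(math.radians(2.0))
ink = [(x, base + x * slope) for base in (10, 40) for x in range(200)]
print(estimate_skew(ink))  # 2.0
```

A positive result means text lines drift downward by tan(angle) per unit of x. Rotating the page by the negative of that angle straightens it before recognition.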

Live coding session on data extraction from a scanned PDF form with LLMWhisperer

You can also watch this live coding webinar where we explore all the challenges involved in scanned PDF parsing. We’ll also compare the capabilities of different PDF parsing tools to help you understand their strengths and limitations.

Benefits of LLMWhisperer’s Pre-Processing Techniques

Enhanced OCR Accuracy
Pre-processing directly improves the quality of the input data, enabling OCR engines to identify characters, words, and layouts more precisely.

Impact: Reduces errors caused by poor scan quality, such as misreading “O” as “0” or skipping faded text.

Preservation of Document Context
Maintaining the document’s original layout ensures that the relationships between data points remain intact, which is crucial for downstream analysis.

Impact: Data extracted from tables, forms, and columns is contextually accurate and ready for further processing.

Support for Diverse Document Types
LLMWhisperer’s pre-processing techniques work seamlessly across a variety of documents, including handwritten forms, scanned PDFs, and image-based text.

Impact: Expands the range of documents that businesses can process automatically, minimizing the need for manual intervention.

Improved Downstream Analysis
Pre-processed and structured data ensures smooth integration with analytics platforms, databases, or AI models, enabling real-time insights and decision-making.

Impact: Eliminates the need for manual corrections or additional formatting steps.

Real-World Demonstration: Scanned Document Processing

Let’s take the example of processing a scanned document, such as a page from the Apollo Space Mission Manual. Using LLMWhisperer, this process becomes seamless and efficient:

Noise Removal
LLMWhisperer eliminates age-related degradation like faded text, stains, or handwritten annotations in the margins, ensuring the document is clean and readable for OCR extraction.

Layout Preservation
The structured format of the manual, including diagrams, tables, and text, is preserved. This ensures that critical information such as captions, headers, and content flows remain in their original context.

Format Optimization
Any misaligned text or uneven spacing caused by outdated scanning methods is corrected. This results in a clear, professionally aligned document ready for accurate data parsing.

Why LLMWhisperer’s Pre-Processing Is Essential

LLMWhisperer’s robust pre-processing ensures that OCR systems perform optimally, even under challenging conditions like noisy, low-quality scans or complex layouts. By preserving the integrity of the document while enhancing its quality, LLMWhisperer enables businesses to unlock valuable insights from scanned PDFs and image-based text with unparalleled accuracy and efficiency.

Two Practical Examples Using LLMWhisperer

LLMWhisperer is a powerful tool for extracting text and structured data from scanned documents and images. Below, we showcase two practical examples to demonstrate its capabilities:

Example 1: Using LLMWhisperer Playground to Extract Text from the Apollo Space Mission Manual

The Apollo Space Mission Manual is a historical document with a mix of structured tables, illustrations, and descriptive text. By using the LLMWhisperer Playground, we can see how effectively it processes such a complex document.

Step-by-Step Process

Visit the Playground
Go to the LLMWhisperer Playground.
Upload the Document
Drag and drop a scanned Apollo Space Mission Manual page or upload the file manually.
Initiate Processing
Once uploaded, click the Process Document button. LLMWhisperer begins analyzing the document.
Observe the Magic
In just seconds, the extracted text and data appear on your screen.
- Preserved Layout: LLMWhisperer retains the document’s original structure, keeping sections like headers, tables, and diagrams intact.
- Clean Text Extraction: Even faded or poorly scanned text is accurately recognized and digitized.
- Tables and Diagrams: Complex tables are extracted precisely, with rows and columns preserved in a structured format, ready for downstream processing.

Benefits Demonstrated

Effortlessly handles scanned documents with mixed content.
Ensures no loss of context or structure, making the document ready for archival or analysis.

Whether digitizing scanned data or processing technical manuals, LLMWhisperer Playground shows how easily it bridges the gap between scanned documents and machine-readable data.

Example 2: Using LLMWhisperer Document OCR API in Postman to Extract Text from a Handwritten Scanned Image (PNG)

For this example, let’s demonstrate the capabilities of the LLMWhisperer API to process a handwritten scanned image (e.g., a PNG file of notes or forms).

Step-by-Step Workflow Using Postman

Sign Up and Get Document OCR API Key
- Visit the LLMWhisperer signup page and create an account.
- Retrieve your API key from the account dashboard.
POST Request to Process the Document
- Open Postman and create a new POST request.
- Use the /whisper endpoint: https://llmwhisperer-api.us-central.unstract.com/api/v2/whisper

Headers:

Key: unstract-key
Value: your LLMWhisperer API key

Body:

Select form-data.
Add a key named files, set its type to File, and upload the handwritten scanned image (PNG).

LLMWhisperer Document OCR API Request

Example cURL Command:

curl -X POST 'https://llmwhisperer-api.us-central.unstract.com/api/v2/whisper' \
  -H 'unstract-key: YOUR_API_KEY' \
  -F 'files=@handwritten_sample.png'

Submit the request, and you’ll receive a response containing the whisper_hash, which you’ll use to retrieve the results.
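If you are scripting this step rather than using Postman, here is a standard-library Python sketch of the same call. It builds a multipart/form-data request equivalent to the cURL command, with the file in a form field named `files`. It mirrors the request shown in this guide, and `build_whisper_upload` is a hypothetical helper. Check the current LLMWhisperer API reference for the expected body format and parameters before relying on it.

```python
import urllib.request
import uuid

WHISPER_URL = "https://llmwhisperer-api.us-central.unstract.com/api/v2/whisper"


def build_whisper_upload(api_key: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST like `curl -F 'files=@<filename>'`."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the file bytes
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        WHISPER_URL,
        data=body,
        method="POST",
        headers={
            "unstract-key": api_key,  # same header name as in the cURL example
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage (sends a real request, so it needs a valid key and network access):
# with open("handwritten_sample.png", "rb") as f:
#     req = build_whisper_upload("YOUR_API_KEY", "handwritten_sample.png", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())  # per this guide, the response includes the whisper_hash
```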

Check Conversion Status
- Create a GET request to the /whisper-status endpoint, passing the whisper_hash as a query parameter, to check whether processing is complete:
  https://llmwhisperer-api.us-central.unstract.com/api/v2/whisper-status

Example cURL Command:

curl -X GET 'https://llmwhisperer-api.us-central.unstract.com/api/v2/whisper-status?whisper_hash=YOUR_WHISPER_HASH' \
  -H 'unstract-key: YOUR_API_KEY'

The response will indicate the status (processing or completed).
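Rather than re-sending the status request by hand, a client usually polls with a growing delay and a timeout. The sketch below keeps the HTTP call out of the loop: `fetch_status` is any callable you supply that returns the status string. The status names (`completed`, and `error` for failures) follow the wording above plus an assumption, so confirm the exact values in the API reference. `wait_until_done` is a hypothetical helper.

```python
import time
from typing import Callable, Iterable


def wait_until_done(
    fetch_status: Callable[[], str],
    done: Iterable[str] = ("completed",),
    failed: Iterable[str] = ("error",),
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_status` until it reports a finished status.

    The delay doubles after each check (capped at 30 s). Any status not in
    `done` or `failed` is treated as still in progress.
    """
    done, failed = set(done), set(failed)
    waited = 0.0
    while True:
        status = fetch_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"processing failed with status {status!r}")
        if waited >= timeout:
            raise TimeoutError(f"still {status!r} after {waited:.0f} s")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, 30.0)


# Demonstration with a fake status source instead of real HTTP calls.
statuses = iter(["processing", "processing", "completed"])
delays = []
print(wait_until_done(lambda: next(statuses), interval=1.0, sleep=delays.append))  # completed
print(delays)  # [1.0, 2.0]
```

Injecting `sleep` and `fetch_status` keeps the loop testable. In real use, `fetch_status` would send the status request and read the status field from its JSON response.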

Retrieve the Extracted Text
- Once processing is complete, use the /whisper-retrieve endpoint to fetch the extracted text: https://llmwhisperer-api.us-central.unstract.com/api/v2/whisper-retrieve

GET Request Parameters:

Add whisper_hash as a query parameter.
Use text_only=true for clean, neatly formatted text output.

LLMWhisperer Document OCR API Response

Example cURL Command:

curl -X GET 'https://llmwhisperer-api.us-central.unstract.com/api/v2/whisper-retrieve?whisper_hash=YOUR_WHISPER_HASH&text_only=true' \
  -H 'unstract-key: YOUR_API_KEY'

The response will include an object containing the structured text extracted from the handwritten scanned image.
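The status and retrieve calls differ only in endpoint and query parameters, so a small helper can build both. This is a standard-library sketch, and `build_get` and the placeholder hash are hypothetical. The requests are only constructed here, not sent.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://llmwhisperer-api.us-central.unstract.com/api/v2"


def build_get(endpoint: str, api_key: str, **params: str) -> urllib.request.Request:
    """Build an authenticated GET request for an LLMWhisperer v2 endpoint."""
    query = urlencode(params)  # percent-encodes values such as the whisper_hash
    url = f"{BASE_URL}/{endpoint}" + (f"?{query}" if query else "")
    return urllib.request.Request(url, headers={"unstract-key": api_key})


# "abc123" stands in for the whisper_hash returned by the /whisper call.
status_req = build_get("whisper-status", "YOUR_API_KEY", whisper_hash="abc123")
retrieve_req = build_get("whisper-retrieve", "YOUR_API_KEY", whisper_hash="abc123", text_only="true")
print(retrieve_req.full_url)
# https://llmwhisperer-api.us-central.unstract.com/api/v2/whisper-retrieve?whisper_hash=abc123&text_only=true

# Sending it (requires a valid key and network access):
# with urllib.request.urlopen(retrieve_req) as resp:
#     print(resp.read().decode())
```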

The outcome of the OCR extraction:

                               EWD1200-0 

Only a matter of style? 

  For educational purposes we analyse the 
opening pages of an 11-page article that 
appeared in The American Mathematical 
Monthly, Volume 102 Number 2 / February 1995. 
We have added line numbers in the right 
margin. 
line 4: Since in this article, squares don't get 
alternating colours , it could be argued that 
the term " chessboard " is misplaced . 

line 4 : The introduction of the name " B " 
seems unnecessary : it is used - in the 
combination " the board B " - in the text 
for Figure 1 and in line 71 ; in both cases 
just " the board " would have done fine . 
In line 77 occurs the last use of B, 
viz . in " X CB " , which is dubious since 
B was a board and not a    set ; in line 
77 . I would have preferred " Given a set X [X] 
of cells ". 
line 7/8: The first move, being a move 
like any other , does not deserve a separate 
discription . The term " step " is redundant . 
line 8: Why not "a move consists of"? 
line 10/11: At this stage the italics are 
puzzling, since a move is possible if, 
<<<

Key Highlights of the PDF OCR API

Accurately extracts handwritten content, even from noisy scans.
Outputs structured data that can be seamlessly integrated into your workflows.

LLMWhisperer Document OCR API: Final Thoughts

By showcasing these two practical examples, LLMWhisperer demonstrates its versatility in handling diverse document types, whether historical or handwritten. Its Playground provides an intuitive testing environment, while its API allows seamless integration into enterprise workflows. Together, these tools redefine how we process and extract value from scanned PDFs and images.

PDF hell and practical RAG applications

Extracting text from PDFs for use in applications like Retrieval-Augmented Generation (RAG) and Natural Language Processing (NLP) is challenging due to their fixed layout structure and lack of semantic organization.

Learn how faithfully reproducing the structure of tables and other elements by preserving the original PDF layout is crucial for better performance in LLM and RAG applications.


Introduction to Unstract

Unstract is a transformative platform designed to simplify the processing of unstructured documents by converting them into actionable, structured data. As a powerful companion to LLMWhisperer, Unstract leverages advanced AI and LLMs to refine and organize outputs from diverse document types such as invoices, tax forms, resumes, and handwritten notes. Its robust capabilities eliminate the complexities of manual data extraction, enabling seamless workflows across industries.

Unstract is an open-source no-code LLM platform to launch APIs and ETL pipelines to structure unstructured documents. Get started with this quick guide.

How Unstract Complements LLMWhisperer

LLMWhisperer excels at extracting text and context from scanned PDFs, images, and other unstructured files. Unstract builds upon this foundation by further structuring and enhancing the extracted data, ensuring it is ready for use in downstream applications such as analytics, compliance, or CRM integration.

Seamless Integration: Outputs from LLMWhisperer are fed into Unstract for advanced processing, including context-aware data structuring and organization.
AI-Driven Refinement: While LLMWhisperer parses data, Unstract employs AI and LLMs to interpret relationships between document elements, preserving context and ensuring accuracy.
Adaptability to Document Variability: Unstract accommodates various layouts, designs, and content formats without requiring constant reconfiguration, making it highly efficient for dynamic workflows.

Key Features of Unstract

Dynamic Data Structuring
- Unstract converts unstructured data, such as raw outputs from LLMWhisperer, into structured formats like JSON, CSV, or databases.
- It uses machine learning algorithms to detect patterns, relationships, and hierarchies within documents.

Example: For a scanned tax form, Unstract organizes extracted fields such as “Employer Name,” “Tax Year,” and “Form ID” into structured JSON, ready for integration with financial systems.

No Need for Constant Retraining
- Unlike traditional systems, Unstract adapts to changes in document formats and layouts without requiring retraining or remodelling.
- This flexibility reduces maintenance overhead and ensures consistent performance even as documents evolve.

Example: A recruitment agency can process resumes with varying designs and fonts without modifying the tool each time a new style is encountered.

Multi-Format Document Support
- Unstract handles outputs from diverse sources, including handwritten forms, scanned PDFs, and image-based data.
- It manages complex layouts like multi-column formats, tables, or embedded diagrams.

Example: In an insurance claims process, Unstract can handle scanned accident reports, structured forms, and handwritten notes simultaneously, delivering consistent results.

AI-Enhanced Context Understanding
- Unstract uses AI to understand the context of the extracted data, ensuring that important relationships—like linking totals to invoice numbers or matching candidates’ skills to job descriptions—are preserved.

The AI/LLM Advantage for Document Processing

Unstract integrates with powerful AI/LLMs to enhance its processing capabilities:

Contextual Awareness: AI algorithms allow Unstract to interpret nuanced relationships between extracted fields, ensuring greater accuracy.
Scalable Automation: Whether processing thousands of resumes or extracting data from a batch of handwritten tax forms, Unstract scales effortlessly to meet enterprise demands.
Minimal Human Intervention: With intelligent pre-configuration and adaptability, Unstract reduces the need for manual oversight, improving operational efficiency.

Why Choose Unstract for Document Processing Automation?

Efficiency and Speed: Automates time-consuming tasks like manual data entry and document parsing, delivering results in minutes.
Reliability Across Formats: Adapts to document changes without requiring constant reprogramming or updates.
Seamless Workflow Integration: Outputs can be directly integrated into downstream systems, such as ERPs, CRMs, or analytics platforms.

By complementing the text parsing capabilities of LLMWhisperer with advanced data structuring powered by AI, Unstract transforms the way organizations handle unstructured data. Its adaptability and intelligence make it an indispensable tool for any business seeking to optimize document processing workflows.

Data Extraction from a Scanned Handwritten Tax Form Using Unstract

Extracting structured data from scanned handwritten tax forms is a crucial step in automating tax reporting and compliance workflows. Using Unstract’s Prompt Studio, organizations can tailor prompts to extract critical information such as the taxpayer’s name, total assets, and employer details. Here’s a detailed guide to setting up Unstract to process a scanned handwritten tax form, complete with API integration via Postman.

Step 1: Setting Up the Project in Prompt Studio

Access Prompt Studio
- Log into your Unstract account and navigate to the Prompt Studio page from the dashboard.
- Click New Project to create a project tailored for tax form processing.
- Provide details:
  - Tool Name: “Tax Form OCR Tool”
  - Description: “Extracts critical fields such as taxpayer details and financial information from tax forms.”

Upload the Handwritten Tax Form
- Go to the Manage Documents section and click Upload Files.
- Upload the sample scanned tax form PDF.

Write Tailored Prompts for Key Data Fields
- Craft prompts to extract specific fields from the tax form, setting the output type of each prompt to JSON. Here are two examples:

Field Name: contact_info
- Prompt Text: “Extract the contact details from the following text. Results should contain the following fields: phone_number and address.”

Below is the output:

{
  "address": "52, Beach View Avenue, FL 63504",
  "phone_number": "011926580"
}
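Because the prompt names the fields it expects, each response can be checked for them before moving on. The sketch below is a minimal Python example, assuming the prompt output is available as a JSON string; `missing_fields` and `REQUIRED_CONTACT_FIELDS` are hypothetical names, not part of Unstract.

```python
import json

# Fields the contact_info prompt asks for (taken from the prompt text above).
REQUIRED_CONTACT_FIELDS = {"phone_number", "address"}


def missing_fields(raw_json: str, required: set[str]) -> set[str]:
    """Return the required keys that are absent or empty in a prompt's JSON output."""
    data = json.loads(raw_json)
    return {
        key for key in required
        if key not in data or data[key] in (None, "")
    }


if __name__ == "__main__":
    sample = '{"address": "52, Beach View Avenue, FL 63504", "phone_number": "011926580"}'
    print(missing_fields(sample, REQUIRED_CONTACT_FIELDS))  # prints set()
```

An empty set means every requested field came back with a value; anything else is a prompt worth revisiting.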

You can add more prompts as needed to extract other fields.

Next, let’s extract the tax details.

Field Name: tax_details
- Prompt Text: “Extract the tax details from the following text. Results should contain the following fields: plan_type, return_type, all eins with type, and bussiness_code.”

Below is the output:

{
  "all_eins_with_type": [
    {
      "Employer Identification Number (EIN)": 567325752
    },
    {
      "Administrator's EIN": 778853
    }
  ],
  "bussiness_code": 82856,
  "plan_type": "Annual Return Plan",
  "return_type": "the first return filed for the plan"
}
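The shape of `all_eins_with_type` is not fixed: here each EIN is a one-key object keyed by its label, while the API response later in this guide returns `type`/`value` pairs. A small normalizer keeps downstream code independent of that. The sketch below also flags values that are not nine digits, which is the format of an IRS Employer Identification Number; `normalize_eins` and `looks_like_ein` are hypothetical helper names.

```python
from typing import Any


def normalize_eins(entries: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Convert either EIN output shape into (type, value) pairs with string values."""
    pairs = []
    for entry in entries:
        if "type" in entry and "value" in entry:
            # Shape seen in the API response: {"type": ..., "value": ...}
            pairs.append((str(entry["type"]), str(entry["value"])))
        else:
            # Shape seen in Prompt Studio: {"<label>": <number>}
            for label, value in entry.items():
                pairs.append((str(label), str(value)))
    return pairs


def looks_like_ein(value: str) -> bool:
    """An EIN has nine digits, often written as XX-XXXXXXX."""
    digits = value.replace("-", "")
    return len(digits) == 9 and digits.isdigit()


if __name__ == "__main__":
    studio_output = [
        {"Employer Identification Number (EIN)": 567325752},
        {"Administrator's EIN": 778853},
    ]
    for label, value in normalize_eins(studio_output):
        print(label, value, looks_like_ein(value))
```

Run on the Prompt Studio output above, the administrator's EIN (778853) is flagged because it has only six digits. That is a cue to recheck the scan, not proof that the extraction is wrong.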

Run and Test Prompts
- After creating the prompts, click Run to execute them and check the extracted data.
- Verify the output in the Combined Output tab, which displays the results in structured JSON format.

Step 2: Export and Deploy the Tool as an API

Export as Tool
- Once the prompts have been validated, click Export as Tool to convert the project into a usable tool.
- Assign a name such as “Handwritten Tax Form Parser API.”

Create a Workflow
- Navigate to Build → Workflows and click New Workflow.
- Drag and drop the exported tool into the workflow area.
- Define:
  - Input: Accept file uploads (PDFs).
  - Output: Return extracted data in JSON format.

Test the Workflow
- Run the workflow.
- Upload the document.
- Click Continue.

The screen then shows the loaded tool and the result it produced.

Deploy as an API
- Go to Manage → API Deployments and click + API Deployment.
- Provide a name, such as “Tax Form Parsing API.”
- Retrieve the API URL and generate an API key for integration.

Step 3: Testing the API via Postman

Configure POST Request
- Open Postman and create a POST request.
- API URL: Paste the deployment URL from Unstract.
- Authorization:
  - Go to the Headers tab.
  - Add:
    - Key: Authorization
    - Value: unstract-key YOUR_API_KEY (replace with the generated API key).
- Body:
  - Switch to the Body tab.
  - Select form-data format.
  - Add:
    - Key: files
    - Type: File
    - Value: Upload the scanned tax form.

Send the Request
- Click Send to submit the form to the API.
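The same request can be sent from a script instead of Postman. The sketch below uses only Python's standard library (`urllib.request`) and builds the multipart body by hand with a single `files` part, mirroring the Postman Body tab. The `Authorization` value is copied from the Postman step above; treat it as an assumption and confirm the exact scheme in your API deployment's documentation. The function names are hypothetical.

```python
import json
import urllib.request
import uuid


def build_upload_request(api_url: str, api_key: str, filename: str,
                         file_bytes: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST mirroring the Postman setup above."""
    boundary = uuid.uuid4().hex
    # A single form part named "files", as in the Postman Body tab.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        file_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        # Assumption: value copied from the Postman step; verify the scheme
        # against your deployment's API documentation.
        "Authorization": f"unstract-key {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(api_url, data=body, headers=headers, method="POST")


def send(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To use it, read the PDF with `open(path, "rb").read()`, pass the bytes to `build_upload_request` along with the deployment URL and API key, and call `send` on the result. The 300-second default timeout is an arbitrary choice that leaves headroom above the roughly 73 seconds the sample run below took.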

View JSON Output

Below is the JSON output of the uploaded handwritten tax form.

{
  "status": "COMPLETED",
  "message": [
    {
      "file": "scanned-form-pdf-ocr.pdf",
      "status": "Success",
      "result": {
        "output": {
          "assets_liabilities": {
            "beginning_of_year": {
              "assets": 25000,
              "liabilities": 3000
            },
            "end_of_year": {
              "assets": 20000,
              "liabilities": 6000
            }
          },
          "contact_indo": {
            "address": "52 Beach View Avenue, FL 63504",
            "phone_number": "011926580"
          },
          "dates": {
            "calendar_plan_year": "2023",
            "date_plan_first_became_effective": "2022-05-06",
            "fiscal_plan_year_beginning": "2022-01-03",
            "fiscal_plan_year_ending": "2025-01-03"
          },
          "employer_name": "Fidelity Finance Corporation",
          "plan_no": {
            "three_digit_plan_number": "956"
          },
          "tax_details": {
            "all_eins_with_type": [
              {
                "type": "Employer Identification Number (EIN)",
                "value": "567325752"
              },
              {
                "type": "Plan Administrator's EIN",
                "value": "778853"
              }
            ],
            "bussiness_code": "This form is required to be filed under section 6058(a) of the Internal Revenue Code.",
            "plan_type": "Form Number: CA 05678",
            "return_type": "OMB No.: 1545-1610"
          }
        }
      },
      "metadata": {
        "source_name": "scanned-form-pdf-ocr.pdf",
        "source_hash": "9daafc8ba7e9a420a58636a2a0da949a4947563a41ca21e478b56025406301ac",
        "organization_id": "org_sNRwTENh7Kdm3EL8",
        "workflow_id": "b8cdac28-2843-4a43-bf90-d275d8c92080",
        "execution_id": "fba9d6f3-275c-47ea-86f8-e76888932939",
        "total_elapsed_time": 73.48845,
        "tool_metadata": [
          {
            "tool_name": "structure_tool",
            "elapsed_time": 73.488446,
            "output_type": "JSON"
          }
        ]
      }
    }
  ]
}
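Before loading this response anywhere, it helps to unwrap the envelope (a top-level `status`, then one `message` entry per file carrying `result.output`) and sanity-check the fields. In this run, `bussiness_code`, `plan_type`, and `return_type` contain form header text rather than the values Prompt Studio returned earlier, so a post-check is worthwhile. The sketch below assumes the response has already been parsed into a dict; the helper names are hypothetical and the heuristics are deliberately simple, tuned to this one sample.

```python
from typing import Any


def extract_outputs(response: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each successfully processed file name to its extracted output."""
    if response.get("status") != "COMPLETED":
        raise RuntimeError(f"Workflow not completed: {response.get('status')}")
    outputs = {}
    for item in response.get("message", []):
        if item.get("status") == "Success":
            outputs[item["file"]] = item["result"]["output"]
    return outputs


def suspicious_tax_fields(output: dict[str, Any]) -> list[str]:
    """Flag tax_details values that do not look like the requested data."""
    tax = output.get("tax_details", {})
    flagged = []
    # The business code should be numeric; in this run it came back as a sentence.
    if not str(tax.get("bussiness_code", "")).isdigit():
        flagged.append("bussiness_code")
    # Form-header text (form or OMB numbers) is not a plan or return type.
    for key in ("plan_type", "return_type"):
        if str(tax.get(key, "")).startswith(("Form Number", "OMB No")):
            flagged.append(key)
    return flagged
```

Applied to the response above, all three fields are flagged, while the Prompt Studio values shown earlier pass, so flagged documents can be routed to a human reviewer instead of straight into downstream systems.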

Key Advantages of Using Unstract for Tax Form Processing

Efficiency: Automates the extraction of detailed data fields without manual effort.
Scalability: Handles multiple tax forms simultaneously with high accuracy.
Integration Ready: Outputs are available in structured JSON and can be integrated with tax software or databases.
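As one illustration of the integration point, the sketch below flattens a document's `output` object into dotted field names and stores it in SQLite as (source, field, value) rows. The table name `extracted_fields` and its layout are assumptions made for this example. A narrow key-value table like this absorbs new prompts without schema changes, at the cost of storing every value as text.

```python
import json
import sqlite3
from typing import Any


def flatten(data: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys; lists are stored as JSON text."""
    flat = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def store(conn: sqlite3.Connection, source: str, output: dict[str, Any]) -> int:
    """Store one document's fields as (source, field, value) rows; return rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_fields (source TEXT, field TEXT, value TEXT)"
    )
    rows = [(source, field, str(value)) for field, value in flatten(output).items()]
    conn.executemany("INSERT INTO extracted_fields VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

For example, `store(sqlite3.connect("forms.db"), "scanned-form-pdf-ocr.pdf", output)` writes a row such as `assets_liabilities.end_of_year.assets = 20000` for the sample form.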
Flexibility: Prompts can be customized for other financial forms or compliance documents.

By leveraging Unstract and its integration with LLMWhisperer, organizations can extract structured data from complex handwritten tax forms with minimal manual effort.

Conclusion

The synergy between LLMWhisperer and Unstract represents a groundbreaking approach to transforming unstructured documents into actionable data. LLMWhisperer, with its robust text parsing capabilities, serves as a reliable pre-processing layer, ensuring that even the most complex or degraded scanned documents are converted into structured text.

Unstract complements this by leveraging advanced AI and LLM-powered workflows to extract, organize, and present the data in meaningful formats such as JSON, empowering businesses to streamline processes and make data-driven decisions.

OCR APIs, such as those offered by LLMWhisperer, are revolutionizing how organizations handle document-heavy workflows.

If you want to take the LLMWhisperer OCR API for a quick test drive, check out our free playground.


By automating data extraction, these tools eliminate manual inefficiencies, reduce human error, and enhance scalability across industries. Whether dealing with handwritten tax forms, complex tables, or multi-language documents, the ability to seamlessly integrate OCR APIs into existing systems enables businesses to unlock the value hidden within their unstructured data repositories.

In an era driven by digital transformation, the integration of solutions like LLMWhisperer and Unstract into business workflows underscores the potential of technology to simplify, accelerate, and optimize operations.

As document processing demands grow, these tools pave the way for a future where unstructured data is no longer a bottleneck but a competitive advantage, enabling enterprises to operate with precision and agility in a fast-paced digital landscape.

Get started with LLMWhisperer: Best OCR API for AI Document Workflows

Discover how LLMWhisperer, Unstract’s dedicated OCR API, prepares documents for peak LLM performance and sets standards for LLM-ready outputs.


The post Guide to LLMWhisperer Document OCR API first appeared on Unstract.com.