How To Install and Use Tesseract OCR on Ubuntu 24.04

November 14, 2024

In this short tutorial, we will show you how to install and use Tesseract on the Ubuntu 24.04 Linux operating system.

Tesseract is an open-source Optical Character Recognition (OCR) engine originally developed by Hewlett-Packard and later sponsored by Google. It is widely used for converting images or scanned documents into editable text. Tesseract supports many languages and reads common image formats such as JPG, PNG, and TIFF. PDF files must be converted to images first, although Tesseract can write its results as a searchable PDF. It runs on Linux, macOS, and Windows, which makes it popular for document processing, data extraction, and image-to-text applications.

To install and use Tesseract on Ubuntu, follow these steps:
1. Install Tesseract
2. Convert Images to Text
3. Specify Language (Optional)
4. View Output

1. Install Tesseract

To install Tesseract, we first refresh the Ubuntu package index and then install the package. Open a terminal and run the commands below.

$ sudo apt update
$ sudo apt install tesseract-ocr

The output looks like this:

ramansah@dev02:~$ sudo apt install tesseract-ocr
[sudo] password for ramansah: 
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
liblept5 libtesseract5 tesseract-ocr-eng tesseract-ocr-osd
The following NEW packages will be installed:
liblept5 libtesseract5 tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd
0 upgraded, 5 newly installed, 0 to remove and 6 not upgraded.
Need to get 8,377 kB of archives.
After this operation, 21.8 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://id.archive.ubuntu.com/ubuntu noble/universe amd64 liblept5 amd64 1.82.0-3build4 [1,099 kB]
Get:2 http://id.archive.ubuntu.com/ubuntu noble/universe amd64 libtesseract5 amd64 5.3.4-1build5 [1,291 kB]
Get:3 http://id.archive.ubuntu.com/ubuntu noble/universe amd64 tesseract-ocr-eng all 1:4.1.0-2 [1,818 kB]
Get:4 http://id.archive.ubuntu.com/ubuntu noble/universe amd64 tesseract-ocr-osd all 1:4.1.0-2 [3,841 kB] 
Get:5 http://id.archive.ubuntu.com/ubuntu noble/universe amd64 tesseract-ocr amd64 5.3.4-1build5 [328 kB] 
Fetched 8,377 kB in 10s (843 kB/s) 
Selecting previously unselected package liblept5:amd64.
(Reading database ... 302415 files and directories currently installed.)
Preparing to unpack .../liblept5_1.82.0-3build4_amd64.deb ...
Unpacking liblept5:amd64 (1.82.0-3build4) ...
Selecting previously unselected package libtesseract5:amd64.
Preparing to unpack .../libtesseract5_5.3.4-1build5_amd64.deb ...
Unpacking libtesseract5:amd64 (5.3.4-1build5) ...
Selecting previously unselected package tesseract-ocr-eng.
Preparing to unpack .../tesseract-ocr-eng_1%3a4.1.0-2_all.deb ...
Unpacking tesseract-ocr-eng (1:4.1.0-2) ...
Selecting previously unselected package tesseract-ocr-osd.
Preparing to unpack .../tesseract-ocr-osd_1%3a4.1.0-2_all.deb ...
Unpacking tesseract-ocr-osd (1:4.1.0-2) ...
Selecting previously unselected package tesseract-ocr.
Preparing to unpack .../tesseract-ocr_5.3.4-1build5_amd64.deb ...
Unpacking tesseract-ocr (5.3.4-1build5) ...
Setting up tesseract-ocr-eng (1:4.1.0-2) ...
Setting up liblept5:amd64 (1.82.0-3build4) ...
Setting up libtesseract5:amd64 (5.3.4-1build5) ...
Setting up tesseract-ocr-osd (1:4.1.0-2) ...
Setting up tesseract-ocr (5.3.4-1build5) ...
Processing triggers for man-db (2.12.0-4build2) ...
Processing triggers for libc-bin (2.39-0ubuntu8.3) ...

You may also want to install additional language packs:

$ sudo apt install tesseract-ocr-LANG

Replace `LANG` with the desired language code (e.g., `eng` for English, `fra` for French).
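As a minimal sketch, the helper below turns a Tesseract language code into the matching package name. The function name `lang_package` is hypothetical, and the rule that underscores in a code (such as `chi_sim`) become hyphens in the package name is an assumption worth confirming with `apt-cache search tesseract-ocr-` on your system.

```shell
#!/bin/sh
# Hypothetical helper: map a Tesseract language code to an Ubuntu package name.
# Assumption: packages are named tesseract-ocr-<code>, with "_" written as "-".
lang_package() {
    printf 'tesseract-ocr-%s\n' "$1" | tr '_' '-'
}

# Usage (not run automatically):
#   apt-cache search tesseract-ocr-           # list language packs in the repository
#   sudo apt install "$(lang_package fra)"    # install the French pack
#   tesseract --list-langs                    # show the languages Tesseract can use
```

Running `tesseract --list-langs` after the installation is a quick way to confirm that the new language is available.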

After the installation, we verify it by querying the version with this command:

$ tesseract --version

The output looks like this:

ramansah@dev02:~$ tesseract --version
tesseract 5.3.4
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.5) : libpng 1.6.43 : libtiff 4.5.1 : zlib 1.3 : libwebp 1.3.2 : libopenjp2 2.5.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.7.2 zlib/1.3 liblzma/5.4.5 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.5
Found libcurl/8.5.0 OpenSSL/3.0.13 zlib/1.3 brotli/1.1.0 zstd/1.5.5 libidn2/2.3.7 libpsl/0.21.2 (+libidn2/2.3.7) libssh/0.10.6/openssl/zlib nghttp2/1.59.0 librtmp/2.3 OpenLDAP/2.6.7
ramansah@dev02:~$
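If the check needs to happen inside a script rather than by eye, a sketch like the following may help. The function names `have_tesseract` and `has_lang` are hypothetical. The language check assumes that `tesseract --list-langs` prints one language code per line; any header line is ignored because only whole-line matches count.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the tesseract command can be found.
have_tesseract() {
    command -v tesseract >/dev/null 2>&1
}

# Hypothetical helper: succeed if the given language code is installed.
# Only a line that equals the code exactly is accepted as a match.
has_lang() {
    tesseract --list-langs 2>/dev/null | grep -qxF -- "$1"
}

# Usage (not run automatically):
#   have_tesseract || echo "tesseract is not installed"
#   has_lang eng   || echo "English language data is missing"
```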

2. Convert Images to Text

To convert an image to text, we use the simple command below.

$ tesseract image.jpg output

Replace `image.jpg` with your image file and `output` with your desired output base name. Tesseract appends `.txt` to that name itself, so this extracts the text from the image and saves it in `output.txt`.

A sample run is shown below:

ramansah@dev02:~/Downloads$ ls -ltr
total 76
-rw-rw-r-- 1 ramansah ramansah 8102 Nov 14 14:21 football.jpeg
-rw-rw-r-- 1 ramansah ramansah 0 Nov 14 14:35 ocr_text.txt
-rw-rw-r-- 1 ramansah ramansah 67411 Nov 14 14:48 sbux_sample.jpg
ramansah@dev02:~/Downloads$ pwd
/home/ramansah/Downloads
ramansah@dev02:~/Downloads$ tesseract /home/ramansah/Downloads/sbux_sample.jpg /home/ramansah/ocr_text
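Because Tesseract adds `.txt` to the output name it is given, a script can check that file afterwards. The sketch below is one possible way to do so; the function name `ocr_and_check` is hypothetical, and treating a file that holds only whitespace as "no text recognized" is an assumption of this example, not a Tesseract rule.

```shell
#!/bin/sh
# Hypothetical helper: run OCR on one image and confirm that text came out.
#   $1 = image file, $2 = output base name (Tesseract appends ".txt")
ocr_and_check() {
    image=$1
    base=$2
    tesseract "$image" "$base" || return 1
    # Look for at least one non-whitespace character in the result.
    if grep -q '[^[:space:]]' "$base.txt"; then
        printf 'wrote %s.txt\n' "$base"
    else
        printf 'no text recognized in %s\n' "$image" >&2
        return 1
    fi
}

# Usage (not run automatically):
#   ocr_and_check sbux_sample.jpg ocr_text
```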

3. Specify Language (Optional)

Tesseract assumes English (`eng`) by default. If the image text is in another language, add the `-l` option with the matching language code:

$ tesseract image.jpg output -l eng
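The `-l` option also accepts several codes joined with `+`, for example `eng+fra` for a document that mixes English and French. A minimal sketch, with hypothetical function names `join_langs` and `ocr_with_langs`, and assuming every listed language pack is already installed:

```shell
#!/bin/sh
# Hypothetical helper: join language codes with "+", e.g. "eng fra" -> "eng+fra".
# The body runs in a subshell so the IFS change does not leak to the caller.
join_langs() (
    IFS='+'
    printf '%s\n' "$*"
)

# Hypothetical helper: OCR one image with one or more languages.
#   $1 = image file, $2 = output base name, remaining arguments = language codes
ocr_with_langs() {
    image=$1
    base=$2
    shift 2
    tesseract "$image" "$base" -l "$(join_langs "$@")"
}

# Usage (not run automatically):
#   ocr_with_langs image.jpg output eng fra
```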

4. View Output

Check the `output.txt` file for the extracted text.
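Besides opening the file with `cat` or a text editor, the recognized text can be sent straight to the terminal by passing `stdout` as the output name, which makes it easy to pipe into other tools. The sketch below uses that to search an image for a word; the function name `ocr_grep` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the recognized lines that contain a pattern.
#   $1 = pattern (case-insensitive), $2 = image file
ocr_grep() {
    # "stdout" as the output name makes Tesseract print the text
    # instead of writing a .txt file.
    tesseract "$2" stdout 2>/dev/null | grep -i -- "$1"
}

# Usage (not run automatically):
#   cat output.txt                 # view a saved result
#   ocr_grep total image.jpg       # show only lines mentioning "total"
```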

The sample is shown below:
1. Original source file

In this sample, we use a Starbucks Coffee payment receipt as the image to be converted to a text file.

2. Converted file

The resulting file from the Tesseract OCR conversion.

As we can see from the screenshot above, the image file was converted into text. This tool is very useful for automation applications that need to convert large numbers of image files into text format.
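For the automation case, a loop over a directory is usually enough. The following is a sketch under stated assumptions: the function name `ocr_dir` and the list of extensions (`.jpg`, `.jpeg`, `.png`) are choices made for this example, and each result is named after its source image.

```shell
#!/bin/sh
# Hypothetical helper: OCR every JPG/PNG image in one directory.
#   $1 = input directory, $2 = output directory (created if missing)
ocr_dir() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for img in "$in_dir"/*.jpg "$in_dir"/*.jpeg "$in_dir"/*.png; do
        # Skip a pattern that matched no files.
        [ -e "$img" ] || continue
        name=$(basename "$img")
        # Strip the extension; Tesseract appends ".txt" to the base name.
        tesseract "$img" "$out_dir/${name%.*}"
    done
}

# Usage (not run automatically):
#   ocr_dir ~/Downloads ~/ocr_results
```

Files with other extensions are left untouched, so the input directory can hold a mix of documents.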

Conclusion

In this short article, we have shown how to install Tesseract on the Ubuntu 24.04 LTS operating system and how to perform a simple conversion from an image file to a text file.
