Configuration of Tesseract
Tesseract is one of the best-known free libraries for optical character recognition (OCR). It was originally developed at Hewlett-Packard in the 1980s and 1990s and released as open source in 2005; since 2006 its development has been sponsored by Google. This tutorial contains some hints on how to set up and use the library on Windows.
As a first step, we need to download and install the programs used in the steps below: Git, a Subversion (SVN) command-line client, and Visual Studio 2013.
As soon as we are ready with the above, it’s time to create a directory for building Tesseract. Let’s assume that our path is D:\Tesseract files\. The next step is to change to this directory in the Git CMD prompt. The figure below illustrates how this can be done.
- Fig. 1. Git CMD example
Now we are in the required folder. First, we clone all dependencies from the GitHub repository using the following command:
git clone git://github.com/pvorb/tesseract-vs2013.git
The next step is to open the VS2013 developer command prompt
- Fig. 2. Path to VS2013 command prompt
Then change to the created folder:
- Fig. 3. VS2013 command prompt
As you can see, the “tesseract-vs2013” folder is already inside. Now we can build the dependencies using the command
After this step, the VS2013 command prompt can be closed.
Building the Tesseract
At this stage we are ready to build the library. This is done using the following steps:
- Re-open the Git command prompt (Fig. 1) and ensure it’s still in D:\Tesseract files\.
- Get the latest source using SVN (type in cmd): svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
- Change to the newly checked-out repository (type in cmd): cd tesseract-ocr
- Apply the patch provided in tesseract-vs2013 (type in cmd): svn patch ..\tesseract-vs2013\vs2013+64bit_support.patch
  You should see something like this:
- Fig. 4. Tesseract building – step 1
After closing the Git command prompt, we already have the folders containing the header and library files:
D:\Tesseract files\include\
D:\Tesseract files\lib\
Now all we need is to open the VS2013 solution and build the source:
- Open D:\Tesseract files\tesseract-ocr\vs2013\tesseract.sln with Visual Studio 2013.
- Build the project.
The current VS2013 solution contains configurations for both the Win32 (32-bit) and x64 platforms, each in dynamic and static variants. After the build, you can find the compiled binaries in D:\Tesseract files\tesseract-ocr\vs2013\bin\.
Connect Tesseract to an existing VS2013 C++ project
To use the OCR engine in a VS2013 project, we need to copy the Tesseract library files to the right place and set the include and library paths in the C++ project properties.
- Copy the Tesseract .dll files to the target project:
  From “Tesseract files\lib” copy “libtesseract304.dll” (or “libtesseract304d.dll”) to the “Release” (or “Debug”) folder of the project (the folder that contains the .exe file).
  From “Tesseract files\lib\Win32” (or “X64”) copy “liblept171.dll” (or “liblept171d.dll”) to the same “Release” (or “Debug”) folder.
- Set the properties of the target project in VS (Alt+F7) for the Debug configuration on the Win32 platform:
  In C/C++ –> General –> Additional Include Directories:
  In Linker –> General –> Additional Library Directories:
  D:\Tesseract files\lib\Win32
  D:\Tesseract files\lib
  In Linker –> Input –> Additional Dependencies:
  liblept171d.lib
Example with recognized text:
- Fig. 5. OCR example
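A missing runtime DLL is a common reason for the program failing to start after the copy step described above. The following is a minimal sketch, not part of the original setup, of a check that tests whether the DLLs named in this tutorial can be opened in the folder that holds the .exe file. The function names requiredDlls and missingDlls are hypothetical, and the file names assume the libtesseract304 and liblept171 builds mentioned above; other versions would use different names.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Runtime DLL names as given in this tutorial.
// Assumption: Debug builds carry a "d" suffix, Release builds carry none.
std::vector<std::string> requiredDlls(bool debugBuild) {
    const std::string suffix = debugBuild ? "d" : "";
    std::vector<std::string> names;
    names.push_back("libtesseract304" + suffix + ".dll");
    names.push_back("liblept171" + suffix + ".dll");
    return names;
}

// Returns the names from requiredDlls() that cannot be opened in exeDir.
// exeDir is the "Release" or "Debug" folder that contains the .exe file.
std::vector<std::string> missingDlls(const std::string& exeDir, bool debugBuild) {
    assert(!exeDir.empty());  // precondition: a folder must be given
    std::vector<std::string> missing;
    const std::vector<std::string> names = requiredDlls(debugBuild);
    for (std::size_t i = 0; i < names.size(); ++i) {
        // Forward slashes are accepted by the Windows file APIs as well.
        const std::string path = exeDir + "/" + names[i];
        std::ifstream file(path.c_str(), std::ios::binary);
        if (!file.good()) {
            missing.push_back(names[i]);
        }
    }
    return missing;
}
```

Calling missingDlls("Debug", true) from the project folder before the first OCR call returns the names of any files that still need to be copied; an empty result means both DLLs were found.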