Tutorial for installing Tesseract

You’ve undoubtedly seen OCR before. It’s widely used to process everything from scanned documents to handwritten scribbles on a tablet PC, and it powers tools such as Google Translate. Today you’ll create your first text recognition app.

What is OCR?

Optical Character Recognition, or OCR, is the process of electronically extracting text from images and reusing it in a variety of ways, such as document editing, free-text searches, or compression. In this tutorial, you’ll learn how to install Tesseract, an open-source OCR engine maintained by Google.

How to install Tesseract for Microsoft Visual Studio?

Step 1:

To install Tesseract, you first need to install the following programs:

git: http://git-scm.com/
slik-svn: http://www.sliksvn.com/en/download
visual-studio: https://www.visualstudio.com

Step 2:

Next, create a folder in which to install Tesseract. This can be any directory on your computer, for example “D:\Tesseract-files”.
After that, run GIT CMD and change to that folder. Your GIT command line should look like this:

Fig-1

Fig. 1. GIT CMD example

 

Step 3:

Now you need to clone the repository with Tesseract’s dependencies from GitHub to your computer. To do this, enter the following command in GIT CMD:
git clone https://github.com/pvorb/tesseract-vs2013.git
In the GIT CMD console you will see something like this:

Fig-2

Fig. 2. Clone tesseract-vs2013.git

When the clone has finished, the console shows the following:

Fig-3

Fig. 3. Clone tesseract-vs2013 done

 

Step 4:

For the next step, run the VS2013 Developer Command Prompt. It is located at {directory of MS VS}\Common7\Tools\Shortcuts\Developer Command Prompt VS2013. Then change to D:\Tesseract-files\tesseract-vs2013.

Fig-4

Fig. 4. Command prompt for VS2013

Now build the dependencies with the command msbuild build.proj:

Fig-5

Fig. 5. Starting the build

After this step, the VS2013 command prompt can be closed.

 

Step 5:

Reopen GIT CMD and check that the working directory is “D:\Tesseract-files\”. Then get the latest Tesseract source using SVN by entering the following in GIT CMD:
svn checkout https://github.com/svn2github/Tesseract.git

Fig-6

Fig. 6. Checkout Tesseract

 

After this procedure, a new folder named Tesseract.git\ appears in D:\Tesseract-files\.
In GIT CMD, change to D:\Tesseract-files\Tesseract.git\trunk and apply the patch provided in tesseract-vs2013 by entering:
svn patch D:\Tesseract-files\tesseract-vs2013\vs2013+64bit_support.patch

Fig-7

Fig. 7. Patch provided in tesseract-vs2013

 

Copy both directories (lib and include) from D:\Tesseract-files\tesseract-vs2013\release into D:\Tesseract-files\Tesseract.git\trunk\.
Open D:\Tesseract-files\Tesseract.git\trunk\vs2013\tesseract.sln with Visual Studio 2013.

 

Step 6:

Open the Property Pages of libtesseract304. Under Configuration Properties -> C/C++ -> General -> Additional Include Directories, add D:\Tesseract-files\Tesseract.git\trunk\include\ and D:\Tesseract-files\Tesseract.git\trunk\include\leptonica\. Then, under Linker -> General -> Additional Library Directories, add D:\Tesseract-files\Tesseract.git\trunk\lib\x64\.
Repeat this for both the Debug and the Release configuration, then build the project in Release and in Debug.

Step 7:

To recognize text, Tesseract needs trained language data files. They can be found at https://github.com/tesseract-ocr/tessdata. Download the files you need and copy them to D:\Tesseract-files\Tesseract.git\trunk\tessdata\
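A missing data file is an easy mistake to make at this step, so it can help to check for it before starting OCR. The following is a minimal sketch that assumes the folder layout used in this tutorial (a tessdata folder directly below the data path) and the usual <language>.traineddata file naming. The helper names trainedDataPath and fileExists are hypothetical and are not part of Tesseract.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: builds <dataPath>\tessdata\<lang>.traineddata.
// The "tessdata" subfolder follows the layout assumed in this tutorial.
std::string trainedDataPath(const std::string& dataPath, const std::string& lang) {
    std::string path = dataPath;
    // Add a separator only if the data path does not already end with one.
    if (!path.empty() && path.back() != '\\' && path.back() != '/') {
        path += '\\';
    }
    return path + "tessdata\\" + lang + ".traineddata";
}

// Hypothetical helper: returns true if the file can be opened for reading.
bool fileExists(const std::string& path) {
    std::ifstream file(path.c_str(), std::ios::binary);
    return file.good();
}
```

For example, fileExists(trainedDataPath("D:\\Tesseract-files\\Tesseract.git\\trunk", "eng")) should return true once the English data file has been copied into place.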

 

Step 8:

Copy Tesseract’s .dll files into the project that will use them. From D:\Tesseract-files\Tesseract.git\lib, copy libtesseract304.dll (or libtesseract304d.dll) to the Release (or Debug) folder of that project; this is the folder that contains the .exe file. From D:\Tesseract-files\tesseract-vs2013\lib\x64 (or X64), copy liblept171.dll (or liblept171d.dll) to the same Release (or Debug) folder.
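If a DLL is missing from the output folder, the program will fail to start, so a quick check can save some time. The sketch below assumes the DLL names given in this tutorial, where the “d” suffix marks the Debug builds. The functions requiredDlls and missingDlls are hypothetical helpers written for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: DLL names as used in this tutorial.
// Debug builds carry a "d" suffix before the extension.
std::vector<std::string> requiredDlls(bool debugBuild) {
    const std::string suffix = debugBuild ? "d.dll" : ".dll";
    std::vector<std::string> names;
    names.push_back("libtesseract304" + suffix);
    names.push_back("liblept171" + suffix);
    return names;
}

// Hypothetical helper: returns the required DLLs that cannot be opened in outputDir.
std::vector<std::string> missingDlls(const std::string& outputDir, bool debugBuild) {
    std::vector<std::string> missing;
    const std::vector<std::string> names = requiredDlls(debugBuild);
    for (std::size_t i = 0; i < names.size(); ++i) {
        // A forward slash is accepted as a path separator on Windows as well.
        std::ifstream file((outputDir + "/" + names[i]).c_str(), std::ios::binary);
        if (!file.good()) {
            missing.push_back(names[i]);
        }
    }
    return missing;
}
```

Calling missingDlls with the path of your Release folder and false should return an empty list once both files have been copied.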

Next, connect Tesseract to the project (this is necessary for both Debug and Release).

Set the following properties of the project:

In C/C++ –> General –> Additional Include Directories:
D:\Tesseract-files\Tesseract.git\trunk\
D:\Tesseract-files\Tesseract.git\trunk\ccmain
D:\Tesseract-files\Tesseract.git\trunk\ccstruct
D:\Tesseract-files\Tesseract.git\trunk\ccutil
D:\Tesseract-files\Tesseract.git\trunk\leptonica
D:\Tesseract-files\Tesseract.git\trunk\api
D:\Tesseract-files\Tesseract.git\trunk\include

In Linker –> General –> Additional Library Directories:
D:\Tesseract-files\Tesseract.git\lib\x64
D:\Tesseract-files\Tesseract.git\lib\

In Linker –> Input –> Additional Dependencies:

for Debug

libtesseract304d.lib
liblept171d.lib

for Release

libtesseract304.lib
liblept171.lib

Step 9:

Now create a new console application and paste in this code:

#include <cstdio>
#include <cstdlib>

#include "baseapi.h"
#include "allheaders.h"

int main()
{
    char *outText;
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();

    // Initialize tesseract-ocr with English; the first argument is the folder that contains tessdata
    if (api->Init("D:\\Tesseract-files\\Tesseract.git\\trunk", "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image and stop if it cannot be read
    Pix *image = pixRead("your_image.tif");
    if (image == NULL) {
        fprintf(stderr, "Could not read the input image.\n");
        exit(1);
    }
    api->SetImage(image);

    // Set list of allowed characters
    api->SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-.;,:/0123456789");

    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used objects and release memory
    api->End();
    delete api;
    delete[] outText;
    pixDestroy(&image);

    return 0;
}
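The whitelist string passed to SetVariable is long, and it is easy to drop a letter when typing it by hand. One option is to generate it instead. The sketch below assumes that the intended set is all ASCII letters, the punctuation characters -.;,:/ and the digits; appendRange and buildWhitelist are hypothetical helper names.

```cpp
#include <string>

// Hypothetical helper: appends every character from first to last (inclusive).
// Relies on the ranges being contiguous, which holds for ASCII letters and digits.
void appendRange(std::string& out, char first, char last) {
    for (char c = first; c <= last; ++c) {
        out += c;
    }
}

// Hypothetical helper: builds the whitelist assumed above.
std::string buildWhitelist() {
    std::string whitelist;
    appendRange(whitelist, 'a', 'z');
    appendRange(whitelist, 'A', 'Z');
    whitelist += "-.;,:/";
    appendRange(whitelist, '0', '9');
    return whitelist;
}
```

The result can then be passed on as api->SetVariable("tessedit_char_whitelist", buildWhitelist().c_str());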

 

Then build and run the project.

 

As a result you will get:

Fig-8

Fig. 8. Input image

 

Fig-9

Fig. 9. Output result

 

Congratulations! You have installed Tesseract and run your first text recognition program!

