Questions tagged [ocr]

Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. The following topics, although some are distinct fields of application, are also commonly referred to as OCR: Handwritten Text Recognition (HTR), Optical Word Recognition (OWR), Intelligent Character Recognition (ICR), and Intelligent Word Recognition (IWR).

429 votes
3 answers
270k views

Simple Digit Recognition OCR in OpenCV-Python

I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV. I have 100 samples (i.e. ...
Abid Rahman K
207 votes
15 answers
251k views

image processing to improve tesseract OCR accuracy

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed ...
user364902 • 3,246
175 votes
14 answers
75k views

Has reCaptcha been cracked / hacked / OCR'd / defeated / broken? [closed]

Have any programming methods been used to defeat reCAPTCHA? I'm interested in seeing evidence and potentially demonstrations that reCAPTCHA in particular has been made obsolete by completely ...
Dave Rutledge
166 votes
5 answers
217k views

Java OCR implementation [closed]

This is primarily just curiosity, but are there any OCR implementations in pure Java? I'm curious how this would perform purely in Java, and OCR in general interests me, so I'd love to see how it's ...
rat • 2,554
149 votes
6 answers
146k views

Is there any free OCR library for Android? [closed]

I'm looking for a Java OCR that runs on Android; however, Asprise doesn't seem to be a platform-independent OCR. Is there any open-source/free Java OCR I can use for Android application development?
user121196 • 30.5k
132 votes
22 answers
217k views

Tesseract running error

I have a problem running the tesseract-ocr engine on Linux. I've downloaded the RUS language data and put it in the tessdata directory (/usr/local/share/tessdata). When I try to run tesseract with ...
Russel Crowe • 1,331
100 votes
4 answers
72k views

How do I choose between Tesseract and OpenCV? [closed]

I recently came across Tesseract and OpenCV. It looks like Tesseract is a full-fledged OCR engine and OpenCV can be used as a framework to create an OCR application/service. I tried using Tesseract ...
Legend • 115k
91 votes
7 answers
121k views

Limit characters tesseract is looking for

Is it possible to limit the set of characters that tesseract is looking for (e.g. search only for letters a-z)? That would improve my results greatly.
Danilo Bargen
75 votes
4 answers
197k views

Pytesseract OCR multiple config options

I am having some problems with pytesseract. I need to configure Tesseract so that it accepts single digits while also accepting only numbers, as the number zero is often ...
Niall Oswald
75 votes
1 answer
3k views

How to get Indexing Service and MODI to produce Full-text over OCR?

I have configured Indexing Service to index my files, which also include scanned images saved as hi-res TIFF files. I also installed MS Office 2003+ and configured MS Office Document Imaging (MODI) ...
Ishmaeel • 14.3k
72 votes
11 answers
106k views

How to recognize vehicle license / number plate (ANPR) from an image? [closed]

I have a web site that allows users to upload images of cars and I would like to put a privacy filter in place to detect registration plates on the vehicle and blur them. The blurring is not a ...
Ryan O'Neill • 5,525
72 votes
10 answers
143k views

How to make tesseract to recognize only numbers, when they are mixed with letters?

I want to use tesseract to recognize only numbers. The problem is that I have a mixture of numbers and letters, and when I use SetVariable("tessedit_char_whitelist", "0123456789") for every symbol ...
zkunov • 3,402
71 votes
1 answer
122k views

best OCR (Optical character recognition) example in android [closed]

I want a running example of OCR on Android. I have done some research and found an example that implements OCR on Android: https://github.com/rmtheis/tess-two. In it there are three project files.....
Komal • 739
64 votes
5 answers
152k views

How to implement and do OCR in a C# project?

I've been searching for a while, and all I've seen are some OCR library requests. I would like to know how to implement the purest, easiest to install and use OCR library, with detailed info for ...
Berker Yüceer
63 votes
8 answers
142k views

Getting the bounding box of the recognized words using python-tesseract

I am using python-tesseract to extract words from an image. This is a python wrapper for tesseract which is an OCR code. I am using the following code for getting the words: import tesseract api = ...
Abtin Rasoulian
59 votes
8 answers
53k views

What are good algorithms for vehicle license plate detection? [closed]

Background: For my final project at university, I'm developing a vehicle license plate detection application. I consider myself an intermediate programmer; however, my mathematics knowledge lacks ...
Ash • 3,532
58 votes
8 answers
27k views

Converting a Vision VNTextObservation to a String

I'm looking through Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages: 1) class VNDetectTextRectanglesRequest 2) class VNTextObservation ...
Adrian • 16.4k
57 votes
2 answers
40k views

How can I implement OCR on a website using PHP? [closed]

Are there any free OCR libraries that work with PHP or Python on a Linux server? The idea is to be able to upload an image and pull out characters from it, or allow users to "draw characters", and ...
Moshe • 57.9k
53 votes
7 answers
171k views

Use pytesseract OCR to recognize text from an image

I need to use Pytesseract to extract text from this picture: and the code: from PIL import Image, ImageEnhance, ImageFilter import pytesseract path = 'pic.gif' img = Image.open(path) img = img....
Smith John • 1,065
53 votes
10 answers
48k views

OCR lib for math formulas

I need an open OCR library which is able to scan complex printed math formulas (for example some formulas which were generated via LaTeX). I want to get some LaTeX-like output (or just some AST-like ...
Albert • 66.7k
52 votes
6 answers
17k views

How to get the word under the cursor in Windows?

I want to create an application that gets the word under the cursor (not only for text fields), but I can't find how to do that. Using OCR is pretty hard. The only thing I've seen working is the ...
blez • 4,999
50 votes
1 answer
55k views

Using Tesseract for handwriting recognition

I was just wondering how accurate can tesseract be for handwriting recognition if used with capital letters all in their own little boxes in a form. I know you can train it to recognise your own ...
Jackdaw • 693
47 votes
2 answers
92k views

Detect text area in an image using python and opencv

I want to detect the text area of images using Python 2.7 and OpenCV 2.4.9 and draw a rectangle around it, as shown in the example image below. I am new to image processing, so any idea how to ...
User9412 • 491
46 votes
2 answers
6k views

Set Tesseract font for OCR

I would like to use tesseract for serial number recognition, where I only want to recognize single characters, no word, no dictionary. Therefore I would like to use one of the already trained ...
Mr.Sheep • 1,378
44 votes
2 answers
26k views

Split text lines in scanned document

I am trying to find a way to split the lines of text in a scanned document that has been adaptively thresholded. Right now, I am storing the pixel values of the document as unsigned ints from ...
Alex • 4,166
44 votes
3 answers
22k views

Detect if an OCR text image is upside down

I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python. Here is the code I used: import numpy as np import cv2 from skimage.transform ...
singrium • 2,896
42 votes
4 answers
80k views

Android OCR Library [closed]

Does anyone know of any available libraries or sample code that can be used to develop an app that reads the text in an image captured by the camera? Something similar to Google Goggles but only for ...
Noah • 467
41 votes
9 answers
80k views

Converting YUV->RGB(Image processing)->YUV during onPreviewFrame in android?

I am capturing an image using SurfaceView and getting raw YUV preview data in public void onPreviewFrame4(byte[] data, Camera camera). I have to perform some image preprocessing in onPreviewFrame, so I ...
Hitesh Patel • 2,868
41 votes
7 answers
4k views

Extracting code from photograph of T-shirt via OCR

I recently saw someone with a T-shirt with some Perl code on the back. I took a photograph of it and cropped out the code: Next I tried to extract the code from the image via OCR, so I installed ...
BioGeek • 22.4k
41 votes
4 answers
68k views

What kind of OCR Java library should I use in Android? [closed]

I would like to build an Android application that, via an OCR library, scans a picture and extracts text from it. What Java library should I use?
systempuntoout
40 votes
6 answers
39k views

How do I segment a document using Tesseract then output the resulting bounding boxes and labels

I'm trying to get Tesseract to output a file with labelled bounding boxes that result from page segmentation (pre OCR). I know it must be capable of doing this 'out of the box' because of the results ...
James Owers • 8,118
37 votes
9 answers
50k views

What is the ideal font for OCR?

Does anybody have any experience with different fonts for OCR? I am generating an ID then trying to scan it with tesseract. At the moment I am just T&E'n different fonts, but this seems pretty ...
Chris Lloyd • 12.2k
37 votes
4 answers
64k views

Character recognition (OCR algorithm) [closed]

I am working on a project in which I have to develop an OCR algorithm (I have to read the text from an image and then convert it to a different language). So my first task is to get text from the image. Steps ...
TLE • 705
37 votes
1 answer
2k views

Using Microsoft OCR Library with JS/jQuery in VS 2013

I am currently working on a Windows 8.1 application and I am using web languages, mostly jQuery (Cordova-type project), as it might be used on other platforms. I need to use the Microsoft OCR ...
ColonelMoumou
36 votes
6 answers
103k views

Preprocessing image for Tesseract OCR with OpenCV

I'm trying to develop an App that uses Tesseract to recognize text from documents taken by a phone's cam. I'm using OpenCV to preprocess the image for better recognition, applying a Gaussian blur and ...
Mauricio • 839
35 votes
5 answers
49k views

How to install language in tesseract OCR

I have installed tesseract OCR and it has only 'eng' and 'osd' in the language list. I need the German language. I tried the following command, brew install tesseract-ocr-deu, but I am getting an error. Error: ...
Lama Madan
35 votes
6 answers
11k views

Recognize a number from an image

I'm trying to write an application to find the numbers inside an image and add them up. How can I identify the written number in an image? There are many boxes in the image I need to get the ...
Hash • 7,894
35 votes
9 answers
154k views

Tesseract OCR simple example

Hi, can anyone give me a simple example of testing Tesseract OCR, preferably in C#? I tried the demo found here. I downloaded the English dataset and unzipped it to the C drive, and modified the code as ...
Will Robinson
34 votes
6 answers
79k views

Using Tesseract from java

I'm trying to build a sample application in Java that will read an image file and just output the text extracted from the image. I found the Tesseract project, which seems promising; however, it's in C++...
Omnipresent • 29.8k
34 votes
3 answers
4k views

Is there an efficient algorithm for segmentation of handwritten text?

I want to automatically divide an image of ancient handwritten text by lines (and by words in future). The first obvious part is preprocessing the image... I'm just using a simple digitization (...
Ernado • 641
33 votes
5 answers
60k views

OCR with the Tesseract interface

How do you OCR a TIFF file using Tesseract's interface in C#? Currently I only know how to do it using the executable.
toh yen cheng
33 votes
8 answers
30k views

Is there an OCR library that outputs coordinates of words found within an image? [closed]

In my experience, OCR libraries tend to merely output the text found within an image but not where the text was found. Is there an OCR library that outputs both the words found within an image as well ...
Adam Paynter • 46.5k
32 votes
7 answers
40k views

How to remove all lines and borders in an image while keeping text programmatically?

I'm trying to extract text from an image using Tesseract OCR. Currently, with this original input image, the output has very poor quality (about 50%). But when I try to remove all lines and borders ...
wind • 423
32 votes
2 answers
40k views

How can I run tesseract with multiple languages one time?

I have to analyze an image containing both English and Japanese text. When I run tesseract by default (-l eng), some Japanese characters are lost. Otherwise, if I run tesseract with Japanese (-l ...
pars • 429
32 votes
5 answers
80k views

Tesseract ocr PDF as input

I am building an OCR project and I am using a .NET wrapper for Tesseract. The samples that the wrapper has don't show how to deal with a PDF as input. Using a PDF as input, how do I produce a ...
acrab • 359
32 votes
2 answers
32k views

Which OCR Engine is better: Tesseract or OCRopus? [closed]

I have tried Tesseract with iPhone and assessed its accuracy to be 70% without image preprocessing. I also noticed that it might be poor in extracting digits. I have heard about OCRopus OCR engine: ...
Ahmed Hussein
31 votes
10 answers
62k views

Programmatically recognize text from scans in a PDF File [closed]

I have a PDF file, which contains data that we need to import into a database. The files seem to be pdf scans of printed alphanumeric text. Looks like 10 pt. Times New Roman. Are there any tools ...
Rob • 3,026
31 votes
3 answers
56k views

Tesseract training for a new font

I'm still new to Tesseract OCR and after using it in my script noticed it had a relatively big error rate for the images I was trying to extract text from. I came across Tesseract training, which ...
user19235 • 601
31 votes
8 answers
53k views

How to know if a PDF contains only images or has been OCR scanned for searching?

I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one large image, even where the ...
Bratch • 4,153
30 votes
2 answers
15k views

What OCR options exist beyond Tesseract? [closed]

I've used Tesseract a bit and its results leave much to be desired. I'm currently detecting very small images (35x15, without border, but have tried adding one with ImageMagick with no OCR advantage)...
ylluminate • 12.2k
