Questions tagged [ocr]
Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. The following topics, although some are distinct fields of application, are also commonly referred to as OCR: Handwritten Text Recognition (HTR), Optical Word Recognition (OWR), Intelligent Character Recognition (ICR), Intelligent Word Recognition (IWR).
6,241
questions
429
votes
3
answers
270k
views
Simple Digit Recognition OCR in OpenCV-Python
I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.
I have 100 samples (i.e. ...
207
votes
15
answers
251k
views
image processing to improve tesseract OCR accuracy
I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed ...
175
votes
14
answers
75k
views
Has reCaptcha been cracked / hacked / OCR'd / defeated / broken? [closed]
Have any programming methods been used to defeat reCAPTCHA?
I'm interested in seeing evidence and potentially demonstrations that reCAPTCHA in particular has been made obsolete by completely ...
166
votes
5
answers
217k
views
Java OCR implementation [closed]
This is primarily just curiosity, but are there any OCR implementations in pure Java? I'm curious how this would perform purely in Java, and OCR in general interests me, so I'd love to see how it's ...
149
votes
6
answers
146k
views
Is there any free OCR library for Android? [closed]
I'm looking for a Java OCR that runs on Android; however, Asprise doesn't seem to be a platform-independent OCR. Is there any open-source/free Java OCR I can use for Android application development?
132
votes
22
answers
217k
views
Tesseract running error
I have a problem running the tesseract-ocr engine on Linux. I've downloaded the RUS language data and put it in the tessdata directory (/usr/local/share/tessdata). When I try to run tesseract with ...
100
votes
4
answers
72k
views
How do I choose between Tesseract and OpenCV? [closed]
I recently came across Tesseract and OpenCV. It looks like Tesseract is a full-fledged OCR engine and OpenCV can be used as a framework to create an OCR application/service.
I tried using Tesseract ...
91
votes
7
answers
121k
views
Limit characters tesseract is looking for
Is it possible to limit the set of characters that tesseract is looking for (e.g. search only for letters a-z)? That would improve my results greatly.
75
votes
4
answers
197k
views
Pytesseract OCR multiple config options
I am having some problems with pytesseract. I need to configure Tesseract so that it accepts single digits while also only accepting numbers, as the number zero is often ...
75
votes
1
answer
3k
views
How to get Indexing Service and MODI to produce Full-text over OCR?
I have configured Indexing Service to index my files, which also include scanned images saved as hi-res TIFF files. I also installed MS Office 2003+ and configured MS Office Document Imaging (MODI) ...
72
votes
11
answers
106k
views
How to recognize vehicle license / number plate (ANPR) from an image? [closed]
I have a web site that allows users to upload images of cars and I would like to put a privacy filter in place to detect registration plates on the vehicle and blur them.
The blurring is not a ...
72
votes
10
answers
143k
views
How to make tesseract to recognize only numbers, when they are mixed with letters?
I want to use tesseract to recognize only numbers. The problem is that I have a mixture of numbers & letters and when I use SetVariable("tessedit_char_whitelist", "0123456789")
for every symbol ...
71
votes
1
answer
122k
views
best OCR (Optical character recognition) example in android [closed]
I want a running example of OCR on Android. I have done some research and found an example that implements OCR on Android:
https://github.com/rmtheis/tess-two and in it there are three project files ...
64
votes
5
answers
152k
views
How to implement and do OCR in a C# project?
I've been searching for a while and all that I've seen are some OCR library requests. I would like to know how to implement the purest, easy to install and use OCR library with detailed info for ...
63
votes
8
answers
142k
views
Getting the bounding box of the recognized words using python-tesseract
I am using python-tesseract to extract words from an image. This is a Python wrapper for tesseract, which is an OCR engine.
I am using the following code for getting the words:
import tesseract
api = ...
59
votes
8
answers
53k
views
What are good algorithms for vehicle license plate detection? [closed]
Background
For my final project at university, I'm developing a vehicle license plate detection application. I consider myself an intermediate programmer, however my mathematics knowledge lacks ...
58
votes
8
answers
27k
views
Converting a Vision VNTextObservation to a String
I'm looking through Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages:
1) class VNDetectTextRectanglesRequest
2) class VNTextObservation
...
57
votes
2
answers
40k
views
How can I implement OCR on a website using PHP? [closed]
Are there any free OCR libraries that work with PHP or Python on a Linux server? The idea is to be able to upload an image and pull out characters from it, or allow users to "draw characters", and ...
53
votes
7
answers
171k
views
Use pytesseract OCR to recognize text from an image
I need to use Pytesseract to extract text from this picture:
and the code:
from PIL import Image, ImageEnhance, ImageFilter
import pytesseract
path = 'pic.gif'
img = Image.open(path)
img = img....
53
votes
10
answers
48k
views
OCR lib for math formulas
I need an open OCR library which is able to scan complex printed math formulas (for example some formulas which were generated via LaTeX). I want to get some LaTeX-like output (or just some AST-like ...
52
votes
6
answers
17k
views
How to get the word under the cursor in Windows?
I want to create an application which gets the word under the cursor (not only for text fields), but I can't find how to do that. Using OCR is pretty hard. The only thing I've seen working is the ...
50
votes
1
answer
55k
views
Using Tesseract for handwriting recognition
I was just wondering how accurate tesseract can be for handwriting recognition if used with capital letters, each in its own little box in a form.
I know you can train it to recognise your own ...
47
votes
2
answers
92k
views
Detect text area in an image using python and opencv
I want to detect the text area of images using Python 2.7 and OpenCV 2.4.9
and draw a rectangle around it, as shown in the example image below.
I am new to image processing so any idea how to ...
46
votes
2
answers
6k
views
Set Tesseract font for OCR
I would like to use tesseract for serial number recognition, where I only want to recognize single characters, no word, no dictionary.
Therefore I would like to use one of the already trained ...
44
votes
2
answers
26k
views
Split text lines in scanned document
I am trying to find a way to split the lines of text in a scanned document that has been adaptively thresholded. Right now, I am storing the pixel values of the document as unsigned ints from ...
44
votes
3
answers
22k
views
Detect if an OCR text image is upside down
I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform ...
42
votes
4
answers
80k
views
Android OCR Library [closed]
Does anyone know of any available libraries or sample code that can be used to develop an app that reads the text in an image captured by the camera? Something similar to Google Goggles but only for ...
41
votes
9
answers
80k
views
Converting YUV->RGB(Image processing)->YUV during onPreviewFrame in android?
I am capturing image using SurfaceView and getting Yuv Raw preview data in public void onPreviewFrame4(byte[] data, Camera camera)
I have to perform some image preprocessing in onPreviewFrame so i ...
41
votes
7
answers
4k
views
Extracting code from photograph of T-shirt via OCR
I recently saw someone with a T-shirt with some Perl code on the back. I took a photograph of it and cropped out the code:
Next I tried to extract the code from the image via OCR, so I installed ...
41
votes
4
answers
68k
views
What kind of OCR Java library should I use in Android? [closed]
I would like to build an Android application that, via an OCR library, scans a picture and extracts text from it.
What Java library should I use?
40
votes
6
answers
39k
views
How do I segment a document using Tesseract then output the resulting bounding boxes and labels
I'm trying to get Tesseract to output a file with labelled bounding boxes that result from page segmentation (pre OCR). I know it must be capable of doing this 'out of the box' because of the results ...
37
votes
9
answers
50k
views
What is the ideal font for OCR?
Does anybody have any experience with different fonts for OCR? I am generating an ID then trying to scan it with tesseract. At the moment I am just trying different fonts by trial and error, but this seems pretty ...
37
votes
4
answers
64k
views
Character recognition (OCR algorithm) [closed]
I am working on a project in which I have to develop an OCR algorithm (I have to read the text from an image and then convert it to a different language). So my first task is to get text from an image.
Steps ...
37
votes
1
answer
2k
views
Using Microsoft OCR Library with JS/jQuery in VS 2013
I am currently working on a Windows 8.1 application and I am using web languages and mostly jQuery (Cordova type project) as it might be used on other platforms.
I need to use the Microsoft OCR ...
36
votes
6
answers
103k
views
Preprocessing image for Tesseract OCR with OpenCV
I'm trying to develop an App that uses Tesseract to recognize text from documents taken by a phone's cam. I'm using OpenCV to preprocess the image for better recognition, applying a Gaussian blur and ...
35
votes
5
answers
49k
views
How to install language in tesseract OCR
I have installed tesseract OCR and it has only 'eng' and 'osd' in the language list. I need the German language. I tried the following command
brew install tesseract-ocr-deu
but I am getting an error.
Error: ...
35
votes
6
answers
11k
views
Recognize a number from an image
I'm trying to write an application to find the numbers inside an image and add them up.
How can I identify the written number in an image?
There are many boxes in the image I need to get the ...
35
votes
9
answers
154k
views
Tesseract OCR simple example
Hi, can anyone give me a simple example of testing Tesseract OCR, preferably in C#?
I tried the demo found here.
I downloaded the English dataset, unzipped it to the C drive, and modified the code as ...
34
votes
6
answers
79k
views
Using Tesseract from java
I'm trying to build a sample application in Java that will read an image file and just output the text extracted from the image. I found the Tesseract project which seems promising, however, it's in C++...
34
votes
3
answers
4k
views
Is there an efficient algorithm for segmentation of handwritten text?
I want to automatically divide an image of ancient handwritten text by lines (and by words in future).
The first obvious part is preprocessing the image...
I'm just using a simple digitization (...
33
votes
5
answers
60k
views
OCR with the Tesseract interface
How do you OCR a TIFF file using Tesseract's interface in C#?
Currently I only know how to do it using the executable.
33
votes
8
answers
30k
views
Is there an OCR library that outputs coordinates of words found within an image? [closed]
In my experience, OCR libraries tend to merely output the text found within an image but not where the text was found. Is there an OCR library that outputs both the words found within an image as well ...
32
votes
7
answers
40k
views
How to remove all lines and borders in an image while keeping text programmatically?
I'm trying to extract text from an image using Tesseract OCR.
Currently, with this original input image, the output has very poor quality (about 50%). But when I try to remove all lines and borders ...
32
votes
2
answers
40k
views
How can I run tesseract with multiple languages one time?
I have to analyze an image which contains both English and Japanese text. When I run tesseract by default (-l eng), some Japanese characters are lost. Otherwise, if I run tesseract with Japanese (-l ...
32
votes
5
answers
80k
views
Tesseract ocr PDF as input
I am building an OCR project and I am using a .NET wrapper for Tesseract. The samples that the wrapper has don't show how to deal with a PDF as input. Using a PDF as input how do I produce a ...
32
votes
2
answers
32k
views
Which OCR Engine is better: Tesseract or OCRopus? [closed]
I have tried Tesseract with iPhone and assessed its accuracy to be 70% without image preprocessing. I also noticed that it might be poor in extracting digits. I have heard about OCRopus OCR engine: ...
31
votes
10
answers
62k
views
Programmatically recognize text from scans in a PDF File [closed]
I have a PDF file, which contains data that we need to import into a database. The files seem to be pdf scans of printed alphanumeric text. Looks like 10 pt. Times New Roman.
Are there any tools ...
31
votes
3
answers
56k
views
Tesseract training for a new font
I'm still new to Tesseract OCR and after using it in my script noticed it had a relatively big error rate for the images I was trying to extract text from. I came across Tesseract training, which ...
31
votes
8
answers
53k
views
How to know if a PDF contains only images or has been OCR scanned for searching?
I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one large image, even where the ...
30
votes
2
answers
15k
views
What OCR options exist beyond Tesseract? [closed]
I've used Tesseract a bit and its results leave much to be desired. I'm currently detecting very small images (35x15, without border, but have tried adding one with ImageMagick with no OCR advantage)...