Highest scored 'ocr' questions

429 votes

3 answers

270k views

Simple Digit Recognition OCR in OpenCV-Python

I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV. I have 100 samples (i.e. ...

Abid Rahman K

52.4k

asked Feb 23, 2012 at 12:37

207 votes

15 answers

251k views

image processing to improve tesseract OCR accuracy

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed ...

user364902

3,246

asked Feb 28, 2012 at 10:12

175 votes

14 answers

75k views

Has reCaptcha been cracked / hacked / OCR'd / defeated / broken? [closed]

Have any programming methods been used to defeat reCAPTCHA? I'm interested in seeing evidence and potentially demonstrations that reCAPTCHA in particular has been made obsolete by completely ...

Dave Rutledge

5,525

asked Jan 15, 2009 at 23:32

166 votes

5 answers

217k views

Java OCR implementation [closed]

This is primarily just curiosity, but are there any OCR implementations in pure Java? I'm curious how this would perform purely in Java, and OCR in general interests me, so I'd love to see how it's ...

rat

2,554

asked Nov 28, 2009 at 21:55

149 votes

6 answers

146k views

Is there any free OCR library for Android? [closed]

I'm looking for a Java OCR that runs on Android; however, Asprise doesn't seem to be a platform-independent OCR. Is there any open-source/free Java OCR I can use for Android application development?

user121196

30.5k

asked Jul 9, 2009 at 20:13

132 votes

22 answers

217k views

Tesseract running error

I have a problem with running tesseract-ocr engine on linux. I've downloaded RUS language data and put it to tessdata directory (/usr/local/share/tessdata). When I'm trying to run tesseract with ...

Russel Crowe

1,331

asked Feb 10, 2013 at 17:53

100 votes

4 answers

72k views

How do I choose between Tesseract and OpenCV? [closed]

I recently came across Tesseract and OpenCV. It looks like Tesseract is a full-fledged OCR engine and OpenCV can be used as a framework to create an OCR application/service. I tried using Tesseract ...

Legend

115k

asked Jul 15, 2012 at 6:07

91 votes

7 answers

121k views

Limit characters tesseract is looking for

Is it possible to limit the set of characters that tesseract is looking for (e.g. search only for letters a-z)? That would improve my results greatly.

Danilo Bargen

19k

asked Mar 2, 2010 at 13:47

75 votes

4 answers

197k views

Pytesseract OCR multiple config options

I am having some problems with pytesseract. I need to configure Tesseract so that it accepts single digits while also only accepting numbers, as the number zero is often ...

Niall Oswald

865

asked Jun 18, 2017 at 20:07

75 votes

1 answer

3k views

How to get Indexing Service and MODI to produce Full-text over OCR?

I have configured Indexing Service to index my files, which also include scanned images saved as hi-res TIFF files. I also installed MS Office 2003+ and configured MS Office Document Imaging (MODI) ...

Ishmaeel

14.3k

asked Aug 5, 2008 at 23:16

72 votes

11 answers

106k views

How to recognize vehicle license / number plate (ANPR) from an image? [closed]

I have a web site that allows users to upload images of cars and I would like to put a privacy filter in place to detect registration plates on the vehicle and blur them. The blurring is not a ...

Ryan O'Neill

5,525

asked Jun 11, 2009 at 14:18

72 votes

10 answers

143k views

How to make tesseract to recognize only numbers, when they are mixed with letters?

I want to use tesseract to recognize only numbers. The problem is that I have a mixture of numbers & letters and when I use SetVariable("tessedit_char_whitelist", "0123456789") for every symbol ...

zkunov

3,402

asked Feb 9, 2011 at 12:29

71 votes

1 answer

122k views

best OCR (Optical character recognition) example in android [closed]

I want a running example of OCR in Android. I have done some research and found an example that implements OCR in Android, https://github.com/rmtheis/tess-two, and in it there are three project files ...

Komal

739

asked Oct 23, 2013 at 5:12

64 votes

5 answers

152k views

How to implement and do OCR in a C# project?

I've been searching for a while and all that I've seen are some OCR library requests. I would like to know how to implement the purest, easy to install and use OCR library with detailed info for ...

Berker Yüceer

7,145

asked Jun 8, 2012 at 10:46

63 votes

8 answers

142k views

Getting the bounding box of the recognized words using python-tesseract

I am using python-tesseract to extract words from an image. This is a python wrapper for tesseract which is an OCR code. I am using the following code for getting the words: import tesseract api = ...

Abtin Rasoulian

889

asked Dec 30, 2013 at 0:15

59 votes

8 answers

53k views

What are good algorithms for vehicle license plate detection? [closed]

Background: For my final project at university, I'm developing a vehicle license plate detection application. I consider myself an intermediate programmer, however my mathematics knowledge lacks ...

Ash

3,532

asked Jan 16, 2011 at 19:40

58 votes

8 answers

27k views

Converting a Vision VNTextObservation to a String

I'm looking through Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages: 1) class VNDetectTextRectanglesRequest 2) class VNTextObservation ...

Adrian

16.4k

asked Jun 13, 2017 at 23:34

57 votes

2 answers

40k views

How can I implement OCR on a website using PHP? [closed]

Are there any free OCR libraries that work with PHP or Python on a Linux server? The idea is to be able to upload an image and pull out characters from it, or allow users to "draw characters", and ...

Moshe

57.9k

asked Jan 31, 2010 at 2:10

53 votes

7 answers

171k views

Use pytesseract OCR to recognize text from an image

I need to use Pytesseract to extract text from this picture: and the code: from PIL import Image, ImageEnhance, ImageFilter import pytesseract path = 'pic.gif' img = Image.open(path) img = img....

Smith John

1,065

asked Jun 10, 2016 at 10:08

53 votes

10 answers

48k views

OCR lib for math formulas

I need an open OCR library which is able to scan complex printed math formulas (for example some formulas which were generated via LaTeX). I want to get some LaTeX-like output (or just some AST-like ...

Albert

66.7k

asked Aug 25, 2010 at 21:08

52 votes

6 answers

17k views

How to get the word under the cursor in Windows?

I want to create an application which gets the word under the cursor (not only for text fields), but I can't find how to do that. Using OCR is pretty hard. The only thing I've seen working is the ...

blez

4,999

asked Jan 12, 2011 at 3:18

50 votes

1 answer

55k views

Using Tesseract for handwriting recognition

I was just wondering how accurate can tesseract be for handwriting recognition if used with capital letters all in their own little boxes in a form. I know you can train it to recognise your own ...

Jackdaw

693

asked Sep 18, 2016 at 10:05

47 votes

2 answers

92k views

Detect text area in an image using python and opencv

I want to detect the text area of images using python 2.7 and opencv 2.4.9 and draw a rectangle area around it. Like shown in the example image below. I am new to image processing so any idea how to ...

User9412

491

asked Jun 12, 2016 at 6:12

46 votes

2 answers

6k views

Set Tesseract font for OCR

I would like to use tesseract for serial number recognition, where I only want to recognize single characters, no word, no dictionary. Therefore I would like to use one of the already trained ...

Mr.Sheep

1,378

asked Jul 14, 2015 at 6:45

44 votes

2 answers

26k views

Split text lines in scanned document

I am trying to find a way to split the lines of text in a scanned document that has been adaptive thresholded. Right now, I am storing the pixel values of the document as unsigned ints from ...

Alex

4,166

asked Jan 24, 2016 at 20:36

44 votes

3 answers

22k views

Detect if an OCR text image is upside down

I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python. Here is the code I used: import numpy as np import cv2 from skimage.transform ...

singrium

2,896

asked Apr 12, 2019 at 14:41

42 votes

4 answers

80k views

Android OCR Library [closed]

Does anyone know of any available libraries or sample code that can be used to develop an app that reads the text in an image captured by the camera? Something similar to Google Goggles but only for ...

Noah

467

asked Jan 29, 2011 at 9:38

41 votes

9 answers

80k views

Converting YUV->RGB(Image processing)->YUV during onPreviewFrame in android?

I am capturing an image using SurfaceView and getting YUV raw preview data in public void onPreviewFrame4(byte[] data, Camera camera). I have to perform some image preprocessing in onPreviewFrame so I ...

Hitesh Patel

2,868

asked Feb 17, 2012 at 9:36

41 votes

7 answers

4k views

Extracting code from photograph of T-shirt via OCR

I recently saw someone with a T-shirt with some Perl code on the back. I took a photograph of it and cropped out the code: Next I tried to extract the code from the image via OCR, so I installed ...

BioGeek

22.4k

asked Mar 10, 2010 at 16:43

41 votes

4 answers

68k views

What kind of OCR Java library should I use in Android? [closed]

I would like to build an Android application that, via an OCR library, should scan a picture and extract text from it. What Java library should I use?

systempuntoout

73k

asked Jun 30, 2009 at 9:02

40 votes

6 answers

39k views

How do I segment a document using Tesseract then output the resulting bounding boxes and labels

I'm trying to get Tesseract to output a file with labelled bounding boxes that result from page segmentation (pre OCR). I know it must be capable of doing this 'out of the box' because of the results ...

James Owers

8,118

asked Feb 18, 2015 at 18:27

37 votes

9 answers

50k views

What is the ideal font for OCR?

Does anybody have any experience with different fonts for OCR? I am generating an ID then trying to scan it with tesseract. At the moment I am just T&E'n different fonts, but this seems pretty ...

Chris Lloyd

12.2k

asked Nov 25, 2008 at 1:06

37 votes

4 answers

64k views

Character recognition (OCR algorithm) [closed]

I am working on a project in which I have to develop an OCR algorithm (I have to read the text from an image and then convert it to a different language). So my first task is to get text from the image. Steps ...

TLE

705

asked Mar 3, 2013 at 16:58

37 votes

1 answer

2k views

Using Microsoft OCR Library with JS/jQuery in VS 2013

I am currently working on a Windows 8.1 application and I am using web languages and mostly jQuery (Cordova type project) as it might be used on other platforms. I need to use the Microsoft OCR ...

ColonelMoumou

371

asked Apr 15, 2015 at 13:33

36 votes

6 answers

103k views

Preprocessing image for Tesseract OCR with OpenCV

I'm trying to develop an App that uses Tesseract to recognize text from documents taken by a phone's cam. I'm using OpenCV to preprocess the image for better recognition, applying a Gaussian blur and ...

Mauricio

839

asked Mar 9, 2015 at 5:57

35 votes

5 answers

49k views

How to install language in tesseract OCR

I have installed tesseract OCR and it has only 'eng' and 'osd' in the language list. I need the German language. I tried the following command, brew install tesseract-ocr-deu, but I am getting an error. Error: ...

Lama Madan

687

asked Oct 19, 2018 at 11:34

35 votes

6 answers

11k views

Recognize a number from an image

I'm trying to write an application to find the numbers inside an image and add them up. How can I identify the written number in an image? There are many boxes in the image I need to get the ...

Hash

7,894

asked Apr 20, 2015 at 10:45

35 votes

9 answers

154k views

Tesseract OCR simple example

Hi, can anyone give me a simple example of testing Tesseract OCR, preferably in C#? I tried the demo found here. I downloaded the English dataset and unzipped it in the C drive, and modified the code as ...

Will Robinson

651

asked May 16, 2013 at 22:14

34 votes

6 answers

79k views

Using Tesseract from java

I'm trying to build a sample application in Java that will read an image file and just output the text extracted from the image. I found the Tesseract project which seems promising, however, it's in C++...

Omnipresent

29.8k

asked Dec 20, 2012 at 14:45

34 votes

3 answers

4k views

Is there an efficient algorithm for segmentation of handwritten text?

I want to automatically divide an image of ancient handwritten text by lines (and by words in future). The first obvious part is preprocessing the image... I'm just using a simple digitization (...

Ernado

641

asked Nov 4, 2011 at 19:55

33 votes

5 answers

60k views

OCR with the Tesseract interface

How do you OCR a TIFF file using Tesseract's interface in C#? Currently I only know how to do it using the executable.

toh yen cheng

355

asked Aug 27, 2008 at 14:46

33 votes

8 answers

30k views

Is there an OCR library that outputs coordinates of words found within an image? [closed]

In my experience, OCR libraries tend to merely output the text found within an image but not where the text was found. Is there an OCR library that outputs both the words found within an image as well ...

Adam Paynter

46.5k

asked Feb 18, 2011 at 12:01

32 votes

7 answers

40k views

How to remove all lines and borders in an image while keeping text programmatically?

I'm trying to extract text from an image using Tesseract OCR. Currently, with this original input image, the output has very poor quality (about 50%). But when I try to remove all lines and borders ...

wind

423

asked Nov 27, 2015 at 3:26

32 votes

2 answers

40k views

How can I run tesseract with multiple languages one time?

I have to analyze an image which contains both English and Japanese text. When I run tesseract by default (-l eng), some Japanese characters are lost. Otherwise, if I run tesseract with Japanese (-l ...

pars

429

asked Jun 24, 2014 at 6:31

32 votes

5 answers

80k views

Tesseract ocr PDF as input

I am building an OCR project and I am using a .NET wrapper for Tesseract. The samples that the wrapper has don't show how to deal with a PDF as input. Using a PDF as input, how do I produce a ...

acrab

359

asked Apr 15, 2015 at 17:48

32 votes

2 answers

32k views

Which OCR Engine is better: Tesseract or OCRopus? [closed]

I have tried Tesseract with iPhone and assessed its accuracy to be 70% without image preprocessing. I also noticed that it might be poor in extracting digits. I have heard about OCRopus OCR engine: ...

Ahmed Hussein

442

asked Apr 5, 2012 at 17:08

31 votes

10 answers

62k views

Programmatically recognize text from scans in a PDF File [closed]

I have a PDF file, which contains data that we need to import into a database. The files seem to be pdf scans of printed alphanumeric text. Looks like 10 pt. Times New Roman. Are there any tools ...

Rob

3,026

asked Oct 1, 2008 at 16:23

31 votes

3 answers

56k views

Tesseract training for a new font

I'm still new to Tesseract OCR and after using it in my script noticed it had a relatively big error rate for the images I was trying to extract text from. I came across Tesseract training, which ...

user19235

601

asked Dec 23, 2016 at 5:13

31 votes

8 answers

53k views

How to know if a PDF contains only images or has been OCR scanned for searching?

I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one large image, even where the ...

Bratch

4,153

asked Sep 28, 2009 at 22:45

30 votes

2 answers

15k views

What OCR options exist beyond Tesseract? [closed]

I've used Tesseract a bit and its results leave much to be desired. I'm currently detecting very small images (35x15, without border, but have tried adding one with imagemagick with no OCR advantage)...

ylluminate

12.2k

asked Mar 13, 2012 at 19:31
