30

What is a fast and reliable way to threshold images with possible blurring and non-uniform brightness?

Example (blurring but uniform brightness):

enter image description here

Because the image is not guaranteed to have uniform brightness, it's not feasible to use a fixed threshold. An adaptive threshold works alright, but because of the blurriness it creates breaks and distortions in the features (here, the important features are the Sudoku digits):

enter image description here

I've also tried using Histogram Equalization (using OpenCV's equalizeHist function). It increases contrast without reducing differences in brightness.

The best solution I've found is to divide the image by its morphological closing (credit to this post) to make the brightness uniform, then renormalize, then use a fixed threshold (using Otsu's algorithm to pick the optimal threshold level):

enter image description here

Here is code for this in OpenCV for Android:

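import org.opencv.core.*;            // Core, CvType, Mat, Size
import org.opencv.imgproc.Imgproc;

// image is assumed to be a single-channel grayscale Mat
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(19,19));
Mat closed = new Mat();
Imgproc.morphologyEx(image, closed, Imgproc.MORPH_CLOSE, kernel); // background estimate
Core.divide(image, closed, closed, 1, CvType.CV_32F);             // closed now has type CV_32F
Core.normalize(closed, image, 0, 255, Core.NORM_MINMAX, CvType.CV_8U);
// With THRESH_OTSU the threshold argument (-1) is ignored; Otsu picks the level
Imgproc.threshold(image, image, -1, 255, Imgproc.THRESH_BINARY_INV
    +Imgproc.THRESH_OTSU);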

This works great but the closing operation is very slow. Reducing the size of the structuring element increases speed but reduces accuracy.
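To show why dividing by the closing evens out the brightness, here is a minimal plain-Java sketch that does not use OpenCV. It uses a square window instead of the ellipse, and the class name, method names and test image are made up for illustration. The closing erases dark details narrower than the window, so it acts as a background estimate, and the ratio image is close to 1.0 on the background and well below 1.0 on dark strokes regardless of the local brightness.

```java
/** Illustrative sketch only: flatten brightness by dividing an image by its grayscale closing. */
final class ClosingNormalizer {

    /** Max filter (dilation) or min filter (erosion) over a (2r+1)x(2r+1) square, truncated at borders. */
    static double[][] filter(double[][] src, int r, boolean takeMax) {
        int h = src.length, w = src[0].length;
        double[][] dst = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double best = src[y][x];
                for (int yy = Math.max(0, y - r); yy <= Math.min(h - 1, y + r); yy++) {
                    for (int xx = Math.max(0, x - r); xx <= Math.min(w - 1, x + r); xx++) {
                        best = takeMax ? Math.max(best, src[yy][xx]) : Math.min(best, src[yy][xx]);
                    }
                }
                dst[y][x] = best;
            }
        }
        return dst;
    }

    /** Closing = dilation then erosion: erases dark details narrower than the window. */
    static double[][] close(double[][] src, int r) {
        return filter(filter(src, r, true), r, false);
    }

    /** Ratio image: about 1.0 on background, clearly below 1.0 on dark strokes. */
    static double[][] divideByClosing(double[][] src, int r) {
        double[][] background = close(src, r);
        int h = src.length, w = src[0].length;
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out[y][x] = background[y][x] > 0 ? src[y][x] / background[y][x] : 0.0;
            }
        }
        return out;
    }

    /** Hypothetical test image: brightness ramps left to right, one dark "stroke" pixel at row 4, column 4. */
    static double[][] demoImage() {
        double[][] img = new double[9][9];
        for (int y = 0; y < 9; y++) {
            for (int x = 0; x < 9; x++) {
                img[y][x] = 100 + 10 * x;
            }
        }
        img[4][4] = 20;
        return img;
    }

    public static void main(String[] args) {
        double[][] ratio = divideByClosing(demoImage(), 1);
        System.out.printf("stroke: %.3f, background: %.3f%n", ratio[4][4], ratio[4][6]);
    }
}
```

The window has to be wider than the strokes, otherwise the closing keeps them and they divide out along with the background. That is presumably why a smaller structuring element costs accuracy.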

Edit: based on DCS's suggestion I tried using a high-pass filter. I chose the Laplacian filter, but I would expect similar results with Sobel and Scharr filters. The filter picks up high-frequency noise in the areas which do not contain features, and suffers from similar distortion to the adaptive threshold due to blurring. It also takes about as long as the closing operation. Here is an example with a 15x15 filter:

enter image description here

Edit 2: Based on AruniRC's answer, I used Canny edge detection on the image with the suggested parameters:

double mean = Core.mean(image).val[0];
Imgproc.Canny(image, image, 0.66*mean, 1.33*mean);

I'm not sure how to reliably automatically fine-tune the parameters to get connected digits.

enter image description here

10
  • You could try to threshold on a high-pass filtered image, assuming that the brightness change occurs in low frequencies. I don't know, however, how fast these filter operations are on a mobile device, and I think you would need a rather large kernel.
    – DCS
    Mar 22, 2013 at 8:42
  • @DCS Unfortunately, I don't think high-pass filters will work. See my edit to the above post.
    – 1''
    Mar 22, 2013 at 22:25
  • 2
    Since the features you are interested in cover several pixels, how about reducing the image to a lower resolution first? You could then go back and get more detail at the original resolution, using your lower-res version as a mask.
    Mar 24, 2013 at 3:49
  • 1
    Reduce the resolution and use the small image to determine how to normalize the brightness in the corresponding area of the big image; after normalization, proceed to filter. It should have more noise than if you do the normalization with the big image, but it will be faster. Hopefully, with the right threshold, it will be enough. It's just an idea.
    – Theraot
    Mar 24, 2013 at 8:32
  • 1
    Isn't this question just a matter of doing a Google search and benchmarking a few techniques?
    Mar 25, 2013 at 2:26

5 Answers

23

Using Vaughn Cato and Theraot's suggestions, I scaled down the image before closing it, then scaled the closed image up to regular size. I also reduced the kernel size proportionately.

Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(5,5));
Mat temp = new Mat(); 

Imgproc.resize(image, temp, new Size(image.cols()/4, image.rows()/4));
Imgproc.morphologyEx(temp, temp, Imgproc.MORPH_CLOSE, kernel);
Imgproc.resize(temp, temp, new Size(image.cols(), image.rows()));

Core.divide(image, temp, temp, 1, CvType.CV_32F); // temp will now have type CV_32F
Core.normalize(temp, image, 0, 255, Core.NORM_MINMAX, CvType.CV_8U);

Imgproc.threshold(image, image, -1, 255, 
    Imgproc.THRESH_BINARY_INV+Imgproc.THRESH_OTSU);

The image below shows the results side-by-side for 3 different methods:

Left - regular size closing (432 pixels), size 19 kernel

Middle - half-size closing (216 pixels), size 9 kernel

Right - quarter-size closing (108 pixels), size 5 kernel

enter image description here

The image quality deteriorates as the size of the image used for closing gets smaller, but the deterioration isn't significant enough to affect feature recognition algorithms. The speed increases slightly more than 16-fold for the quarter-size closing, even with the resizing, which suggests that closing time is roughly proportional to the number of pixels in the image.
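The kernel sizes used above (19, 9 and 5) can be reproduced by scaling the kernel radius with the image and rounding down, as in this small sketch. The helper is hypothetical (not an OpenCV call), and the rounding rule is my reading of the numbers rather than a recommendation. It also spells out the pixel arithmetic: shrinking by 4 in each dimension leaves 1/16 of the pixels.

```java
/** Hypothetical helper: pick a structuring-element size for a downscaled image. */
final class ScaledKernel {

    /** Scales the kernel radius by 1/factor (rounding down) and returns the odd size 2*radius + 1. */
    static int scaledSize(int fullSize, int factor) {
        if (fullSize < 1 || fullSize % 2 == 0 || factor < 1) {
            throw new IllegalArgumentException("fullSize must be odd and positive, factor >= 1");
        }
        int radius = (fullSize - 1) / 2;
        return 2 * (radius / factor) + 1;
    }

    /** Downscaling by factor in each dimension leaves 1/(factor*factor) of the pixels. */
    static int pixelCountRatio(int factor) {
        return factor * factor;
    }

    public static void main(String[] args) {
        for (int factor : new int[] {1, 2, 4}) {
            System.out.println("factor " + factor + ": kernel " + scaledSize(19, factor)
                    + ", pixels reduced " + pixelCountRatio(factor) + "x");
        }
    }
}
```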

Any suggestions on how to further improve upon this idea (either by further increasing the speed, or by reducing the deterioration in image quality) are very welcome.

3
  • You should go with adaptiveThreshold instead of threshold. Adaptive thresholding will give better results for dark images.
    – AnkitRox
    Feb 17, 2015 at 6:24
  • @AnkitRox I discuss adaptive thresholding in the question.
    – 1''
    Feb 17, 2015 at 15:46
  • @1'': I was wondering, can this be done on a live camera frame? I want to recognize characters from live camera frames. All suggestions are welcome.
    Mar 16, 2015 at 6:44
2

Alternative approach:

Assuming your intention is to have the numerals to be clearly binarized ... shift your focus to components instead of the whole image.

Here's a pretty easy approach:

  1. Compute a Canny edge map of the image. As a first try, set the low threshold to 0.66*[mean value] and the high threshold to 1.33*[mean value], where [mean value] is the mean of the grey-level values.
  2. You will need to fiddle with the parameters a bit to get an image where the major components/numerals are clearly visible as separate components. Near-perfect is good enough at this stage.
  3. Treating each Canny edge as a connected component (i.e. using cvFindContours() or its C++ counterpart, whichever), one can estimate the foreground and background grey levels and arrive at a threshold.

    For the last bit, do take a look at sections 2. and 3. of this paper. Skipping most of the non-essential theoretical parts it shouldn't be too difficult to have it implemented in OpenCV.

    Hope this helped!

Edit 1:

Based on the Canny edge thresholds here's a very rough idea just sufficient to fine-tune the values. The high_threshold controls how strong an edge must be before it is detected. Basically, an edge must have gradient magnitude greater than high_threshold to be detected in the first place. So this does the initial detection of edges.

Now, the low_threshold deals with connecting nearby edges. It controls how much nearby disconnected edges will get combined together into a single edge. For a better idea, read "Step 6" of this webpage. Try setting a very small low_threshold and see how things come about. You could discard that 0.66*[mean value] thing if it doesn't work on these images - it's just a rule of thumb anyway.
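To make the role of the two thresholds concrete, here is a rough plain-Java sketch of the hysteresis stage only. It is not OpenCV's implementation, it skips the gradient computation and non-maximum suppression that Canny performs first, and the class name and sample values are invented for illustration. Pixels above the high threshold seed edges, and pixels above the low threshold are kept only if they are connected to a seed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of hysteresis thresholding on a gradient-magnitude image (one stage of Canny). */
final class Hysteresis {

    /** Keeps pixels above high, plus pixels above low that are 8-connected to them. */
    static boolean[][] apply(double[][] mag, double low, double high) {
        int h = mag.length, w = mag[0].length;
        boolean[][] edge = new boolean[h][w];
        Deque<int[]> stack = new ArrayDeque<>();

        // Strong pixels are detected first and act as seeds
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (mag[y][x] > high) {
                    edge[y][x] = true;
                    stack.push(new int[] {y, x});
                }
            }
        }

        // Grow each seed through neighbouring pixels that exceed the low threshold
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int ny = p[0] + dy, nx = p[1] + dx;
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                    if (!edge[ny][nx] && mag[ny][nx] > low) {
                        edge[ny][nx] = true;
                        stack.push(new int[] {ny, nx});
                    }
                }
            }
        }
        return edge;
    }

    public static void main(String[] args) {
        // Hypothetical magnitudes: one strong response, weak neighbours, and a weak gap at index 4
        double[][] mag = {{0, 60, 200, 60, 10, 60, 0}};
        System.out.println(java.util.Arrays.toString(apply(mag, 50, 150)[0]));
        System.out.println(java.util.Arrays.toString(apply(mag, 5, 150)[0]));
    }
}
```

With the low threshold at 50, the gap at index 4 breaks the edge and the weak response at index 5 is dropped. With the low threshold at 5, the edge grows across the gap. The same trade-off applies to digits: a lower low threshold joins broken strokes but also lets noise attach to real edges.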

3
  • hmm. okay, this playing around with thresholding would take time, but its not impossible to get to a set of values that will give decent results on a large number of images. Try low_threshold = 50, high_threshold = 150. generally low_threshold : high_threshold should be around 1:3 as per the original paper by Canny. fiddle around! :)
    – AruniRC
    Mar 25, 2013 at 6:47
  • Sorry, I should be more exact. I'm concerned about reliability because the method fails if Canny creates even a small breakage in a digit. On the other hand, if my closing method doesn't work well, the digits are malformed but can still be detected by computer. If you think I can get reliable results with a fixed Canny threshold over a range of lighting conditions and blur, I'll give it a try. The results from the paper do seem fairly impressive.
    – 1''
    Mar 25, 2013 at 15:29
  • yeah the reliability is a big question. though many papers get quite nice results using Canny edgemaps as an initial step, I am yet to come across any that specifically mention the arguments/parameters used. so yeah, the reliability under changing circumstances would turn out to be a major problem. your self-answer seems to be working quite well anyway!
    – AruniRC
    Mar 26, 2013 at 7:29
2

We use Bradley's algorithm for a very similar problem (segmenting letters from the background under uneven lighting and uneven background color). It is described here: http://people.scs.carleton.ca:8008/~roth/iit-publications-iti/docs/gerh-50002.pdf, with C# code here: http://code.google.com/p/aforge/source/browse/trunk/Sources/Imaging/Filters/Adaptive+Binarization/BradleyLocalThresholding.cs?r=1360. It works on an integral image, which can be calculated using OpenCV's integral function. It is very reliable and fast; it is not implemented in OpenCV itself, but it is easy to port.
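As a rough idea of what a port might look like, here is a plain-Java sketch of the integral-image approach on an int[][] grayscale array. It is written from the general description of the method, not copied from the linked C# code, so details such as the border handling, the strict comparison and the parameter names are my assumptions.

```java
/** Sketch of Bradley-style local thresholding using an integral image (summed-area table). */
final class BradleyThreshold {

    /**
     * Marks a pixel as dark when it is more than fraction t below the mean of the
     * window x window neighbourhood centred on it (window truncated at the borders).
     */
    static boolean[][] binarize(int[][] gray, int window, double t) {
        int h = gray.length, w = gray[0].length;

        // integral[y][x] = sum of gray over rows [0, y) and columns [0, x)
        long[][] integral = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            long rowSum = 0;
            for (int x = 0; x < w; x++) {
                rowSum += gray[y][x];
                integral[y + 1][x + 1] = integral[y][x + 1] + rowSum;
            }
        }

        int r = window / 2;
        boolean[][] dark = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int y0 = Math.max(0, y - r), y1 = Math.min(h - 1, y + r);
                int x0 = Math.max(0, x - r), x1 = Math.min(w - 1, x + r);
                long count = (long) (y1 - y0 + 1) * (x1 - x0 + 1);
                // Window sum in four lookups, independent of the window size
                long sum = integral[y1 + 1][x1 + 1] - integral[y0][x1 + 1]
                         - integral[y1 + 1][x0] + integral[y0][x0];
                dark[y][x] = gray[y][x] * count < sum * (1.0 - t);
            }
        }
        return dark;
    }

    /** Hypothetical test image: brightness ramp 80..200 with two strokes at half the local brightness. */
    static int[][] demoImage() {
        int[][] img = new int[8][16];
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 16; x++) {
                img[y][x] = 80 + 8 * x;
            }
        }
        img[4][3] = 52;   // local background is 104
        img[4][12] = 88;  // local background is 176; brighter than the background at x = 0
        return img;
    }

    static int countDark(boolean[][] mask) {
        int n = 0;
        for (boolean[] row : mask) {
            for (boolean b : row) {
                if (b) n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("dark pixels: " + countDark(binarize(demoImage(), 5, 0.15)));
    }
}
```

In the demo image the right-hand stroke (88) is brighter than the left-hand background (80), so no single global threshold separates both strokes, while the local comparison does. The window size and the fraction t are the two knobs to tune.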

Another option is the adaptiveThreshold method in OpenCV, but we did not give it a try: http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#adaptivethreshold. The MEAN version is the same as Bradley's, except that it uses a constant to modify the mean value instead of a percentage, which I think is better.

Also, there is a good article here: https://dsp.stackexchange.com/a/2504

0

You could try working on a per-tile basis if you know you have a good crop of the grid. Working on 9 subimages rather than the whole picture will most likely give more uniform brightness within each subimage. If your cropping is perfect you could even try going for each digit cell individually, but it all depends on how reliable your crop is.
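A minimal plain-Java sketch of the per-tile idea, assuming Otsu's method is applied separately inside each tile. The class, the tiling scheme and the tiny test image are illustrative, and pixel values are assumed to lie in 0..255.

```java
/** Sketch: Otsu threshold computed independently per tile instead of once for the whole image. */
final class TiledOtsu {

    /** Otsu threshold for rows [y0, y1) and columns [x0, x1): maximizes between-class variance. */
    static int otsu(int[][] gray, int y0, int y1, int x0, int x1) {
        int[] hist = new int[256];
        for (int y = y0; y < y1; y++) {
            for (int x = x0; x < x1; x++) {
                hist[gray[y][x]]++;
            }
        }
        long total = (long) (y1 - y0) * (x1 - x0);
        long sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];

        long wB = 0, sumB = 0;
        double bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            wB += hist[t];
            if (wB == 0) continue;          // no pixels at or below t yet
            long wF = total - wB;
            if (wF == 0) break;             // every pixel is at or below t
            sumB += (long) t * hist[t];
            double mB = (double) sumB / wB;
            double mF = (double) (sumAll - sumB) / wF;
            double var = (double) wB * wF * (mB - mF) * (mB - mF);
            if (var > bestVar) {
                bestVar = var;
                best = t;
            }
        }
        return best;
    }

    /** Splits the image into tilesY x tilesX tiles and marks pixels at or below each tile's threshold. */
    static boolean[][] binarize(int[][] gray, int tilesY, int tilesX) {
        int h = gray.length, w = gray[0].length;
        boolean[][] dark = new boolean[h][w];
        for (int ty = 0; ty < tilesY; ty++) {
            for (int tx = 0; tx < tilesX; tx++) {
                int y0 = ty * h / tilesY, y1 = (ty + 1) * h / tilesY;
                int x0 = tx * w / tilesX, x1 = (tx + 1) * w / tilesX;
                int t = otsu(gray, y0, y1, x0, x1);
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        dark[y][x] = gray[y][x] <= t;
                    }
                }
            }
        }
        return dark;
    }

    /** Hypothetical image: bright left half (200) and dim right half (100), one stroke pixel in each. */
    static int[][] demoImage() {
        int[][] img = new int[2][8];
        for (int y = 0; y < 2; y++) {
            for (int x = 0; x < 8; x++) {
                img[y][x] = x < 4 ? 200 : 100;
            }
        }
        img[0][1] = 120;
        img[1][6] = 40;
        return img;
    }

    static int countDark(boolean[][] mask) {
        int n = 0;
        for (boolean[] row : mask) {
            for (boolean b : row) {
                if (b) n++;
            }
        }
        return n;
    }
}
```

Calling binarize(img, 1, 1) thresholds the whole image at once and swallows the dim half, whereas binarize(img, 1, 2) isolates just the two stroke pixels. One caveat: a tile that contains only background still gets split by Otsu if it is noisy, so empty cells would need a separate check.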

2
  • cool; if you have perfect crops of each cell and you can isolate each digit easily, then maybe template matching could work to a certain degree... there's only 10 possible contents on those cells. I feel this could work well with good training; Do you expect all input to use the same typeface?
    – amadillu
    Mar 31, 2013 at 12:07
  • Not necessarily. I use Histogram of Oriented Gradients to isolate important features of each digit into a "feature vector", then a Support Vector Machine to classify the vector. I'm told this is the most reliable way to do digit recognition.
    – 1''
    Mar 31, 2013 at 15:57
0

An elliptical structuring element is more expensive to compute than a rectangular one. Try changing:

Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(19,19));

to:

Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(19,19));

This may speed up your solution enough, with little impact on accuracy.
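One reason a rectangle can be cheaper: a max (or min) over a rectangle can be split into a horizontal 1-D pass followed by a vertical 1-D pass, which an ellipse does not allow in general. For a 19x19 window that is 2*19 = 38 samples per pixel instead of 361. I am assuming, without having checked its source, that OpenCV takes advantage of this. The plain-Java sketch below (hypothetical class, dilation only) just checks that both routes give the same result.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: dilation with a rectangular window computed directly and as two 1-D passes. */
final class RectDilation {

    /** Direct dilation: max over the full (2r+1)x(2r+1) window, truncated at the borders. */
    static int[][] direct(int[][] src, int r) {
        int h = src.length, w = src[0].length;
        int[][] dst = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int best = src[y][x];
                for (int yy = Math.max(0, y - r); yy <= Math.min(h - 1, y + r); yy++) {
                    for (int xx = Math.max(0, x - r); xx <= Math.min(w - 1, x + r); xx++) {
                        best = Math.max(best, src[yy][xx]);
                    }
                }
                dst[y][x] = best;
            }
        }
        return dst;
    }

    /** One 1-D max pass along rows (horizontal = true) or along columns (horizontal = false). */
    static int[][] pass(int[][] src, int r, boolean horizontal) {
        int h = src.length, w = src[0].length;
        int[][] dst = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int best = src[y][x];
                for (int d = -r; d <= r; d++) {
                    int yy = horizontal ? y : y + d;
                    int xx = horizontal ? x + d : x;
                    if (yy >= 0 && yy < h && xx >= 0 && xx < w) {
                        best = Math.max(best, src[yy][xx]);
                    }
                }
                dst[y][x] = best;
            }
        }
        return dst;
    }

    /** Separable dilation: horizontal pass, then vertical pass over its output. */
    static int[][] separable(int[][] src, int r) {
        return pass(pass(src, r, true), r, false);
    }

    /** Pseudo-random 8-bit test image with a fixed seed so runs are repeatable. */
    static int[][] randomImage(int h, int w, long seed) {
        Random rng = new Random(seed);
        int[][] img = new int[h][w];
        for (int[] row : img) {
            for (int x = 0; x < w; x++) {
                row[x] = rng.nextInt(256);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        int[][] img = randomImage(40, 50, 1L);
        System.out.println("same result: " + Arrays.deepEquals(direct(img, 9), separable(img, 9)));
    }
}
```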
