
Over on the TeX stackexchange, we have been discussing how to detect "rivers" in paragraphs in this question.

In this context, rivers are bands of white space that result from accidental alignment of interword spaces in the text. Since this can be quite distracting to a reader, bad rivers are considered a symptom of poor typography. An example of text with rivers is this one, where there are two rivers flowing diagonally.

[image: paragraph of text with two diagonal rivers]

There is interest in detecting these rivers automatically so that they can be avoided (probably by manual editing of the text). Raphink is making some progress at the TeX level (which knows only glyph positions and bounding boxes), but I feel confident that the best way to detect rivers is with some image processing (since glyph shapes are very important and not available to TeX). I have tried various ways to extract the rivers from the above image, but my simple idea of applying a small amount of ellipsoidal blurring doesn't seem to be good enough. I also tried some Radon/Hough transform-based filtering, but I didn't get anywhere with that either. The rivers are very visible to the feature-detection circuits of the human eye/retina/brain, so I would think this could be translated into some kind of filtering operation, but I am not able to make it work. Any ideas?

To be specific, I'm looking for an operation that will detect the two rivers in the above image without too many false positives.

EDIT: endolith asked why I am pursuing an image-processing-based approach, given that in TeX we have access to the glyph positions, spacings, etc., and that it might be much faster and more reliable to use an algorithm that examines the actual text. My reason for doing things the other way is that the shape of the glyphs can affect how noticeable a river is, and at the text level it is very difficult to account for this shape (which depends on the font, on ligaturing, etc.). For an example of how the shape of the glyphs can matter, consider the following two examples: the only difference between them is that I have replaced a few glyphs with others of almost the same width, so that a text-based analysis would consider them equally good/bad. Note, however, that the rivers in the first example are much worse than in the second.

[image: first example, with pronounced rivers]

[image: second example, with a few glyphs replaced; rivers much less noticeable]

Comments:

  • datageist: +1 I like this question. My first thought is a Hough Transform, but it would probably need some pre-processing. Maybe a Dilation Filter first. (Oct 5, 2011 at 18:11)
  • endolith: I'm surprised the Radon transform didn't work, actually. How did you do it? (Oct 5, 2011 at 19:49)
  • Lev Bishop: @endolith: Nothing sophisticated. I used ImageLines[] from Mathematica, with and without some preprocessing. I guess this is technically using a Hough rather than a Radon transform. I won't be surprised if the proper preprocessing (I didn't try datageist's suggested dilation filter) and/or parameter settings can make this work. (Oct 5, 2011 at 20:00)
  • endolith: Google Image Search for rivers shows "winding" rivers, too. Do you want to find those? cdn.ilovetypography.com/img/text-river1.gif (Oct 5, 2011 at 21:56)
  • Lev Bishop: @endolith I guess I ultimately want to replicate the processing of the human visual system that makes certain configurations of spaces distracting. Since this can also happen for meandering rivers, I would like to catch those, although the straight ones seem to be more of a problem in general. Even better would be a way to quantify the "badness" of rivers in a way that corresponds to how strongly visible they are when reading the text. But that is all very subjective and hard to quantify. In the first place, simply catching all really bad rivers without too many false positives will do. (Oct 6, 2011 at 2:27)

5 Answers


I have thought about this some more, and think that the following should be fairly stable. Note that I have limited myself to morphological operations, because these should be available in any standard image processing library.

(1) Open the image with an nPix-by-1 mask, where nPix is about the vertical distance between lines of text.

%# read image
img = rgb2gray(imread('https://i.sstatic.net/4ShOW.png'));

%# threshold and open with a rectangle
%# that is roughly letter sized
bwImg = img > 200; %# threshold of 200 is better than 128

opImg = imopen(bwImg,ones(13,1));

[image: result after vertical opening]

(2) Open image with a 1-by-mPix mask to eliminate whatever is too narrow to be a river.

opImg = imopen(opImg,ones(1,5));

[image: result after horizontal opening]

(3) Remove horizontal "rivers and lakes" that are due to space between paragraphs, or indentation. For this, we remove all rows that are all true, and open with the nPix-by-1 mask that we know will not affect the rivers we have found previously.

To remove lakes, we can use an opening mask that is slightly larger than nPix-by-nPix.

At this step, we can also throw out everything that is too small to be a real river, i.e. everything that covers less area than (nPix+2)*(mPix+2)*4 (that will give us ~3 lines). The +2 is there because we know that all objects are at least nPix in height, and mPix in width, and we want to go a little above that.

%# horizontal river: just look for rows that are all true
opImg(all(opImg,2),:) = false;
%# open with line spacing (nPix)
opImg = imopen(opImg,ones(13,1));

%# remove lakes with nPix+2
opImg = opImg & ~imopen(opImg,ones(15,15)); 

%# remove small fry
opImg = bwareaopen(opImg,7*15*4);

[image: remaining candidate rivers]

(4) If we're interested not only in the length but also the width of the river, we can combine the distance transform with the skeleton.

dt = bwdist(~opImg);
sk = bwmorph(opImg,'skel',inf);
%# prune the skeleton a bit to remove branches
sk = bwmorph(sk,'spur',7);

riversWithWidth = dt.*sk;

[image: skeletonized rivers, colored by width] (colors correspond to the width of the river, though the color bar is off by a factor of 2)

Now you can get the approximate length of the rivers by counting the number of pixels in each connected component, and the average width by averaging their pixel values.
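
In case it's useful, here is a rough Python sketch of that measuring step (my own translation, using scipy.ndimage; it assumes an array like riversWithWidth above, where nonzero skeleton pixels carry the local river width):

```python
import numpy as np
from scipy import ndimage

def measure_rivers(rivers_with_width):
    """Given a skeleton image whose nonzero pixels hold the local river
    width (distance-transform values), return (length, mean_width) per river."""
    labels, n = ndimage.label(rivers_with_width > 0)
    results = []
    for i in range(1, n + 1):
        mask = labels == i
        length = int(mask.sum())                          # pixels along the skeleton
        mean_width = float(rivers_with_width[mask].mean())
        results.append((length, mean_width))
    return results

# toy example: two short "rivers" with widths 2 and 3
img = np.zeros((10, 10))
img[1:5, 2] = 2.0   # 4-pixel vertical river, width 2
img[7, 3:9] = 3.0   # 6-pixel horizontal river, width 3
print(measure_rivers(img))   # [(4, 2.0), (6, 3.0)]
```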


Here's the exact same analysis applied to the second, "no-river" image:

[image: analysis result for the "no-river" image]

Comments:

  • Lev Bishop: Thanks. I have Matlab, so I'll try this out on some other texts to see how robust it is. (Oct 5, 2011 at 20:08)
  • raphink: Integrating it back into TeX might be another problem, unless we can port it to Lua somehow. (Oct 5, 2011 at 20:32)
  • Jonas: @LevBishop: I think I understand the issue a bit better now. The new solution should be fairly robust. (Oct 6, 2011 at 2:09)
  • Jonas: @LevBishop: One more update. (Oct 6, 2011 at 12:56)
  • Jonas: @LevBishop: Just noticed the second image. Turns out the morphology-based analysis does its job. (Oct 6, 2011 at 13:16)

In Mathematica, using erosion and Hough transform:

(*Get Your Images*)
i = Import /@ {"https://i.sstatic.net/4ShOW.png", 
               "https://i.sstatic.net/5UQwb.png"};

(*Erode and binarize*)
i1 = Binarize /@ (Erosion[#, 2] & /@ i);

(*Hough transform*)
lines = ImageLines[#, .5, "Segmented" -> True] & /@ i1;

(*Ready, show them*)
Show[#[[1]],Graphics[{Thick,Orange, Line /@ #[[2]]}]] & /@ Transpose[{i, lines}]

[image: detected lines overlaid on both input images]

Edit: answering Mr. Wizard's comment.

If you want to get rid of the horizontal lines, just do something like this instead (probably someone could make it simpler):

Show[#[[1]], Graphics[{Thick, Orange, Line /@ #[[2]]}]] & /@ 
 Transpose[{i, Select[Flatten[#, 1], Chop@Last@(Subtract @@ #) != 0 &] & /@ lines}]

[image: detected lines with horizontal ones filtered out]

Comments:

  • Mr.Wizard: Why not get rid of all the horizontal lines? (+1) (Nov 22, 2011 at 8:25)
  • @Mr.Wizard Just to show all the lines are being detected... (Nov 22, 2011 at 12:16)
  • Mr.Wizard: That is not part of the problem, however, is it? (Nov 22, 2011 at 12:17)
  • @Mr.Wizard Edited as requested. (Nov 22, 2011 at 12:54)
  • @belisarius The coordinate system used in the Hough transform changed after 8.0.0 to match that of the Radon transform. This in turn has changed the behavior of ImageLines. Overall this is an improvement, though in this case one would prefer the previous behavior. If you don't want to experiment with peak detection, you can change the aspect ratio of the input image to be closer to 1 and obtain a result similar to 8.0.0: lines = ImageLines[ImageResize[#, {300, 300}], .6, "Segmented" -> True] & /@ i1;. All that said, for this problem a morphological approach seems more robust. (Nov 28, 2011 at 15:06)

Hmmm... I guess the Radon transform isn't that easy to extract peaks from. (The Radon transform basically rotates the image while "looking through it" edge-on; it's the principle behind CAT scans.) The transform of your image produces this sinogram, with the "rivers" forming bright peaks, which are circled:

[image: sinogram with the river peaks circled]

The one at 70 degrees rotation can be seen pretty clearly as the peak on the left of this plot of a slice along the horizontal axis:

[image: slice of the sinogram along the horizontal axis]

Especially if the text was Gaussian blurred first:

[image: the same slice after Gaussian-blurring the text first]

But I'm not sure how to reliably extract these peaks from the rest of the noise. The bright top and bottom ends of the sinogram represent the "rivers" between horizontal lines of text, which you obviously don't care about. Maybe a weighting function vs angle that emphasizes more vertical lines and minimizes the horizontal ones?

A simple cosine weighting function works well on this image:

[image: sinogram after cosine weighting]

finding the vertical river at 90 degrees, which is the global maximum in the sinogram:

[image: detected vertical river at 90 degrees]

and on this image finding the one at 104 degrees, though blurring first makes it more accurate:

[images: river found at 104 degrees, without and with blurring first]

(SciPy's radon() function is kind of dumb, or I'd map this peak back onto the original image as a line going through the middle of the river.)

But it doesn't find either of the two main peaks in the sinogram for your image, after blurring and weighting:

[image: weighted sinogram for the original image; the two main peaks are not picked up]

They're there, but they're overwhelmed by the stuff near the middle peak of the weighting function. With the right weighting and tweaking this method probably could work, but I'm not sure what the right tweaks are. It probably depends on the properties of the scans of the page, too. Maybe the weighting needs to be derived from the overall energy in the slice or something, like a normalization.

from pylab import *
from skimage.transform import radon  # scipy.misc.radon (used originally) was removed long ago
from PIL import Image

filename = 'rivers.png'
I = asarray(Image.open(filename).convert('L').rotate(90))

# Do the Radon transform: one column per projection angle
a = radon(I, theta=mgrid[0:180], circle=False)

# Remove offset
a = a - a.min()

# Weight it to emphasize near-vertical lines and suppress horizontal ones
b = arange(a.shape[1])
d = (0.5 - 0.5*cos(b*pi/90))*a

figure()
imshow(d.T)
gray()
show()

# Find the global maximum, plot it, print it
peak_x, peak_y = unravel_index(argmax(d), d.shape)
plot(peak_x, peak_y, 'ro')
print(len(d) - peak_x, 'pixels,', peak_y, 'degrees')
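
The energy-normalization idea mentioned above could be sketched like this (untested guesswork on my part; normalize_sinogram and the toy sinogram are made up for illustration). The point is that dividing each angle's projection by its total energy lets a sharp peak in a dim slice beat a uniformly bright slice:

```python
import numpy as np

def normalize_sinogram(sinogram):
    """Divide each angle's projection (one column per angle) by its total
    energy, so sharp peaks stand out against broad, bright slices."""
    energy = np.sqrt((sinogram ** 2).sum(axis=0, keepdims=True))
    return sinogram / np.where(energy == 0, 1, energy)

# toy sinogram: angle 0 is uniformly bright, angle 1 has one sharp peak
s = np.zeros((100, 2))
s[:, 0] = 5.0          # broad, high-energy slice
s[50, 1] = 10.0        # single sharp peak
n = normalize_sinogram(s)
peak_pos, peak_angle = np.unravel_index(np.argmax(n), n.shape)
print(peak_pos, peak_angle)   # the sharp peak at (50, 1) now wins
```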
Comments:

  • Jonas: What if you were to blur with an asymmetric Gaussian first? I.e. narrow in the horizontal direction, wide in the vertical direction. (Oct 11, 2011 at 19:05)
  • endolith: @Jonas: That would probably help. The main problem is automatically picking the peaks out of the background when the background varies so much with rotation. Asymmetric blurring could smooth out the horizontal stripes from line to line. (Oct 11, 2011 at 19:20)
  • endolith: This works well for detecting the rotation of lines in the text, at least: gist.github.com/endolith/334196bac1cac45a4893 (Aug 1, 2014 at 20:02)

I trained a discriminative classifier on the pixels using derivative features (up to 2nd order) on different scales.

My labels:

[image: my label annotations]

Prediction on training image:

[image: prediction on the training image]

Prediction on the other two images:

[image: prediction on the second image]

[image: prediction on the third image]

I guess this looks promising and could yield usable results given more training data and maybe smarter features. On the other hand, it took me only a few minutes to get these results. You can reproduce the results yourself using the open-source software ilastik. [Disclaimer: I'm one of the main developers.]
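
For a rough idea of the feature side, here is a sketch of per-pixel Gaussian-derivative features up to 2nd order at several scales (my own approximation of that feature set, not ilastik's actual code; the scales are arbitrary):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pixel_features(img, scales=(1.0, 2.5, 5.0)):
    """Stack Gaussian-smoothed derivatives (up to 2nd order, along each axis
    and mixed) at several scales; each pixel gets a feature vector that a
    discriminative classifier can then be trained on."""
    feats = []
    for s in scales:
        for order in [(0, 0), (0, 1), (1, 0), (0, 2), (2, 0), (1, 1)]:
            feats.append(gaussian_filter(img.astype(float), s, order=order))
    return np.stack(feats, axis=-1)   # shape (H, W, len(scales) * 6)

f = pixel_features(np.random.rand(32, 32))
print(f.shape)   # (32, 32, 18)
```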


(Sorry, this post doesn't come with awesome demonstrations.)

If you wanted to work with information TeX already has (letters and positions), you can manually classify letters and letter pairs as "sloping" in one direction or another. For example, "w" has SW and SE corner slopes, the "al" combo has a NW corner slope, "k" has a NE corner slope. (Don't forget punctuation - a quote followed by a letter that fills the lower half of the glyph box establishes a nice slope; quote followed by q is particularly strong.)

Then, look for occurrences of corresponding slopes on opposite sides of a space - "w al" for a SW-to-NE river or "k T" for a NW-to-SE river. When you find one on a line, see if a similar one occurs, appropriately shifted left or right, on the lines above/below; when you find a run of these, there's probably a river.

Also, obviously, just look for spaces stacked nearly vertically, for the plain vertical rivers.

You can get a little more sophisticated by measuring the "strength" of the slope: how much of the advance box is "empty" due to the slope and thus contributing to the river's width. "w" is fairly small, as it has only a small corner of its advance box to contribute to the river, but "V" is very strong. "b" is slightly stronger than "k"; the gentler curve gives a more visually-continuous river edge, making it stronger and visually wider.
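
As a very rough sketch of the text-level idea, assuming TeX can hand us the horizontal extent of each interword space per line (the function and the toy data below are hypothetical, just for illustration):

```python
def find_vertical_rivers(lines, min_run=3):
    """lines: for each text line, a list of (x_start, x_end) interword-space
    intervals. Returns runs of horizontally overlapping spaces on consecutive
    lines, which are candidate vertical rivers."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    runs = []
    for i, spaces in enumerate(lines):
        for sp in spaces:
            # try to extend a run that ended on the previous line
            extended = False
            for run in runs:
                if run[-1][0] == i - 1 and overlaps(run[-1][1], sp):
                    run.append((i, sp))
                    extended = True
                    break
            if not extended:
                runs.append([(i, sp)])
    return [r for r in runs if len(r) >= min_run]

# three stacked spaces near x ~ 50 form a river; the lone space near x ~ 120 does not
lines = [[(48, 55), (120, 126)], [(50, 57)], [(49, 54)]]
print(find_vertical_rivers(lines))
```

The slope matching described above would refine this by also accepting spaces that shift steadily left or right from line to line, weighted by the slope "strength" of the adjacent glyphs.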
