51
$\begingroup$

I've got lots of images of diatoms. I need to get a list of their equivalent radii.

some diatoms

more diatoms

It's not crucial to get them all, as long as those missed don't mess with the radius distribution (as would happen if it always missed the biggest ones or the smallest ones).

I am having trouble mainly because sometimes two little beings end up overlapping, in which case I would need to either separate them or at least discard them. Also, even though some may look like hollow, or even split in two, they are actually one. So

images = Import /@ {"https://i.sstatic.net/y7vag.jpg", 
    "https://i.sstatic.net/XQJlW.jpg", 
    "https://i.sstatic.net/gGTro.jpg"};

preproc[im_] := 
 ColorConvert[im, "GrayScale"] // Binarize // ColorNegate // 
    FillingTransform // MorphologicalComponents // Colorize

FlipView[{#, preproc @ #}] & /@ images // TabView

diatoms after image processing

An example of an image with the beings approximately rounded manually can be seen here:

measuring the diatoms

I would appreciate any ideas. Thanks!

Note: a semi-automatic solution can still save a lot of time

$\endgroup$
15
  • 5
    $\begingroup$ OT: I encountered a similar problem years ago. When I was trying to figure out, my boss got tired and bought a 15,000USD software for it..(Before that, we count them manually.) $\endgroup$
    – Silvia
    Commented May 13, 2013 at 17:41
  • 3
    $\begingroup$ Try with an easier organism first $\endgroup$ Commented May 13, 2013 at 17:43
  • 6
    $\begingroup$ @Silvia, if you give me 15000U$D I'll accept your answer :) $\endgroup$
    – Rojo
    Commented May 13, 2013 at 17:44
  • 4
    $\begingroup$ btw I took an image processing course on coursera 3 month ago, and I think some technic there you might found helpful. I would highly recommend the Active Contours method on week 5 (also ref), and all Geometric PDEs methods on week 6. $\endgroup$
    – Silvia
    Commented May 13, 2013 at 19:09
  • 2
    $\begingroup$ @Silvia I've signed up for the course :P. Hope I have the time $\endgroup$
    – Rojo
    Commented May 14, 2013 at 18:25

1 Answer 1

59
+250
$\begingroup$

EDIT: I've found a bug in the code; Now the result looks a lot better, too.

This isn't perfect, but it's a start.

The first step is to find the "chain elements" (what are those anyway? I'm guessing cells?)

The chain elements have a distinctive scale, so I can filter them out easily using a median filter:

enter image description here

Subtracting the median filtered image from the original removes the background:

img = Import["https://i.sstatic.net/gGTro.jpg"];
(* I apply the median filter to a downsampled image - the result is almost the 
   same, but it's much much faster this way *)
diff = ImageAdjust[
   ImageDifference[
    ImageResize[MedianFilter[ImageResize[img, 250], 10], 
     ImageDimensions[img][[1]]], img]]

enter image description here

I don't want to work with RGB images, so I simply take the mean of the three channels. (ColorConvert[..., "Grayscale"] doesn't make much sense here, since these aren't "natural" colors, where the green channel contains most of the brightness).

meanImg = Total[ImageData[diff], {-1}]/3;

To find the centers of the chain links, I use a laplacian of gaussian filter, with a filter size that's approximately as large as the chain links themselves.

σ = 25;
log = Image[LaplacianGaussianFilter[meanImg, σ]*-σ^2]

enter image description here

I'm looking for local maxima in this image that are brighter than some threshold:

maxima = ImageMultiply[MaxDetect[log], Binarize[log, 0.5]];    
pts = ComponentMeasurements[maxima, "Centroid"][[All, 2]];

This finds the maxima, but the list contains some "duplicates", where the same object contains multiple maxima in the LoG image. I can remove those relatively easily by using Mathematica's MeanShift function to "cluster" points that are close together, then removing duplicate points:

pts = Union[MeanShift[pts, 25, MaxIterations -> 10]];

In addition to the chain element centers, I'd like to have an estimate of the direction at each point. I can get a good estimate using calculus (I calculate the angle of the major eigenvector of the Hessian matrix at each point):

Clear[gaussianDerivative];
Do[gaussianDerivative[i, 2 - i] = 
   GaussianFilter[meanImg, 2 σ, {i, 2 - i}], {i, 0, 2}];

angles = Image[
   ArcTan @@ 
     Eigenvectors[{{m[0, 2], m[1, 1]}, {m[1, 1], m[2, 0]}}][[1]] /. 
    m -> gaussianDerivative];

getDirectionAt = 
  Function[pt, 
   With[{α = π/2 - ImageValue[angles, pt]}, {Cos[α],
      Sin[α]}]];
Show[img, 
 Graphics[{Red, Thick, 
   Line[{# - 15 getDirectionAt[#], # + 15 getDirectionAt[#]}] & /@ 
    pts}]]

enter image description here

Now the idea is to :

  • pick a point at random
  • pick it's nearest neighbor
  • estimate where the next point in the chain would be - if there's a point close to that location, add it to the list
  • rinse and repeat until no more points are found

This is the code:

nf = Nearest[pts];
continueChain = Function[chain,
   Module[{expectedNextPoint, nearestNextPoint, direction},
    (
     direction = chain[[-1]] - chain[[-2]];
     expectedNextPoint = chain[[-1]] + direction;
     nearestNextPoint = nf[expectedNextPoint][[1]];
     If[
      Norm[nearestNextPoint - expectedNextPoint] < 
        Norm[direction]*0.75 &&
       Abs[
         getDirectionAt[chain[[-1]]].getDirectionAt[
           nearestNextPoint]] > 0.5 &&
       Norm[direction] < 100
      , Append[chain, nearestNextPoint], chain]
     )]];

findRandomChain[] := Module[{chain},
  (
   chain = nf[RandomChoice[pts], 2];
   chain = FixedPoint[continueChain, chain];
   chain = FixedPoint[continueChain, Reverse[chain]]
   )]

This function finds one random chain. If I call it often enough, I'll (almost always) get all of the potential chains:

allChains = 
  SortBy[Union[Table[findRandomChain[], {1000}]], -Length[#] &];

The only problem is that some of those chains "overlap", i.e. some points are members of more than one chain. I think a good solution to this problem would be to apply a mixture of gaussians/expectation maximization algorithm to decide which point belongs to which chain. But it's getting late here, so I'll just go with a simple greedy algorithm:

  • Start with the longest chain
  • remove the points from the chain that are already used in some other chain
  • If the chain still contains at least two or more points, keep it
  • Remove all points in this chain from the set of "unused" points
  • repeat for all chains

This is it:

removeChainPoints = Function[{unusedPoints, chain},
   With[{modifiedChain = Select[chain, MemberQ[unusedPoints, #] &]},
    (
     If[Length[modifiedChain] >= 2, Sow[modifiedChain]];
     Complement[unusedPoints, chain]
     )]];
noOverlap = Reap[Fold[removeChainPoints, pts, allChains]][[2, 1]];

Now, noOverlap contains a set of non-overlapping chains:

totalChainLength = Total[Norm /@ Differences[#]] &;
colorFn = ColorData[3];
Show[
 ColorConvert[img, "Grayscale"],
 Graphics[
  MapThread[
   Module[{direction, normal},
     (
      direction = #1[[1]] - #1[[-1]];
      direction = direction/Norm[direction];
      normal = {-direction[[2]], direction[[1]]};
      {
       colorFn[#2],
       Thick,
       Line[#1],
       Dotted,
       Line[{#1[[1]], #1[[1]] + 50*normal, #1[[-1]] + 
          50*normal, #1[[-1]]}],
       Text[totalChainLength[#1], Mean[#1] + 50*normal]
       }
      )] &,
   {noOverlap, Range[Length[noOverlap]]}]]]

enter image description here

I didn't spend much time optimizing the thresholds in continueChain, so you could try to improve those. But you could probably come up with a smarter algorithm than simply adding one nearest point after the other. There's probably a well-known graph algorithm that will find globally optimal solutions in no time. (FindCurvePath almost does what I want, but I couldn't find a way to tweak the algorithm, so it groups far too many points into one curve.)

This is the result for the other image:

enter image description here

$\endgroup$
8
  • 1
    $\begingroup$ Wow, fantastic! $\endgroup$
    – rm -rf
    Commented May 14, 2013 at 21:21
  • 6
    $\begingroup$ Many biologists thank you! Superb! $\endgroup$
    – Todd Allen
    Commented May 15, 2013 at 2:16
  • $\begingroup$ "The first step is to find the "chain elements" (what are those anyway? I'm guessing cells?)" - yep, those are diatom cells. That particular species just aggregates that way. $\endgroup$ Commented May 15, 2013 at 4:42
  • 1
    $\begingroup$ @nikie you know, i'd pay good money for an image processing book if you wrote it :) $\endgroup$
    – amr
    Commented May 18, 2013 at 2:22
  • $\begingroup$ Thanks a lot nikie for this answer. It helps, and most importantly is very instructive as all of your image processing answers are. I am still struggling to detect the borders of the cells and their areas, but I'm enjoying testing some ideas that more or less work. $\endgroup$
    – Rojo
    Commented May 19, 2013 at 5:38

Not the answer you're looking for? Browse other questions tagged or ask your own question.