Convex hulls and brightfield background


The convex hull of a set of points is the convex polygon of smallest area that encloses all of the points. We use the convex hull for things like finding the edge of a Petri dish imaged against a dark background and as a precursor to finding the minimum enclosing circle for our Zernike feature calculation. Carolina Wahlby came up with a very good way of calculating the variance in intensity for worms in a well, illuminated in brightfield:
Brightfield image of worms
Brightfield image of worms
The worms are dark against a light background which falls off monotonically from a center of brightness. If you threshold the image at a given intensity, the worms will appear as black against a circle of the points in the background at or above that intensity. If you then apply the convex hull transform to the binary image, you erase the worms. You can stack these binary images up, taking one at each of a range of intensities and the worms disappear. Typically brilliant work from Carolina.
Image after convex hull Image corrected by dividing by convex hull
Image after applying convex hull transform Image Corrected by dividing by convex hull
OK, brilliant idea, but it takes a lot of time to do the convex hull. For an 8-bit image, you have to do 255 of them. So, my job, code in Python / Numpy and make it run fast. The two files in question are cpmorphology.py and filter.py. Tricks: So, step 1 was making convex_hull_ijv. This takes a list of i,j coordinates coupled with "labels" which are the different intensity levels. This is hairy code, best to not look at it too hard. But once it's done, you can use it like a hammer - it's a great tool.
Step 1 1/2 is to quantize the image into however many intensity levels you want... standard stuff. Step 2 is to make that i,j,v list of points. We only want the points that touch those of lower value: that's a morphological grayscale erosion. Any point that has the same value in the original and eroded image is in the interior of the convex hull and can be ignored. All the other points need to be considered in the convex hulls of all intensity levels from 1+eroded to the value of the original image. The code in cellprofiler.cpmath.filter.convex_hull_transform is pretty heavily commented and there are some fun tricks there that demonstrate how to create the i,j,v list out of the eroded and original image.
Step 3 is to make a new image and paint each of the convex hulls on it using each hull's intensity as the "ink" of the line. You want the most intense ink where the lines cross, so there's a step that takes all of the points drawn and selects the highest intensity for each.
Step 4 is to scan across the image from top to bottom. Start with an intensity of zero and, when you cross a line, set the intensity to that line's value. If a point has no value, set its intensity to the current value. Do the same from bottom to top and take the minimum of the two images: you end up filling in all of the polygons correctly. I use Numpy's cumsum creatively to do the scanning - again look at the code, no loops.

Faster and faster

The worms create a lot of points. We all want infinite memory but... So I do just some of the points, then some more and some more, creating lists of convex hull points as I go. Maybe the points of each little bit are in the convex hull of the whole or maybe not, but if you take the convex hull of what you collect, it combines all of the little convex hulls together into the correct answer. The big speedup and data reduction, though, is when you do a coarse pass, then create an image that is the maximum of the coarse and original. Consider a dark spot in the center of the brightest. It appears in 255 convex hulls. But if you take the convex hull at 16 levels, and take the maximum of 0 and 240, you get 240 and that point appears in only 15 levels. So you reduce 256 to 15+15 and get a speedup of 8 times. It turns out that there are as many wormy points as there are all others put together, so you get a 4x speed improvement. It runs in 10 sec on one core on my clunky 32-bit machine on the image above.

Programming tricks using cumsum

I used Numpy's cumsum fairly extensively in the convex hull code. Cumsum takes the cumulative sum of a vector and can be configured to take the cumulative sum of the rows or columns of a matrix. For review:

>>> help(cumsum)
cumsum(a, axis=None, dtype=None, out=None)
    Return the cumulative sum of the elements along a given axis.

    Parameters
    ----------
    a : array_like
        Input array.
    axis : int, optional
        Axis along which the cumulative sum is computed. The default
        (None) is to compute the cumsum over the flattened array.
    dtype : dtype, optional
        Type of the returned array and of the accumulator in which the
        elements are summed. If `dtype` is not specified, it defaults
        to the dtype of `a`, unless `a` has an integer dtype with a
        precision less than that of the default platform integer. In
        that case, the default platform integer is used.
    out : ndarray, optional
        Alternative output array in which to place the result. It must
        have the same shape and buffer length as the expected output
        but the type will be cast if necessary. See `doc.ufuncs`
        (Section "Output arguments") for more details.
    
    Returns
    -------
    cumsum_along_axis : ndarray.
        A new array holding the result is returned unless `out` is
        specified, in which case a reference to `out` is returned. The
        result has the same size as `a`, and the same shape as `a` if
        `axis` is not None or `a` is a 1-d array.
    
    
    See Also
    --------
    sum : Sum array elements.
    
    trapz : Integration of array values using the composite trapezoidal rule.
    
    Notes
    -----
    Arithmetic is modular when using integer types, and no error is
    raised on overflow.
    
    Examples
    --------
    >>> a = np.array([[1,2,3], [4,5,6]])
    >>> a
    array([[1, 2, 3],
         [4, 5, 6]])
    >>> np.cumsum(a)
    array([ 1, 3, 6, 10, 15, 21])
    >>> np.cumsum(a, dtype=float) # specifies type of output value(s)
    array([ 1., 3., 6., 10., 15., 21.])
    
    >>> np.cumsum(a,axis=0) # sum over rows for each of the 3 columns
    array([[1, 2, 3],
         [5, 7, 9]])
    >>> np.cumsum(a,axis=1) # sum over columns for each of the 2 rows
    array([[ 1, 3, 6],
         [ 4, 9, 15]])
You can use cumsum in places where you'd otherwise use loops and you can use it to initialize a large array from a small one. For instance, you have an array of counts of how many times something occurred for each of several subjects:
>>> import numpy as np
>>> # 0-9 students per teacher for 20 teachers
>>> student_count = np.random.randint(0, 10, 20)
>>> student_count
array([3, 5, 5, 9, 9, 0, 4, 0, 3, 7, 0, 1, 3, 4, 5, 6, 5, 2, 3, 4])
You want an array that, for each student, tells you which teacher they have:
>>> # The index of the teacher's first student
>>> first_student = np.cumsum(student_count) - student_count
>>> # initialize the teacher array to the total # of students
>>> teacher = np.zeros(student_cumsum[-1], int)
>>> # mark the first student for each teacher in the array
>>> # Teachers with zero students shouldn't have their first
>>> # student marked
>>> mask = student_count > 0
>>> teacher_idx = np.arange(20)[mask]
>>> teacher[first_student[teacher_idx[1:]]] = teacher_idx[1:] - teacher_idx[:-1]
>>> # the cumulative sum only increments for the first student
>>> teacher = np.cumsum(teacher)
>>> teacher
array([ 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3,
        3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6, 6, 6,
        6, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 11, 12, 12, 12, 13, 13,
        13, 13, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16,
        16, 17, 17, 18, 18, 18, 19, 19, 19, 19])
You want to assign numbers to each of the students:

>>> student_idx = np.ones(teacher.shape, int)
>>> student_idx[0] = 0
>>> student_idx[first_student[teacher_idx[1:]]] = 1 - student_count[teacher_idx[:-1]]
>>> student_idx = np.cumsum(student_idx)
>>> student_idx
array([0, 1, 2, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0,
       1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 0,
       0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4,
       0, 1, 0, 1, 2, 0, 1, 2, 3])

leek
Last modified: Fri Sep 24 15:50:20 EDT 2010