2. Theory

Robust machine vision systems are frequently required to recognize targets of unknown scale, unknown position, and unknown rotation. In general, any solution to this difficult problem requires that the input imagery be constrained in some way. In industrial machine vision applications, the camera position can be fixed, there is usually only a limited number of variations on the object to be recognized, and the lighting conditions can be fixed if the system is enclosed. With such a system in place, 2D intensity images can be thresholded to yield images typified by figure 2.
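
As a concrete illustration, a minimal thresholding sketch in Python (assuming the image is already available as a 2-D NumPy array of grey levels; the fixed threshold of 128 is arbitrary and would be tuned to the controlled lighting):

```python
import numpy as np

def threshold_blob(gray, t=128):
    """Binarize a greyscale image: pixels at or above t become part of the
    blob (value 1); everything else becomes background (value 0)."""
    return (gray >= t).astype(np.uint8)

# Example: a synthetic 8-bit image with a bright rectangle on a dark background.
img = np.zeros((64, 64), dtype=np.uint8)
img[20:40, 25:45] = 200
blob = threshold_blob(img)   # binary silhouette ("blob") image
```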

By thresholding an image, we seek to eliminate all information except
the general shape of the observed object. From these thresholded
silhouette images, referred to as *blobs*, one observes the
low-frequency components of the imaged object: size, orientation, and
gross shape. By edge-processing the image, the high-frequency
detail of the blob's bounding contour is revealed. Figure
3 shows the result of a thresholded 3x3 optimized Sobel
edge operation on the blob image.
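
A sketch of that edge step, using the standard 3x3 Sobel kernels (the "optimized" variant mentioned above may use slightly different coefficients, so this illustrates the idea rather than reproducing the exact operator behind figure 3):

```python
import numpy as np

# Standard 3x3 Sobel kernels; an "optimized" variant may weight the
# centre row/column differently.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_boundary(blob, t=1.0):
    """Return a binary boundary (contour) image of a blob image by
    thresholding the Sobel gradient magnitude."""
    h, w = blob.shape
    mag = np.zeros((h, w))
    for row in range(1, h - 1):
        for col in range(1, w - 1):
            win = blob[row - 1:row + 2, col - 1:col + 2].astype(float)
            gx = np.sum(win * SOBEL_X)
            gy = np.sum(win * SOBEL_Y)
            mag[row, col] = np.hypot(gx, gy)
    return (mag >= t).astype(np.uint8)
```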

In *Visual Pattern Recognition by Moment Invariants*[3], Hu
derives a set of seven functions that make use of the central moments
of a blob image; their output is independent of any translation,
rotation, or (up to sign) mirror imaging of a particular blob, and they
can be applied to both the blob image itself and the edge-processed
contour image. All of the images in figure 4 produce the same numerical
result for each of Hu's seven equations.

Hu's equations are based on the uniqueness theorem of moments. For a
digital image *f*(*x*,*y*) of size (*N*,*M*), the (*p*+*q*)th-order moments *m*_{pq} are
calculated as

m_{pq} = Σ_{x=1}^{N} Σ_{y=1}^{M} x^{p} y^{q} f(x,y)    (1)

for *p*,*q* = 0, 1, 2, ...

**The uniqueness theorem of moments:** *The infinite sequence of
moments m_{pq} is uniquely determined by the joint function f(x,y);
conversely, the function f(x,y) is uniquely determined by the infinite
sequence of moments m_{pq}.* Strictly, this is only valid for piecewise
continuous functions that are non-zero only within a finite region of
the *x*,*y* plane. In general, gross image shape is represented well by
the lower-order moments, while higher-order moments reflect only the
subtleties of a silhouette or boundary image. Nearly all work with
moment invariants, including Hu's, depends only on moments of order
zero to three.
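
A direct transcription of equation 1 (a minimal sketch; *f* is assumed to be a binary blob or boundary image stored as an N x M NumPy array, with pixel coordinates running from 1 as in the sum above):

```python
import numpy as np

def raw_moment(f, p, q):
    """m_pq = sum_{x=1..N} sum_{y=1..M} x^p * y^q * f(x,y)   (equation 1)."""
    N, M = f.shape
    x = np.arange(1, N + 1).reshape(-1, 1)   # x runs along the first axis
    y = np.arange(1, M + 1).reshape(1, -1)   # y runs along the second axis
    return float(np.sum((x ** p) * (y ** q) * f))
```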

The central moments μ_{pq} of a digital blob image are inherently translation independent,

μ_{pq} = Σ_{x=1}^{N} Σ_{y=1}^{M} (x - x̄)^{p} (y - ȳ)^{q} f(x,y)    (2)

where x̄ = m_{10}/m_{00} and ȳ = m_{01}/m_{00}. Hu's seven moment functions below are built from the central moments of a digital silhouette or boundary image and are additionally rotation independent (equations 3 to 9).
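
Written out in terms of the central moments μ_{pq}, these are the standard combinations given by Hu[3]:

M_{1} = μ_{20} + μ_{02}    (3)

M_{2} = (μ_{20} - μ_{02})^{2} + 4μ_{11}^{2}    (4)

M_{3} = (μ_{30} - 3μ_{12})^{2} + (3μ_{21} - μ_{03})^{2}    (5)

M_{4} = (μ_{30} + μ_{12})^{2} + (μ_{21} + μ_{03})^{2}    (6)

M_{5} = (μ_{30} - 3μ_{12})(μ_{30} + μ_{12})[(μ_{30} + μ_{12})^{2} - 3(μ_{21} + μ_{03})^{2}] + (3μ_{21} - μ_{03})(μ_{21} + μ_{03})[3(μ_{30} + μ_{12})^{2} - (μ_{21} + μ_{03})^{2}]    (7)

M_{6} = (μ_{20} - μ_{02})[(μ_{30} + μ_{12})^{2} - (μ_{21} + μ_{03})^{2}] + 4μ_{11}(μ_{30} + μ_{12})(μ_{21} + μ_{03})    (8)

M_{7} = (3μ_{21} - μ_{03})(μ_{30} + μ_{12})[(μ_{30} + μ_{12})^{2} - 3(μ_{21} + μ_{03})^{2}] - (μ_{30} - 3μ_{12})(μ_{21} + μ_{03})[3(μ_{30} + μ_{12})^{2} - (μ_{21} + μ_{03})^{2}]    (9)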

*M*_{1} through *M*_{6} are all translation and rotation independent for
digital images; *M*_{7} is, strictly, a *skew invariant*: its sign
changes under reflection, so it can be used to detect mirror images.
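
A sketch that ties equations 2 through 9 together for a binary image (self-contained; the array conventions match the raw-moment sketch above, and the helper names are mine rather than anything from [2] or [3]):

```python
import numpy as np

def central_moment(f, p, q, xbar, ybar):
    """mu_pq = sum_x sum_y (x - xbar)^p * (y - ybar)^q * f(x,y)   (equation 2)."""
    N, M = f.shape
    x = np.arange(1, N + 1).reshape(-1, 1) - xbar
    y = np.arange(1, M + 1).reshape(1, -1) - ybar
    return float(np.sum((x ** p) * (y ** q) * f))

def hu_functions(f):
    """Return [M1, ..., M7] of equations 3-9 for a binary blob or boundary image."""
    N, M = f.shape
    x = np.arange(1, N + 1).reshape(-1, 1)
    y = np.arange(1, M + 1).reshape(1, -1)
    m00 = float(np.sum(f))
    xbar = float(np.sum(x * f)) / m00            # m10 / m00
    ybar = float(np.sum(y * f)) / m00            # m01 / m00
    mu = {(p, q): central_moment(f, p, q, xbar, ybar)
          for p in range(4) for q in range(4) if p + q <= 3}
    a = mu[3, 0] + mu[1, 2]        # mu30 + mu12
    b = mu[2, 1] + mu[0, 3]        # mu21 + mu03
    c = mu[3, 0] - 3 * mu[1, 2]    # mu30 - 3*mu12
    d = 3 * mu[2, 1] - mu[0, 3]    # 3*mu21 - mu03
    M1 = mu[2, 0] + mu[0, 2]
    M2 = (mu[2, 0] - mu[0, 2]) ** 2 + 4 * mu[1, 1] ** 2
    M3 = c ** 2 + d ** 2
    M4 = a ** 2 + b ** 2
    M5 = c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2)
    M6 = (mu[2, 0] - mu[0, 2]) * (a ** 2 - b ** 2) + 4 * mu[1, 1] * a * b
    M7 = d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2)
    return [M1, M2, M3, M4, M5, M6, M7]
```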

In a paper by Dudani, Breeding, and McGhee[2], Hu's equations above were extended to be approximately scale invariant as well. Using the radius of gyration of the planar pattern, r = ((μ_{20} + μ_{02})/μ_{00})^{1/2}, the equations can be normalized so that they remain unaffected by the size of the blob or edge boundary in the digital image:

M_{2}' = M_{2}/r^{4}    (11)

M_{3}' = M_{3}/r^{6}    (12)

M_{4}' = M_{4}/r^{6}    (13)

M_{5}' = M_{5}/r^{12}    (14)

M_{6}' = M_{6}/r^{8}    (15)

M_{7}' = M_{7}/r^{12}    (16)

For this paper, *M*_{1}' (equation 10) will not be used because it
requires that the distance *B* from the camera to the observed object be known.
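
A sketch of that size normalization (reusing hu_functions from the previous sketch; as in the text, *M*_{1}' is left out):

```python
import numpy as np

def scale_normalized_features(f):
    """Feature vector [M2', ..., M7'] of equations 11-16.  M1' is omitted
    because it would require the camera-to-object distance B."""
    N, M = f.shape
    x = np.arange(1, N + 1).reshape(-1, 1)
    y = np.arange(1, M + 1).reshape(1, -1)
    m00 = float(np.sum(f))
    xbar = float(np.sum(x * f)) / m00
    ybar = float(np.sum(y * f)) / m00
    mu20 = float(np.sum((x - xbar) ** 2 * f))
    mu02 = float(np.sum((y - ybar) ** 2 * f))
    r = np.sqrt((mu20 + mu02) / m00)              # radius of gyration
    M1, M2, M3, M4, M5, M6, M7 = hu_functions(f)  # from the sketch above
    return np.array([M2 / r ** 4, M3 / r ** 6, M4 / r ** 6,
                     M5 / r ** 12, M6 / r ** 8, M7 / r ** 12])
```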

When designing a classifier, either set of the above relations can be packed into a seven-dimensional feature vector. In this high-dimensional space, groups of feature vectors corresponding to particular silhouette blob or edge boundary images can be easily separated by splitting the space with hyperplanes or other high-dimensional surfaces. In practice, as will be shown in the following section, not all seven moment invariant functions may be needed to design a classifier.
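
One simple way to carve up that feature space is a nearest-prototype rule rather than explicit hyperplanes; a minimal sketch (the class labels and training images below are hypothetical):

```python
import numpy as np

def classify(feature, prototypes):
    """Label an unknown feature vector with the class of its nearest
    stored training vector (Euclidean distance in feature space)."""
    best_label, best_dist = None, np.inf
    for label, vectors in prototypes.items():
        for v in vectors:
            dist = float(np.linalg.norm(feature - v))
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label

# Hypothetical usage with features computed from training blob images:
# prototypes = {"bracket": [scale_normalized_features(b) for b in bracket_blobs],
#               "washer":  [scale_normalized_features(b) for b in washer_blobs]}
# label = classify(scale_normalized_features(unknown_blob), prototypes)
```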
