Robust Estimation of Gaussian Mixtures from Noisy Input Data


Summary

Standard EM-based clustering algorithms treat all input data points as equally important; the effect of noise or measurement error is usually ignored rather than modelled explicitly during estimation. This assumption is not always valid, since the input data can be corrupted by measurement errors. Uncertainties can also be introduced by additional transformations of the data, such as dimensionality reduction. In particular, after a nonlinear transformation it is no longer safe to assume that the uncertainties are uniform across the data. If the level of uncertainty can be quantified, it makes sense to incorporate it into the clustering algorithm to improve the estimate of the true data distribution. In this paper we propose a novel algorithm for learning a mixture of Gaussians that takes the uncertainties of the input data into account. In our formulation we assume that the uncertainty of each data point can be modelled by a multivariate Gaussian distribution and is independent of the other data points. Intuitively, this allows a data point with large uncertainty to exert less influence on the mixture components than a data point with smaller uncertainty. The optimal mixture model representing the data with uncertainties is found using a variational Bayesian algorithm that automatically chooses the appropriate number of components. We show that, by taking the uncertainty of the input into account, our algorithm estimates the correct number of clusters and recovers the true distribution of the training data better than other variational Bayesian clustering algorithms. The proposed algorithm is evaluated on a number of synthetic and real data sets and is shown to improve the results of pattern recognition tasks such as motion segmentation and partitioning of microarray gene expression data.
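The intuition that an uncertain data point should exert less influence can be illustrated with a small one-dimensional sketch. This is not the variational Bayesian algorithm from the paper; it only assumes that each observation is a true value plus independent Gaussian noise of known variance, so that under a component with variance v an observation with noise variance s is distributed with variance v + s. The function names and the example numbers are hypothetical and chosen purely for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, noise_var, weights, means, variances):
    """Posterior component probabilities for one noisy observation.

    Assumption (illustrative): x = true value + Gaussian noise with variance
    noise_var, so under component k the observation has variance
    variances[k] + noise_var.
    """
    joint = [w * gaussian_pdf(x, m, v + noise_var)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def noise_weighted_mean(xs, noise_vars, component_var):
    """Maximum-likelihood mean of a single component with known variance.

    Each point is weighted by 1 / (component_var + noise_var), so points
    with large uncertainty contribute less to the estimate.
    """
    precisions = [1.0 / (component_var + s) for s in noise_vars]
    return sum(p * x for p, x in zip(precisions, xs)) / sum(precisions)


# Two components at -2 and +2; the same observation with small and large noise.
mix = dict(weights=[0.5, 0.5], means=[-2.0, 2.0], variances=[1.0, 1.0])
print(responsibilities(1.0, 0.0, **mix))   # confidently assigned to the second component
print(responsibilities(1.0, 25.0, **mix))  # close to an even split

# An outlying but very uncertain point barely shifts the mean estimate.
print(noise_weighted_mean([0.0, 0.1, 5.0], [0.01, 0.01, 100.0], 1.0))
```

In this sketch a precisely measured point is assigned almost entirely to one component, while the same value with a large noise variance is split nearly evenly and therefore pulls on neither component strongly.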

Example Application: Motion Segmentation

We applied our clustering algorithm to the problem of multi-body motion segmentation. The segmentation results on a well-known walking sequence are shown below. Feature points were detected and tracked using the Kanade-Lucas-Tomasi (KLT) tracker, and the uncertainty of each tracked feature point was evaluated using an SSD-based (sum of squared differences) method.

Fig. 1: Original movie (movie)

Fig. 2: Clustering without taking uncertainties into account (movie)

Fig. 3: Clustering with uncertainties (movie)


The sequence shows two people walking past each other. A number of feature points on them are poorly localised because of striped clothing and/or a lack of texture. Segmentation results were obtained using a standard variational Bayes clustering algorithm, where clustering is performed on the velocities of tracked feature points computed from the two preceding frames.
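As a rough sketch of how a per-feature velocity and its uncertainty might be formed from two frames: the page does not state how the positional uncertainties are propagated to velocities, so the code below simply assumes that the localisation errors in the two frames are independent, in which case the covariance of the position difference is the sum of the two position covariances. The function name and the example values are hypothetical.

```python
def velocity_with_uncertainty(p_prev, p_curr, cov_prev, cov_curr):
    """Velocity of a tracked point between two frames, with a 2x2 covariance.

    Assumption (illustrative): the localisation errors in the two frames are
    independent, so the covariance of the difference is the sum of the
    per-frame position covariances.
    """
    velocity = (p_curr[0] - p_prev[0], p_curr[1] - p_prev[1])
    cov = [[cov_prev[i][j] + cov_curr[i][j] for j in range(2)]
           for i in range(2)]
    return velocity, cov


# A feature that is well localised in the first frame and poorly in the second.
v, cov_v = velocity_with_uncertainty(
    (10.0, 5.0), (12.0, 4.5),
    [[0.1, 0.0], [0.0, 0.1]],
    [[2.0, 0.5], [0.5, 1.0]],
)
print(v)      # displacement in pixels per frame
print(cov_v)  # dominated by the poorly localised frame
```

Each velocity and covariance pair would then be the kind of noisy observation that an uncertainty-aware clustering algorithm takes as input.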

When the uncertainties of the tracked features are ignored, the segmentation obtained by clustering is quite poor, as shown in Fig. 2. With uncertainties taken into account (Fig. 3), the segmentation is much closer to the ground truth.

For more details please refer to:

  • S. Hou and A. Galata, Robust Estimation of Gaussian Mixtures from Noisy Input Data, IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR), 2008. (.pdf)


