KC.3 Semi-supervised Learning by Incorporating Manifold Information

Traditionally there are two main paradigms in machine learning: supervised and unsupervised learning. A supervised learning algorithm uses a teacher's information (labeled examples) to train a learner, while an unsupervised learning algorithm categorises unlabeled data automatically, without any teacher's information. In reality, however, labeled examples are often difficult, expensive, and/or time-consuming to obtain because they demand the efforts of experienced human annotators, while unlabeled data may be relatively easy to collect. Semi-supervised learning offers techniques that use a large amount of unlabeled data together with the labeled examples.

A number of semi-supervised learning algorithms have been developed from different perspectives. Nevertheless, a common issue for all of these methods is how to exploit the information conveyed in unlabeled data along with the information provided by labeled data. Belkin and Niyogi (2004) proposed a semi-supervised learning method that exploits the manifold structure of the data to improve performance. In this project, the student will investigate their algorithm by implementing it in Matlab. The algorithm will be applied to synthetic and benchmark data sets to evaluate its effectiveness. The deliverable of this project will be a demo system with an appropriate interface.
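As a rough illustration of the kind of method involved, the following is a minimal sketch of a graph-Laplacian classifier in the spirit of Belkin and Niyogi (2004). It is not taken from the paper and is not the project deliverable: it is written in Python with NumPy purely for illustration (the project itself is to be carried out in Matlab), and the function names, the parameters k and p, and the two-circle toy data are all assumptions made for this example. The sketch builds a nearest-neighbour graph over all points, takes the smoothest eigenvectors of the graph Laplacian as a basis, fits that basis to the few labeled points by least squares, and labels every point by the sign of the fitted function.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency: i ~ j if either is among the other's k nearest neighbours."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances, with self-distances excluded.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(sq, np.inf)
    nearest = np.argsort(sq, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)


def laplacian_classifier(X, labeled_idx, labels, k=5, p=2):
    """Label all points in X as +1 or -1 from a few labeled examples (hypothetical helper)."""
    W = knn_adjacency(X, k)
    L = np.diag(W.sum(axis=1)) - W          # unnormalised graph Laplacian
    _, vecs = np.linalg.eigh(L)             # eigenvalues returned in ascending order
    E = vecs[:, :p]                         # the p smoothest eigenvectors
    # Least-squares fit of the eigenvector basis to the known labels.
    a, *_ = np.linalg.lstsq(E[labeled_idx], labels, rcond=None)
    scores = E @ a
    return np.where(scores >= 0, 1, -1)


def circle(radius, n):
    """n evenly spaced points on a circle of the given radius."""
    t = 2 * np.pi * np.arange(n) / n
    return np.column_stack((radius * np.cos(t), radius * np.sin(t)))


# Toy data (an assumption for this sketch): two concentric circles,
# which no straight line can separate.
X = np.vstack((circle(1.0, 30), circle(3.0, 60)))
truth = np.concatenate((np.ones(30, dtype=int), -np.ones(60, dtype=int)))

# Only one labeled example per class; all other points are unlabeled.
labeled_idx = np.array([0, 30])
pred = laplacian_classifier(X, labeled_idx, truth[labeled_idx].astype(float), k=4, p=2)
print("fraction correct:", (pred == truth).mean())
```

In this toy setting each circle forms its own connected component of the neighbour graph, so a single labeled point per class is enough to propagate labels along each circle. On real benchmark data the choice of k and p would need to be tuned, and the dense eigendecomposition used here would only be practical for small data sets.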

References: M. Belkin and P. Niyogi, "Semi-supervised learning on Riemannian manifolds," Machine Learning, vol. 56, 2004, pp. 209-239.

Prerequisites: A good mathematics background is essential. The project will be carried out in Matlab. Some knowledge of machine learning (e.g. COMP60431) would be an advantage.