Automatic Non-Verbal Speech Analysis

Project: $DO NOT EDIT THIS FIELD$  
Supervisor: Ke CHEN
Difficulty grading: SH/INF=C, BM=N, CM=C
Area:    Artificial Intelligence
Max number of students who can do this project: 2

This topic covers a broad class of speech analysis problems that do NOT involve understanding the linguistic or verbal information conveyed in speech signals. Instead, it addresses speech and audio categorisation problems, including discrimination between speech and other sounds (e.g. music), music genre classification, language identification/verification, speaker clustering by dialect, and speaker tracking. Speaker tracking is arguably one of the most difficult and challenging problems in non-verbal speech analysis. It is the process of following who speaks when in an audio stream, e.g. a recorded dialogue. In general, speaker tracking can be divided into two sub-problems: locating the points of speaker change (audio stream segmentation), and identifying the speaker in each segment (labelling each segment with a speaker identity). The problems listed above are closely related to one another.
In this project, a selected topic, e.g. speaker tracking or music classification, is first clearly defined. An automatic non-verbal speech analysis/recognition prototype will then be developed based on the latest pattern recognition (machine learning) techniques. As a deliverable, the prototype, with a simple GUI, should be able to work on a benchmark dataset for demonstration purposes.
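
To make the two sub-problems of speaker tracking concrete, the following is a minimal Python sketch, not a proposed method for the project. It assumes each audio frame has already been reduced to a single scalar feature and that each known speaker is summarised by one mean feature value. Both are illustrative simplifications, and the names `change_points`, `label_segments`, `window`, `threshold` and `speaker_means` are hypothetical. A real prototype would use multi-dimensional features and a learned speaker model.

```python
from statistics import mean


def change_points(features, window=3, threshold=1.0):
    """Sub-problem 1 (segmentation): find candidate speaker-change frames.

    Assumption: a speaker change shows up as a shift in the mean feature
    value between the `window` frames before and after a frame index.
    """
    scores = []
    for t in range(window, len(features) - window + 1):
        left = mean(features[t - window:t])
        right = mean(features[t:t + window])
        scores.append((t, abs(right - left)))

    # Keep only local maxima of the score that exceed the threshold.
    points = []
    for i, (t, score) in enumerate(scores):
        prev_score = scores[i - 1][1] if i > 0 else 0.0
        next_score = scores[i + 1][1] if i + 1 < len(scores) else 0.0
        if score > threshold and score >= prev_score and score > next_score:
            points.append(t)
    return points


def label_segments(features, points, speaker_means):
    """Sub-problem 2 (labelling): assign each segment to the nearest speaker.

    Assumption: `speaker_means` maps a speaker name to a typical feature
    value; the segment takes the name whose value is closest to its mean.
    """
    bounds = [0] + list(points) + [len(features)]
    labelled = []
    for start, end in zip(bounds, bounds[1:]):
        segment_mean = mean(features[start:end])
        speaker = min(
            speaker_means,
            key=lambda name: abs(speaker_means[name] - segment_mean),
        )
        labelled.append((start, end, speaker))
    return labelled


# Synthetic stream: speaker A (value 1.0), then B (5.0), then A again.
features = [1.0] * 5 + [5.0] * 5 + [1.0] * 4
points = change_points(features)
segments = label_segments(features, points, {"A": 1.0, "B": 5.0})
print(points)
print(segments)
```

On this synthetic stream the sketch places change points at frames 5 and 10 and labels the three resulting segments A, B, A. The split into two functions mirrors the division into segmentation and labelling described above; either stage could be replaced independently by a stronger technique.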

REFERENCES:

L. Lu, H. J. Zhang, "Content analysis for audio classification and segmentation," IEEE Transactions on Speech and Audio Processing, vol. 10, no. 7, 2002, pp. 504-516. (comments: exemplar work for speaker tracking/segmentation)
N. Scaringella, G. Zoia and D. Mlynek, "Automatic genre classification of music content: A survey," IEEE Signal Processing Magazine, March 2006, pp. 133-141. (comments: a survey for music classification)

COURSE PREREQUISITES: Machine Learning (Pattern Recognition), Speech Signal Processing

EQUIPMENT: PC, appropriate I/O device, Matlab