NOTE: The following materials are presented for timely dissemination of academic and technical work. Copyright and all other rights therein are reserved by authors and/or other copyright holders. Persoanl use of the following materials is permitted and, however, people using the materials or information are expected to adhere to the terms and constraints invoked by the related copyright.

A Modified HME Architecture for Text-Dependent Speaker Identification


A modified hierarchical mixtures of experts (HME) architecture is presented for text-dependent speaker identification. A new gating network is introduced to the original HME architecture for the use of instantaneous and transitional spectral information in text-dependent speaker identification. The statistical model underlying the proposed architecture is presented and learning is treated as a maximum likelihood problem; in particular, an Expectation-Maximization (EM) algorithm is also proposed for adjusting the parameters of the proposed architecture. An evaluation has been carried out using a database of isolated digit utterances by 10 male talkers. Experimental results demonstrate that the proposed architecture outperforms the original HME architecture in text-dependent speaker identification.

Click tnn96.pdf for full text and tnn96c.pdf for the correction to Figure 1.