Speech-Driven Facial Animation

Abstract

We present a novel approach to speech-driven facial animation using a non-parametric switching state-space model based on Gaussian processes. The model is an extension of a shared Gaussian process dynamical model, augmented with switching states. Audio and visual data from a talking head corpus are jointly modelled using the proposed method. The switching states are found automatically using a variable-length Markov model trained on labelled phonetic data. We also propose a synthesis technique that takes into account both previous and future phonetic context, thus accounting for coarticulatory effects in speech.
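As an illustration only (not the authors' implementation), the sketch below shows the shared latent-space idea in miniature: a single latent trajectory drives Gaussian process mappings to both audio and visual feature spaces, a further GP models the latent dynamics, and visual frames are decoded from latent points chosen to match new input speech. The feature dimensions, kernels, hyperparameters and the nearest-neighbour latent lookup are all assumptions made for the sketch.

# Illustrative sketch only (not the authors' code): a minimal shared latent-space
# model in the spirit of a shared Gaussian process dynamical model. All feature
# dimensions, kernels and hyperparameters are assumptions for illustration.

import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Radial basis function kernel between the rows of A and the rows of B.
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_predict(X_train, Y_train, X_test, noise=1e-3):
    # Posterior mean of GP regression from X_train to Y_train, queried at X_test.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, Y_train)

# Toy data: T frames, a 2-D shared latent trajectory, and 13-D audio / 30-D visual
# features generated from the same latent points (stand-ins for real features).
rng = np.random.default_rng(0)
T = 100
X = np.cumsum(0.1 * rng.normal(size=(T, 2)), axis=0)   # smooth latent trajectory
Y_audio = np.tanh(X @ rng.normal(size=(2, 13)))
Y_visual = np.tanh(X @ rng.normal(size=(2, 30)))

# Dynamics model: a GP maps the latent state at frame t-1 to the state at frame t.
X_prev, X_next = X[:-1], X[1:]

# Speech-driven synthesis: for each new audio frame, pick a latent point consistent
# with the audio (here crudely, by nearest neighbour in audio-feature space) and
# with the latent dynamics, then decode visual features from that latent point.
new_audio = Y_audio[:20] + 0.05 * rng.normal(size=(20, 13))   # stand-in input speech
x_t = X[:1]
frames = []
for a_t in new_audio:
    idx = np.argmin(np.sum((Y_audio - a_t) ** 2, axis=1))            # audio-matched latent
    x_t = 0.5 * (gp_predict(X_prev, X_next, x_t) + X[idx:idx + 1])   # blend with dynamics
    frames.append(gp_predict(X, Y_visual, x_t))                      # latent -> visual
frames = np.vstack(frames)
print(frames.shape)   # (20, 30): one frame of visual parameters per audio frame

In the published model the latent space and kernel hyperparameters are learned from data, and a variable-length Markov model over phone labels selects among switching states so that past and future phonetic context shapes the dynamics; the sketch above omits both the learning and the switching.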

For more details see:

  • S. Deena, S. Hou and A. Galata, "Visual Speech Synthesis by Modelling Coarticulation Dynamics using a Non-Parametric Switching State-Space Model", in Proc. of the 12th International Conference on Multimodal Interfaces (ICMI-MLMI), 2010 (.pdf).
Method Overview

Figure 1: Overview of the method.

Demos

Example 1 (movie)

Example 2 (movie)

