A system for video-based analysis of face motion during speech
During face-to-face interaction, facial motion conveys information at several levels: a person's emotional state, their position in a discourse, and, while speaking, phonetic details of the speech sounds being produced. Measuring face motion is therefore a prerequisite for any further analysis of its functional characteristics or information content. Locations on the face can be measured precisely with systems that track active or passive markers placed directly on the face. Such systems, however, require specialised equipment, which restricts their use outside the laboratory, and they are invasive in that the markers have to be attached to the subject's face. To overcome these limitations, we developed a video-based system that measures face motion from standard video recordings by deforming the surface of an ellipsoidal mesh fitted to the face. The mesh is initialised manually on a reference frame and then projected onto subsequent video frames. The change in location of each mesh node between successive frames is determined adaptively within a well-defined area around the node, using a two-dimensional cross-correlation analysis on a two-dimensional wavelet transform of the frames. Position parameters are propagated in three steps, from a coarser mesh and a correspondingly higher scale of the wavelet transform to the final fine mesh and a lower scale of the wavelet transform. The sequential changes in the positions of the mesh nodes represent the facial motion. The method exploits inherent constraints of the facial surface, which distinguishes it from more general image motion estimation methods, and, unlike feature-based methods, it returns measurement points distributed globally over the facial surface.
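The coarse-to-fine node tracking described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not name the wavelet, the patch size, or the search range, so the sketch assumes the approximation band of a Haar transform (repeated 2x2 averaging) as a stand-in, a fixed square patch, and a small integer search window. All function names and parameters (`haar_approx`, `ncc`, `match_node`, `track_node`, `half`, `search`, `levels`) are hypothetical, and the mesh itself and its surface constraints are left out.

```python
import numpy as np

def haar_approx(img, levels):
    """Approximation band of a 2-D Haar transform, up to a scale factor
    (assumed stand-in for the wavelet transform used in the thesis)."""
    out = np.asarray(img, dtype=float)
    for _ in range(levels):
        h, w = out.shape[0] // 2 * 2, out.shape[1] // 2 * 2
        out = out[:h, :w]
        out = 0.25 * (out[0::2, 0::2] + out[1::2, 0::2]
                      + out[0::2, 1::2] + out[1::2, 1::2])
    return out

def ncc(a, b):
    """Normalised cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_node(prev, curr, node, guess, half=4, search=2):
    """Integer shift (dy, dx) near `guess` that maximises the NCC between
    the patch around `node` in `prev` and the shifted patch in `curr`."""
    y, x = node
    size = 2 * half + 1
    if y - half < 0 or x - half < 0:
        return guess  # node too close to the border for a full patch
    ref = prev[y - half:y + half + 1, x - half:x + half + 1]
    if ref.shape != (size, size):
        return guess
    best_score, best_shift = -np.inf, guess
    for dy in range(guess[0] - search, guess[0] + search + 1):
        for dx in range(guess[1] - search, guess[1] + search + 1):
            yy, xx = y + dy, x + dx
            if yy - half < 0 or xx - half < 0:
                continue
            cand = curr[yy - half:yy + half + 1, xx - half:xx + half + 1]
            if cand.shape != ref.shape:
                continue
            score = ncc(ref, cand)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def track_node(prev, curr, node, levels=2):
    """Propagate the shift estimate from the coarsest level to full
    resolution; levels=2 gives three steps (levels 2, 1, 0)."""
    shift = (0, 0)
    for level in range(levels, -1, -1):
        p, c = haar_approx(prev, level), haar_approx(curr, level)
        scaled_node = (node[0] >> level, node[1] >> level)
        shift = match_node(p, c, scaled_node, shift)
        if level > 0:
            # One coarse pixel corresponds to two pixels at the next level.
            shift = (2 * shift[0], 2 * shift[1])
    return shift
```

Calling `track_node(prev_frame, curr_frame, (row, col))` on two greyscale frames returns an integer `(dy, dx)` displacement for one node. Starting at the coarse level keeps the search window small at every step while still covering larger displacements at full resolution.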
face, motion tracking, auditory-visual speech, wavelets, video
Kroos, Christian
2004
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Kroos, Christian (2004): A system for video-based analysis of face motion during speech. Dissertation, LMU München: Fakultät für Sprach- und Literaturwissenschaften