Abstract
The goal of this project is to implement a system that analyses an audio signal containing speech and produces a classification of lip-shape categories (visemes), in order to synchronize the lips of a computer-generated face with the speech. The thesis describes the work done to derive a method that maps speech to lip movements on an animated face model in real time. The method is implemented in Matlab. The program reads speech from pre-recorded audio files and continuously performs spectral analysis of the speech. Neural networks are used to classify the speech into a sequence of phonemes, and the corresponding visemes are shown on the screen. Some time delay between the input speech and the visualization could not be avoided, but the overall visual impression is that sound and animation are synchronized.
Chapter One
Introduction
1.1 Background of the study
The human face is an extremely important communication channel. It can convey a great deal of information, such as emotions, intentions, or the general condition of a person. In noisy environments, lip movements can compensate for a partial loss of the speech signal. Moreover, the visual component of speech plays a key role for hearing-impaired people. Besides its communication functions, the human face is a primary element in human recognition. Composed of a complex structure of bones and muscles, it is extremely flexible and capable of a wide range of movements and facial expressions. Such anatomical complexity, combined with human sensitivity to discontinuities in simulated facial movement, makes face animation one of the most difficult and challenging research areas in computer animation.
Virtual humans are graphical simulations of real or imaginary persons capable of human-like behaviour, most importantly talking and gesturing [1]. When integrated into an application, a virtual human representing a real human brings life and personality, improves realism, and in general provides a more natural interface. The rules of human behaviour include, among others, speech and facial displays: in face-to-face conversation, both verbal and nonverbal communication take place. For a realistic result, lip movements must be perfectly synchronized with the audio. Beyond lip sync, realistic face animation also includes facial displays. In this work we are interested in facial displays that are neither explicit emotional displays (e.g. expressions such as a smile) nor explicit verbal displays.
The goal of this project is to construct and implement a real-time speech-driven face animation system. The program is based on the Visage Technologies [2] software. Neural networks are used to classify the incoming speech, and the program shows an animated face that mimics the sound. The animation is already implemented, so the work done in this thesis focuses on the signal processing of the audio signal and on the implementation of the speech-to-lip mapping and synchronization.
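As a rough illustration of the processing chain described above, the following Matlab sketch reads a pre-recorded audio file, performs frame-by-frame spectral analysis, classifies each frame with a pre-trained network, and maps the resulting phoneme class to a viseme index. The function classifyFrame, the lookup table phonemeToViseme, the pre-trained network net, and the frame parameters are illustrative assumptions only; they do not correspond to the actual implementation or to the Visage Technologies software.

% Sketch only: frame-based speech-to-viseme classification.
% 'net' denotes an assumed pre-trained classification network;
% classifyFrame and phonemeToViseme are hypothetical placeholders.
[x, fs]  = audioread('speech.wav');          % pre-recorded speech file
x        = x(:, 1);                          % use the first channel
frameLen = round(0.02 * fs);                 % 20 ms analysis frame (assumed)
hopLen   = round(0.01 * fs);                 % 10 ms hop between frames (assumed)
win      = hamming(frameLen);
numFrames = floor((length(x) - frameLen) / hopLen) + 1;
visemes   = zeros(numFrames, 1);
for k = 1:numFrames
    idx   = (k - 1) * hopLen + (1:frameLen);
    frame = x(idx) .* win;
    spec  = abs(fft(frame, 512));            % spectral analysis of the frame
    feat  = log(spec(1:256) + eps);          % log magnitude spectrum as features
    phoneme    = classifyFrame(net, feat);   % neural network classification (hypothetical)
    visemes(k) = phonemeToViseme(phoneme);   % phoneme-to-viseme lookup (hypothetical)
end

In the real system the frames arrive continuously from the audio stream rather than from a complete file, but the per-frame structure of spectral analysis, classification, and viseme lookup is the same.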
It is very important that the facial animation and the sound are synchronized, which places demands on the program with respect to time delay. Some time delay must be accepted, since speech has to be spoken before it can be classified. The goal set for this thesis is an upper limit of 100 ms on the delay from input speech to visualization.
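To illustrate the latency budget, assume for the sake of argument that the speech is analysed in 20 ms frames with a 10 ms hop and that the classifier needs one additional frame of context before a sound can be classified. The delay from a sound being spoken to its classification is then roughly 20 + 10 = 30 ms, which leaves about 70 ms of the 100 ms budget for classification, viseme lookup, and rendering. These frame sizes are assumptions chosen for illustration, not measured values from the implementation.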
1.2 Statement of the problem
The problems associated with the existing system include the following:
i. There is currently no recovery method for feature-point loss.
ii. Feature points skewed in perspective create playback artifacts.
iii. There are only 22 feature points, which cannot fully describe a face.
iv. Initialization requires mouse-clicking the markers on the first frame.
v. The current algorithm is computationally costly.