White Papers
Audio-Visual Speech Recognition in Challenging Environments
Overview: Visual speech information is known to improve the accuracy and noise robustness of automatic speech recognition (ASR) systems. However, to date, all audio-visual ASR work has concentrated on "visually clean" data with limited variation in the speaker's frontal pose, lighting, and background. In this paper, the authors investigate audio-visual ASR in two practical environments that present significant challenges to robust visual processing: typical offices, where data are recorded by means of a portable PC equipped with an inexpensive web camera, and automobiles, with data collected at three approximate speeds. The performance of all components of a state-of-the-art audio-visual ASR system is reported on these two sets and benchmarked against "visually clean" data recorded in a studio-like environment.
| Publisher | IBM |
|---|---|
| Date Published | September 2003 |
| Format | White Papers |
