Building A Data Corpus For Audio-Visual Speech Recognition: March 2012
net/publication/228611684
All content following this page was uploaded by Léon J. M. Rothkrantz on 22 December 2014.
Corpus | Language | Sessions | Respondents | Audio Quality | Video Quality | Language Quality | Stated Purpose
--- | --- | --- | --- | --- | --- | --- | ---
TULIPS1 | English | 1 | 7 male, 5 female | 11.1 kHz, 8 bits, controlled audio | 100x75, 8 bits, 30 fps, mouth region | first 4 digits in English | small vocabulary isolated words recognition
AVletters | English | 1 | 5 male, 5 female | 22 kHz, 16 bits, controlled audio | 80x60, 8 bits, 25 fps, mouth region | the English alphabet | spelling English alphabet
CUAVE | English | 1 | 19 male, 17 female | 44 kHz, 16 bits, controlled audio | 720x480, 24 bits, 29.970 fps, passport view | 7,000 utterances; connected and isolated digits | continuous speech recognition
Vid-TIMIT | English | 3 | 24 male, 19 female | 32 kHz, 16 bits, controlled audio | 512x384, 24 bits, 25 fps, upper body | TIMIT corpus; 10 sentences per person | automatic lipreading, face recognition
IBM LVCSR* | English | 1 | 290, unknown gender | 22 kHz, 16 bits | -- | connected digits, isolated words | audio-visual speech recognition

... the frame rate matches the audio frame rate? In order to tackle this question we plan to build such a database in the ...
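The audio sampling rates and video frame rates listed for the corpora can be related with simple arithmetic. The sketch below is illustrative only: the names `CorpusRates` and `samples_per_frame` are hypothetical, the nominal table values are taken literally (for example, "44kHz" is assumed to mean 44,000 Hz), and IBM LVCSR is left out because no video rate is given for it. It computes how many audio samples fall within one video frame for each corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusRates:
    """Nominal rates copied from the corpus table (hypothetical helper type)."""
    name: str
    audio_hz: float   # audio sampling rate in Hz, as listed in the table
    video_fps: float  # video frame rate in frames per second

    def samples_per_frame(self) -> float:
        # Number of audio samples recorded during one video frame.
        return self.audio_hz / self.video_fps


# Assumption: table values are read literally (e.g. 11.1 kHz -> 11,100 Hz).
CORPORA = [
    CorpusRates("TULIPS1", 11_100, 30.0),
    CorpusRates("AVletters", 22_000, 25.0),
    CorpusRates("CUAVE", 44_000, 29.970),
    CorpusRates("Vid-TIMIT", 32_000, 25.0),
]

for corpus in CORPORA:
    print(f"{corpus.name:10s} {corpus.samples_per_frame():8.1f} audio samples per video frame")
```

Under these assumptions each video frame spans several hundred to over a thousand audio samples, which gives a rough sense of how much coarser the visual stream is in time than the audio stream in these recordings.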
AUTHORS BIOGRAPHY