I have filled the research positions previously advertised. However, I am always looking for good (current OSU) students for future projects. Please contact me if you are interested.
If you are interested in applying to OSU, please contact the graduate admissions office. I can tell you some things about my research, but a prospective student is admitted to the department, not to my research lab.
CURRENT RESEARCH TOPICS
Prediction of errors made by speech recognition systems
Today's speech recognition systems are difficult to understand because their acoustic, pronunciation, and language models are tightly interconnected. This interconnectedness also makes it difficult for non-experts to tell what will happen when a system is deployed. In this work, we are developing models that predict what types of errors can be expected from a speech recognition system. These predictions can be used to diagnose system components and to help consumers of ASR technology plan for unexpected outcomes.
Spoken dialogue systems for navigating spatially-oriented data
More details forthcoming...
Searchable annotation of pronunciation variation in the VIC corpus
In this work, we investigate the construction of a large-scale searchable phonetic corpus of spontaneous American English interviews. The corpus will support both linguistic investigations and improved phonetic models for automatic speech recognition.
Automatic Speech Attribute Transcription (ASAT) [NSF Abstract]
It has long been postulated that a human determines the linguistic identity of a sound based on detected evidence that exists at various levels of the speech knowledge hierarchy, from acoustics to pragmatics. Indeed, people do not continuously convert a speech signal into words as an automatic speech recognition (ASR) system attempts to do. Instead, they detect acoustic and auditory evidence, weigh and combine it to form cognitive hypotheses, and then validate the hypotheses until consistent decisions are reached. This human-based model of speech processing suggests a candidate framework for developing next-generation speech technologies that have the potential to go beyond current limitations. To bridge the performance gap between ASR systems and humans, the narrow notion of speech-to-text in ASR has to be expanded to incorporate all related human information "hidden" in speech utterances. Instead of the conventional top-down, network decoding paradigm for ASR, we are establishing a bottom-up, event detection and evidence combination paradigm for speech research to facilitate collaborative Automatic Speech Attribute Transcription (ASAT). The goals of the proposed project are to: (1) develop feature detection and knowledge integration modules to demonstrate ASAT and ASR; (2) build an open source, highly shared, plug-'n'-play ASAT cyberinfrastructure for collaborative research to lower entry barriers to ASR; and (3) provide an objective evaluation methodology to monitor technology advances in individual modules and across the entire system.
PAST RESEARCH TOPICS
DARPA Communicator: end-to-end spoken dialogue system for travel reservations (w/ E. Ammicht, A. Potamianos, M. Galley, A. Pargellis, and C. Lee)
Flexible representations of ambiguous/errorful language
Expressive templates for natural language generation
Automatic induction of semantic classes
Semantic-class based language modeling
Natural language call routing (w/ J. Kuo, I. Zitouni, C. Lee, and others)
Discriminative training for call routing
Minimum verification error training
ASR Pronunciation Modeling
Pronunciation lexicons that change dynamically to reflect local speaking rate and word context
Speaking rate estimation using signal-processing based measures (w/ N. Morgan and N. Mirghafori)
Linguistic studies
Effects of speaking rate, word predictability, and other contextual factors on the pronunciation of function words (w/ A. Bell, D. Jurafsky, and others)
Effects of speaking rate and word predictability on both phone pronunciations and ASR results (w/ N. Morgan and S. Greenberg)
Relationships between prosodic strength and semantic content (w/ C. Shih, G. Kochanski, M. Chan, and J. Yuan)