Please see the research page for a list of publications.

Articulatory feature-based models

Conversational speech is characterized by large amounts of pronunciation variability and disfluencies, which continue to present challenges for speech recognition systems. Recently, I have been working on articulatory feature-based models for conversational speech tasks. These models treat the observed pronunciation variation as the result of asynchrony between the various articulators. Within this domain, I have worked on both directed graphical models (dynamic Bayesian networks, DBNs) and undirected graphical models (conditional random fields, CRFs).

  • R. Prabhavalkar, E. Fosler-Lussier, K. Livescu. "A Factored Conditional Random Field Model For Articulatory Feature Forced Transcription" IEEE ASRU workshop (to appear), Hawaii, USA, 2011.
  • A. Næss, K. Livescu, R. Prabhavalkar. "Articulatory Feature Classification Using Nearest Neighbors" Proceedings of Interspeech (to appear), Florence, Italy, 2011.

CRF-based phone recognition

We proposed an extension of the linear-chain CRF model for CRF-based phone recognition that incorporates an MLP-like hidden layer, which we termed a multilayer CRF (ICASSP, 2010). This idea was also independently proposed by Peng et al. as the conditional neural field model (NIPS, 2009).

  • R. Prabhavalkar, E. Fosler-Lussier. "Backpropagation Training for Multilayer Conditional Random Field based Phone Recognition" Proceedings of ICASSP, Dallas, USA, 2010.
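
As a rough illustration of the multilayer CRF idea described above (not the implementation from the ICASSP paper), the following pure-Python sketch passes each acoustic frame through a hidden layer, uses the hidden activations to score each label, and normalizes over whole label sequences with the forward algorithm. The sigmoid nonlinearity, the absence of start and end transitions, and all names (`W`, `b`, `V`, `T`, `log_likelihood`) are assumptions made for this example.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of finite floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hidden_layer(x, W, b):
    """Sigmoid hidden layer (an assumed nonlinearity): h_j = sigmoid(W[j] . x + b[j])."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bj)))
        for row, bj in zip(W, b)
    ]


def state_scores(frames, W, b, V):
    """Per-frame label scores: score[t][y] = V[y] . h(x_t)."""
    scores = []
    for x in frames:
        h = hidden_layer(x, W, b)
        scores.append([sum(v * hj for v, hj in zip(row, h)) for row in V])
    return scores


def log_partition(scores, T):
    """Forward algorithm in the log domain over all label sequences."""
    n = len(T)
    alpha = list(scores[0])
    for s in scores[1:]:
        alpha = [
            s[y] + logsumexp([alpha[yp] + T[yp][y] for yp in range(n)])
            for y in range(n)
        ]
    return logsumexp(alpha)


def sequence_score(scores, T, labels):
    """Unnormalized score of one label sequence: state plus transition terms."""
    total = scores[0][labels[0]]
    for t in range(1, len(labels)):
        total += T[labels[t - 1]][labels[t]] + scores[t][labels[t]]
    return total


def log_likelihood(frames, labels, W, b, V, T):
    """Conditional log-likelihood log p(labels | frames) under the sketch model."""
    scores = state_scores(frames, W, b, V)
    return sequence_score(scores, T, labels) - log_partition(scores, T)


# Toy usage: 3 frames of 2 features, 2 hidden units, 2 labels.
frames = [[0.2, -1.0], [1.5, 0.3], [-0.7, 0.8]]
W = [[0.5, -0.2], [0.1, 0.9]]        # hidden weights (hidden x input)
b = [0.0, -0.1]                      # hidden biases
V = [[1.0, -1.0], [-0.5, 0.7]]       # label weights (label x hidden)
T = [[0.3, -0.3], [-0.2, 0.4]]       # transition scores (previous x current)
print(log_likelihood(frames, [0, 1, 1], W, b, V, T))
```

Training such a model would backpropagate the gradient of this log-likelihood through the label weights into the hidden layer; the sketch covers only the forward computation.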

Speech segregation with binary masks

I am interested in techniques for speech segregation by estimating binary masks, particularly principled techniques for combining multiple sources of information. In previous work, we explored the use of random fields for combining various monaural cues for voiced speech segregation (Interspeech, 2009). In more recent work, we explored the integration of both monaural and binaural cues for speech segregation (Interspeech, 2010).

  • J. Woodruff, R. Prabhavalkar, E. Fosler-Lussier, D. L. Wang. "Combining monaural and binaural evidence for reverberant speech segregation" Proceedings of Interspeech, Makuhari, Japan, 2010.
  • R. Prabhavalkar, Z. Jin, E. Fosler-Lussier. "Monaural Segregation of Voiced Speech using Discriminative Random Fields" Proceedings of Interspeech, Brighton, UK, 2009.
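
To make the notion of a binary mask concrete, the sketch below computes an ideal binary mask over a time-frequency grid: a unit is kept (1) when the local target-to-interference ratio exceeds a threshold and discarded (0) otherwise. This is an assumption about what the estimated masks approximate, since the text above does not define them, and it needs the premixed target and interference energies, which are only available in oracle settings. The function names and the 0 dB default threshold are illustrative.

```python
import math


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Return a 0/1 mask: 1 where the local SNR in dB exceeds lc_db.

    target_energy and noise_energy are lists of rows (e.g. frequency channels),
    each row a list of non-negative energies per time frame.
    """
    mask = []
    for target_row, noise_row in zip(target_energy, noise_energy):
        row = []
        for s, n in zip(target_row, noise_row):
            if s <= 0.0:
                keep = 0          # no target energy in this unit
            elif n <= 0.0:
                keep = 1          # target present, no interference
            else:
                keep = 1 if 10.0 * math.log10(s / n) > lc_db else 0
            row.append(keep)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Zero out the time-frequency units of the mixture that the mask rejects."""
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture, mask)
    ]


# Toy usage on a 2 x 2 grid of energies.
target = [[4.0, 1.0], [0.5, 0.0]]
noise = [[1.0, 2.0], [0.5, 1.0]]
mask = ideal_binary_mask(target, noise)
print(mask)
```

A mask estimator works without access to `target` and `noise`: it predicts the 0/1 label of each unit from cues extracted from the mixture, such as the monaural and binaural cues mentioned above.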