Donna Byron  


Research

I co-direct the Speech and Language Technologies lab at OSU. The goal of the CSE SLaTe Lab is to build software that allows humans and machines to interact using natural language.

We create algorithms and tools for many challenging aspects of dialog-enabled software, including speech recognition, discourse processing, anaphora resolution, and bootstrapping language resources for new domains.

As of Autumn quarter 2007, I am not taking any additional graduate research assistants.

The following research areas are currently active in the lab:

OCEANS: The OSU Collaborative Embodied Agent with Natural Speech


Software agents that are embodied and mobile within a 3D space are a core component of many exciting, developing application areas within AI. Examples include unmanned autonomous vehicles, assistant robots such as domestic helpers or hospital couriers, and simulation characters for training, entertainment, or social interaction. Such agents must carry on a dialog while sharing an experience of the external world with their human partner, and they must understand the spatial relationships between the dialog partners and the objects in the world that are under discussion. This requires a whole collection of dialog skills that existing software does not model.

The OCEANS project is building a conversational agent that operates within a simple virtual world to perform a search-and-rescue task with a human partner. The agent is autonomous and communicates with its human partner through natural conversational language. Although the task is simple, the set of behaviors needed to sense and act in the world and also carry on a dialog is quite complex. System building began in March 2005 for our current agent, which lives in a Quake level.
Collaborators: Eric Fosler-Lussier, Timothy Weale, Tianfang Xu, Laura Stoia, Guadalupe Canahuate, Nick Dimiduk

CIVET: Collaborative Interaction in Virtual EnvironmenTs


In tandem with the OCEANS project, we have been examining human-human collaboration in a simple virtual world. In the corpus we are studying, the two partners perform a treasure hunt task and talk to each other in order to coordinate their activity. We have analyzed a handful of conversations this year, and have made some interesting new discoveries about situated language.
Collaborators: Timothy Weale, Tianfang Xu, Laura Stoia, Guadalupe Canahuate, Craige Roberts, Brad Mellen, Thomas Mampilly, Vinay Sharma, Aakash Dalwani, Ryan Gerritsen, Mark Keck

NOTE: We have built a small corpus of these human-human problem-solving dialogs for a treasure hunt task in a Quake virtual world. The recordings are freely available for other researchers to study. If you would like to use them, please see the Quake corpus web page.

A General Purpose Pronoun Resolution Tool


Our goal is to create an anaphor resolution module that can be plugged into other language processing software and that has an easily configurable internal search procedure. We have created a Python module called PYCOT that will become a component that can be added to the NLTK toolkit, following the proposal of Byron and Tetreault (1999). Our first experience with the software was to process the English Wall Street Journal portion of the Penn Treebank using an Optimality Theory-style search algorithm suggested by Beaver (2004). This year we added a pre-processor that runs on the Korean Treebank. The anaphora resolution portion uses much of the same code whether it runs on English or Korean, but a runtime configuration variable adapts the internals of the search process to consider additional information that is only available in Korean.
Collaborators: Whitney Gegg-Harrison, Joel Tetreault
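To make the idea of a configurable, Optimality Theory-style search more concrete, here is a minimal sketch in Python. It is not PYCOT's code or API. Every name in it (Candidate, Pronoun, RANKINGS, resolve, and the three constraints) is hypothetical, and the Korean-only "topic marking" feature is simply an assumed stand-in for the kind of extra information a language-specific configuration might add. Each constraint counts violations for a candidate antecedent, and the winner is the candidate whose violation profile is smallest when compared constraint by constraint in ranked order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Pronoun:
    """The anaphor to resolve (fields are illustrative assumptions)."""
    text: str
    number: str  # "sg" or "pl"


@dataclass(frozen=True)
class Candidate:
    """A possible antecedent (fields are illustrative assumptions)."""
    text: str
    number: str                 # "sg" or "pl"
    distance: int               # sentences between candidate and pronoun
    topic_marked: bool = False  # hypothetical Korean-only feature


# A constraint returns the number of violations a candidate incurs.
Constraint = Callable[[Pronoun, Candidate], int]


def agree_number(pronoun: Pronoun, cand: Candidate) -> int:
    """One violation if pronoun and candidate disagree in number."""
    return 0 if pronoun.number == cand.number else 1


def recency(pronoun: Pronoun, cand: Candidate) -> int:
    """One violation per sentence of distance from the pronoun."""
    return cand.distance


def prefer_topic(pronoun: Pronoun, cand: Candidate) -> int:
    """One violation if the candidate is not topic-marked (assumed feature)."""
    return 0 if cand.topic_marked else 1


# Runtime configuration: each language gets its own constraint ranking,
# listed from highest-ranked to lowest-ranked. These rankings are invented
# for illustration only.
RANKINGS: Dict[str, List[Constraint]] = {
    "english": [agree_number, recency],
    "korean": [agree_number, prefer_topic, recency],
}


def resolve(pronoun: Pronoun, candidates: List[Candidate],
            language: str) -> Optional[Candidate]:
    """Return the candidate with the best violation profile, or None."""
    if not candidates:
        return None
    constraints = RANKINGS[language]

    def profile(cand: Candidate) -> Tuple[int, ...]:
        # Tuples compare lexicographically, so a violation of a
        # higher-ranked constraint outweighs any number of lower ones.
        return tuple(c(pronoun, cand) for c in constraints)

    return min(candidates, key=profile)


if __name__ == "__main__":
    it = Pronoun("it", "sg")
    candidates = [
        Candidate("the reports", "pl", distance=0),
        Candidate("the committee", "sg", distance=2, topic_marked=True),
        Candidate("the memo", "sg", distance=1),
    ]
    for language in ("english", "korean"):
        print(language, "->", resolve(it, candidates, language).text)
```

With these toy inputs, the English ranking selects "the memo" (the closest candidate that agrees in number), while the Korean ranking selects "the committee", because the assumed topic-marking constraint outranks recency. Only the entry in RANKINGS differs between the two runs, which is the sense in which a single configuration value can adapt the search while the rest of the code is shared.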

Behavioral and processing models for demonstrative pronouns


Using eye-trackers to capture the moment-by-moment development of interpretation by humans, we have been investigating the differences between personal pronouns like "it" and demonstrative pronouns like "that".
Collaborators: Sarah Brown-Schmidt, Mike Tanenhaus, Shari Speer

Computational Models for Zero Anaphors in Korean


In languages such as Korean, Japanese, Spanish and Portuguese that make heavy use of null anaphors, are null anaphors used in the same circumstances as overt pronouns in languages like English? If they are like English overt pronouns, they may yield to the same processing models as we use for English pronouns. Intuitively, it would seem that null anaphors would appear in highly predictable positions, and their meaning would therefore be easy to calculate. But when the language has both overt and null anaphora, as Korean does, do the two forms need different processing? Computer-readable texts and transcripts have recently become available that allow us to investigate these questions. Our first experiment with null pronouns in Korean, using the Penn Korean Treebank, is currently under review.
Collaborators: Sun-Hee Lee, Whitney Gegg-Harrison, Seok Bae Jang

Bootstrapping Linguistic Resources


AI system development has always been plagued by the problem of codifying human knowledge and experience in computer form. Natural language systems are no exception. For spoken dialog systems, all of the vocabulary items and concepts that a particular system needs to be able to discuss must be painstakingly defined by hand. With the availability of large knowledge collections such as the WWW, it has recently become more feasible to automatically bootstrap linguistic knowledge for a system under development. Like many other labs, we are experimenting with techniques to bootstrap vocabulary items, examples of larger constituents such as full sentences, and concepts for a particular domain from the web.
Collaborators: Eric Fosler-Lussier, Laura Stoia, Tianfang Xu, Jeremy Morris

Last modified: Thu Aug 4 17:20:18 EDT 2005 by dbyron