CSE 788R04: Statistical Sequence Recognition

TR 11-12:18, 318 Bolz Hall
Instructor: Eric Fosler-Lussier, fosler at cse

Description

Seminar on statistical methods for modeling sequences of data, focusing primarily (but not exclusively) on language data.

Level, Credits, Class Time Distribution, Prerequisites

Level: UG
Credits: 3
Class Time Distribution: 2 1.5-hr cl
Prerequisites: Some sort of AI/Data Mining/Computational Linguistics/Statistical Modeling course at the 600 level or higher. Basic knowledge of statistics is useful.

Quarters Offered

Intended Learning Outcomes

Texts and Other Course Materials

Topics

The exact set of readings will be set early in the quarter based on the interests of the participants. Potential topics include: This list is not meant to be exhaustive, nor will everything necessarily be covered.

Grades

Discussion Facilitation 20%
Participation 40%
Final Project 40%

Long description

This seminar was originally envisioned as a way to read some recent papers about developments in statistical modeling for the Natural Language Processing field (e.g., conditional random fields). However, as I was putting together a syllabus, I realized that there was an opportunity for cross-fertilization of fields, including speech/language processing of various flavors (e.g., automatic speech recognition, statistical machine translation), as well as other fields like computer vision, bioinformatics, and data mining.

Thus, in this seminar we will explore the application of different kinds of statistical sequence models to different application domains. Language processing will be a focal point of our readings; however, based on the interests of the participants, we will also include readings from other areas, especially if they are in line with the research goals of group participants. The syllabus will be designed to be flexible to the needs of all concerned.

I'm taking the title a bit liberally: what I'm interested in is statistical methods applied to sequence data, even if the actual method does not explicitly model the sequence. For example, I'd like to do some reading up on maximum entropy models, which allow for long-range context in making local decisions within sequences.

Expectations

Students enrolled in the course are expected to facilitate discussion of the papers at least once (and maybe twice), to actively participate in these discussions, and to engage in a group or solo project (preferably group, and preferably cross-discipline) involving some sort of sequence modeling. The goal should be to produce something that is of workshop or conference-publishable quality by the end of the quarter (or shortly thereafter). This may not be achievable by all students in this short time period, but a good-faith effort should be made. Students auditing the course are expected to facilitate paper discussion once and actively participate in the discussions. No final project is required of auditors, although you are welcome to present some relevant work of your own during presentations if you wish.

Paper discussions: What I don't want is a PowerPoint presentation of the papers under discussion. These almost inevitably lead to very one-sided presentations with little discussion. What I do want in this class is a dynamic discussion about the papers.

This year, I'm trying something new (at least to me). OSU has a website for each class at http://carmen.osu.edu. You'll need to log in using your OSU username and password. Then, go to CSE 788R04 under Winter 2006. Click on "Discussions" in the menubar. Post a message in the "sign in" section so that I can see that you found everything.

There are two discussion forums that we'll use regularly: "Paper discussions" and "Potential Papers to Read". In the latter, I'd like you to post suggestions for papers that might be good to read (particularly if you want to be the facilitator for that paper). Your assignment for Thursday is to post some potential papers to read.

Everyone is required to post a question (or multiple questions) to the discussion list by 8 pm the evening before the assigned readings. Participants should feel free to also write initial opinions, thoughts, etc. in the discussion list. Questions of the type "I didn't understand X" or "Why would they do X when Y seems simpler/better/easier?" are particularly welcome. Please remember that many of these papers are written for audiences other than you (i.e., people who are already expert in that area), and frankly I don't always understand everything that's in the paper. Getting these questions out in the open can help everyone get a better understanding, even people who thought they understood the paper. :-)

The facilitator for the following day should read over the questions posed and select some subset of them for discussion. Then, in class, the facilitator should start off with a 5-minute summary of the paper and then open the discussion with some of the questions posed. The facilitator is responsible for guiding the flow of the conversation and making sure that as many viewpoints are heard as possible.

Because of the number of people enrolled, we may find it easier to break into subgroups for part of the time to discuss some of the main points, then reconvene to pool what was discussed. Some topics may be more amenable to that (for example, if there are two opposing viewpoints in the papers presented).

Final projects: As noted above, students taking the class for credit are required to do a final project utilizing some sort of statistical sequence modeling. I am perfectly happy if this fits in with your normal research; however, it might be more interesting if you end up doing a cross-disciplinary group project. I'd like an informal project proposal by the end of the third week that we can discuss. The formal requirement will be to (a) submit a two-page extended abstract of your work by the end of week 8, to be judged by a program committee of your peers, (b) present your work (possibly in progress) at the end of the quarter, and (c) turn in a final paper in the conference format of your choice. There will be no final exam.

Policy on Academic Misconduct

[Standard notice] As with any class at this university, you are required to follow the Ohio State "Code of Student Conduct." If you are unfamiliar with this policy, you should read it at http://oaa.osu.edu/coam/code.html. In particular, you should note that you are not allowed to, among other things, submit plagiarized (copied but unacknowledged) work for credit. If any violation occurs, I am required to report the violation to the Council on Academic Misconduct.