Seminar on statistical methods for modeling sequences of data, focusing primarily (but not exclusively) on language data.
| Level | Credits | Class Time Distribution | Prerequisites |
|---|---|---|---|
| UG | 3 | 2 1.5-hr cl | Some sort of AI/Data Mining/Computational Linguistics/Statistical Modeling course at the 600 level or higher. Basic knowledge of statistics is useful. |

| Component | Weight |
|---|---|
| Discussion Facilitation | 20% |
| Participation | 40% |
| Final Project | 40% |

In this seminar we will explore the application of different kinds of statistical sequence models to different application domains. Language processing will be a focal point of our readings; however, based on the interests of the participants, we will also include readings from other areas, especially if they are in line with the research goals of group participants. The syllabus is designed to be flexible to the needs of all concerned.
I'm taking the title a bit liberally: what I'm interested in is statistical methods applied to sequence data, even if the actual method does not explicitly model the sequence. For example, I'd like to read up on maximum entropy models, which allow for long-range context in making local decisions within sequences.
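As a rough illustration of that last point, here is a minimal sketch of a maximum entropy (log-linear) model making one local tagging decision with the help of a feature that looks far back in the sentence. The function names, feature names, weights, and example sentence are all made up for illustration; none of them come from the readings, and a real model would learn its weights from data.

```python
import math


def maxent_probs(features, weights, labels):
    """Return p(label | context) under a log-linear (maximum entropy) model.

    features(label) returns the names of the binary features that are
    active when the context is paired with that candidate label.
    """
    scores = {y: sum(weights.get(f, 0.0) for f in features(y)) for y in labels}
    z = sum(math.exp(s) for s in scores.values())  # normalizer over all labels
    return {y: math.exp(s) / z for y, s in scores.items()}


def make_features(words, i):
    """Build a feature function for tagging words[i] (hypothetical features)."""
    def features(label):
        feats = [f"word={words[i]}|tag={label}"]
        # Long-range feature: "to" occurred anywhere earlier in the sentence,
        # not just in the immediately preceding position.
        if "to" in words[:i]:
            feats.append(f"to_before|tag={label}")
        return feats
    return features


# Hand-set toy weights (assumed, not learned).
weights = {
    "word=run|tag=VERB": 1.0,
    "word=run|tag=NOUN": 1.0,
    "to_before|tag=VERB": 2.0,
}
words = ["she", "wants", "to", "quickly", "run"]
probs = maxent_probs(make_features(words, 4), weights, ["NOUN", "VERB"])
print(probs)
```

The word "run" alone is ambiguous under these weights; the long-range "to" feature is what tips the local decision toward VERB, without the model ever representing the sequence as a chain of states.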
Paper discussions: What I don't want is a PowerPoint presentation of the papers under discussion. These almost inevitably lead to very one-sided presentations with little discussion. What I do want in this class is a dynamic discussion about the papers.
This year, I'm trying something new (at least to me). OSU has a website for each class at http://carmen.osu.edu. You'll need to log in using your OSU username and password. Then, go to CSE 788R04 under Winter 2006. Click on "Discussions" in the menubar. Post a message in the "sign in" section so that I can see that you found everything.
There are two discussion forums that we'll use regularly: "Paper discussions" and "Potential Papers to Read". In the latter, I'd like you to suggest papers that might be good to read (particularly if you want to be the facilitator for that paper). Your assignment for Thursday is to post some potential papers to read.
Everyone is required to post a question (or multiple questions) to the discussion list by 8 pm the evening before each assigned reading is discussed. Participants should also feel free to write initial opinions, thoughts, etc. in the discussion list. Questions of the type "I didn't understand X" or "Why would they do X when Y seems simpler/better/easier" are particularly welcome. Please remember that many of these papers are written for audiences other than you (i.e., people who are already expert in that area), and frankly I don't always understand everything that's in the paper. Getting these questions out in the open can help everyone get a better understanding, even people who thought they understood the paper. :-)
The facilitator for the following day should read over the questions posed and select some subset of them for discussion. Then, in class, the facilitator should start off with a 5-minute summary of the paper and open the discussion with some of the questions posed. The facilitator is responsible for directing the flow of the conversation and making sure that as many viewpoints are heard as possible.
Because of the number of people enrolled, we may find it easier to break into subgroups for part of the time to discuss some of the main points, then reconvene to share what was discussed. Some topics may be more amenable to that (for example, if there are two opposing viewpoints in the papers presented).
Final projects: As noted above, students taking the class for credit are required to do a final project using some sort of statistical sequence modeling. I am perfectly happy if this fits in with your normal research; however, it might be more interesting if you end up doing a cross-disciplinary group project. I'd like an informal project proposal by the end of the third week that we can discuss. The formal requirement will be to (a) submit a two-page extended abstract of your work by the end of week 8, to be judged by a program committee of your peers, (b) present your work (possibly in progress) at the end of the quarter, and (c) turn in a final paper in the conference format of your choice. There will be no final exam.