CSE 788L04, Autumn 2007: Machine Learning for Language Technology

F 9:00-11:18, 264 Dreese Labs
Instructor: Eric Fosler-Lussier, fosler at cse

This course is located on Carmen

Description

This course will explore recent trends in machine learning for language technology; in particular, we will examine the role of novel machine learning methods in traditional tasks. Automatic speech recognition will be a focus application area for this course, although we will also discuss work in related areas (such as natural language processing) and read some basic tutorial papers on machine learning technologies. Some topics can and will be adapted based on the interests of the participants. This course assumes no background in language technology or machine learning, but participants who lack this background will be expected to do background reading in addition to the assigned papers.

Level, Credits, Class Time Distribution, Prerequisites

Level: UG
Credits: 3
Class Time Distribution: one 2.5-hr class
Prerequisites: CSE 730 or Ling 684.02 or graduate standing

Quarters Offered

Intended Learning Outcomes

Texts and Other Course Materials

Topics

Topics will likely include most of the following. This list is not meant to be exhaustive, and not everything will necessarily be covered; coverage depends on the interests of the participants and the discussion in class.

Grades

Discussion Facilitation 20%
Participation 40%
Final Project 40%

Expectations

Students enrolled in the course are expected to facilitate discussion of the papers at least once, to participate actively in these discussions, and to complete a group or solo project (preferably group, and preferably cross-disciplinary) involving some sort of machine learning on language data. Since students in this class come from a wide variety of backgrounds, I have only two requirements for the project: (a) you stretch yourself by doing something new, and (b) you do not use decision tree classifiers. If you are new to the area, a suitable class project would be to implement one of the algorithms we discuss in class. Advanced students should aim to produce something of workshop- or conference-publishable quality by the end of the quarter (or shortly thereafter). Not every student will be able to achieve this in such a short time, but a good-faith effort should be made.

Students auditing the course are expected to participate actively in the discussions. Auditors are not required to do a final project, although you are welcome to present some relevant work of your own during the presentations if you wish.

Paper discussions: What I don't want is a 2-hour PowerPoint presentation of the papers under discussion. These almost inevitably lead to very one-sided presentations with little discussion. What I do want in this class is a dynamic discussion of the papers.

OSU has a website for each class at http://carmen.osu.edu. You'll need to log in using your OSU username and password. Then, go to CSE 788L04 under Autumn 2007 and click on "Discussions" in the menu bar. Post a message in the "sign in" section so that I can see that you found everything.

Since the class is shortened (2 hours 18 minutes instead of 2 hours 48 minutes), there is a bit more participatory work than usual outside of class. On Carmen, there are two discussion forums that we'll use regularly: "Tutorial/Review Paper Suggestions" and "Paper assignments and discussions". The first forum will be used at the beginning of each week. Most weeks, I will post a topic, such as "Hidden Markov Models". Your job is to search for resources that can help you and your colleagues get a handle on the basics of the topic. The forum will be used to trade suggestions on review articles, websites, tutorials, books(!), and other resources that you have found.

By 8 PM on Tuesday, your job is to (a) post a link to a particular resource that you have found on the topic, (b) skim the resource, and (c) write a one-paragraph review of it. I will then review the reviews, look at some of the resources, and suggest one or two things for everyone to read in addition to the advanced material for the week.

The second forum is used to prepare the in-class discussion of the week's advanced papers and to focus it on the important issues they raise. By 8 PM on Thursday, everyone is required to post at least one question to the forum. Participants should also feel free to post initial opinions, thoughts, etc. Questions such as "I didn't understand X" or "Why would they do X when Y seems simpler/better/easier?" are particularly welcome. Please remember that many of these papers are written for audiences other than you (i.e., people who are already experts in that area), and frankly I don't always understand everything in them either. Getting these questions out in the open can help everyone reach a better understanding, even people who thought they understood the paper. :-)

The facilitator(s) for the following day should read over the questions posed and select a subset of them for discussion.

A typical class will run as follows:

The presentation by the facilitator(s) should cover the basic points made by the papers: their main ideas and particular findings. Use of PowerPoint/overhead slides should be kept to a minimum; I am looking for the facilitators to discuss and summarize the papers, not walk us through them bit by bit. Questions may be asked from the floor; presentations that run too long will be cut off by the instructor. After the presentation, the facilitator opens the discussion with some of the posted questions and is responsible for guiding the flow of the conversation, making sure that as many viewpoints as possible are heard.

Final projects: As noted above, students taking the class for credit are required to do a final project applying some sort of machine learning to language. I am perfectly happy if this fits in with your normal research; however, it might be more interesting if you end up doing a cross-disciplinary group project. I'd like an informal project proposal by the end of the third week that we can discuss. The formal requirements are to (a) submit a two-page extended abstract of your work by the end of week 8, to be judged by a program committee of your peers, (b) present your work (possibly in progress) at the end of the quarter, and (c) turn in a final paper in the conference format of your choice. There will be no final exam.

Policy on Academic Misconduct

[Standard notice] As with any class at this university, you are required to follow the Ohio State "Code of Student Conduct." If you are unfamiliar with this policy, you should read it at http://oaa.osu.edu/coam/code.html. In particular, you should note that you are not allowed to, among other things, submit plagiarized (copied but unacknowledged) work for credit. If any violation occurs, I am required to report the violation to the Council on Academic Misconduct.