Programming Assignment 2: Chart Parsing
This exercise involves programming a chart parser. We provide running code written in Java or Python as a basis for your work. You can either use our code or not. Either way, you are responsible for the behavior of the code in your final submission. Feel free to give us feedback on the code, ask us questions about it, suggest improvements, and so on.
á The code is in http://www.cse.ohio-state.edu/~cbrew/732/labs/lab2.
á There is a zip file at http://www.cse.ohio-state.edu/~cbrew/732/labs/lab2.zip .
á API documentation for the Java version is at http://www.cse.ohio-state.edu/~cbrew/732/labs/lab2/javadocs
á API
documentation for the Python version at http://www.cse.ohio-state.edu/~cbrew/732/labs/lab2/pydocs.
Java jockeys should take this opportunity to familiarize themselves with javadoc, the standard Java documentation tool. Any Pythonistas should do the same with epydoc, which we used to make the Python API docs
2.1) Why doesn't "the pigeons are punished in the green room" produce a complete parse? Compare "the pigeons are punished by the professors", which does.
[easy
if you look carefully at the grammar: 10pts]
2.2) Implement and test the top-down chart parser described in the lecture slides and in chapter 13 of J&M. Measure its performance relative to the
bottom-up implementation that I provide. Here performance is measured in the following way:
- your implementation must be correct and complete. That is, it must produce all the edges that are necessary in order to build the correct parses
for each sentence. Remember that some of the sentences the grammar can generate are ambiguous, so having all the edges for just one parse may
not be enough.
- if an implementation is correct (the bottom up parser is), the fewer edges the better,
This will take effort, thought and
understanding: [50pts for
implementation, 20pts for measuring performance and reporting on it, 10pts for
devising good probe sentences and grammars that are revealing about performance.
5pt bonus if you do something we think is especially clever.]
2.3) Adjust the interface to the Rules and Chart classes so that the grammar can be read from a text file rather than being hard-coded [easy, 10 pts]
2.4) Implement and test the allParses method in Chart.java. For full credit the method should be able
to enumerate the first few parses in polynomial time even if there are ridiculously many parses in total [up to 50 bonus points] Update: It may be sensible to make this a method on Edge rather than Chart. The distributed code puts it on Chart, but you don't have to.
2.5) Devise a strategy for the chart parser
that produces the same edges as Earley's algorithm [25 bonus points]
2.6) Implement and test the countTrees method in Chart.java [10 bonus points, and a good warm up for 2.4]
2.A) Neither the top-down nor the bottom-up parser is optimal. Both hypothesize edges that they need not. Make this notion precise. Research any previous attempts to fix the problem. See if you can devise an alternative approach that, while staying correct, produces fewer edges. Can you demonstrate that your approach will always be better according to the criteria of 2.2, no matter what the grammar and input sentence? [Credit for project. This is a research topic]
2.B) As you may have noticed, the features (e.g. num or case:obj) on the preterminals and nonterminals are not passed through to the parser. So "the pigeons is punished by the professors", which has an agreement error, is accepted to the current system. Fix this by working out how to use the features to make the grammar tighter, then implementing your solution. [Credit for the project. This is not a current research topic, but a well-studied puzzle that you could learn a lot by solving]
What
to turn in
-
Your written answer to 2.1
-
Your code for 2.2 and 2.3
-
Anything else that you did.
-
Make sure that you tell the grader how
to run your code. Ant can make this a relatively easy task. There is a
build.xml directory in the download that we provide,
Make
sure that your name appears in all files that you turn in.
I
prefer you to turn in your code so that it can be executed/tested on the CSE
Unix machines.
How
to turn it in
Regardless
of where your code runs, turn in your submission electronically using your CSE
account. Create a directory containing only the files that you want to turn in.
Use the submit command to send the files to the
grader. If your directory is called lab2, the syntax of the submit command is:
> submit c732aa
lab2 lab2
I
recommend keeping the confirmation message that the submit command sends to you
as a documentation of the time you submitted.