Review sheet for CSE 732 Spring 2009
¯ N-gram probability: conditional probability of a word given the previous N-1 words
¯ Smoothing: a way of estimating N-gram (and other) probabilities when the training data does not have all the necessary counts.
¯ Training set and test set: why it is important to separate off a test set that is not used in training.
¯ Closed class and open class words: open classes are nouns, verbs, adjectives, adverbs; closed classes include articles and prepositions
¯ Lexical ambiguity: words that can have different parts of speech in different contexts (e.g. flies could be noun or verb)
¯ Part of speech tagging: distinction between rule-based and statistical methods
¯ HMMs: Hidden states vs. observations. Transition probabilities and emission probabilities
¯ HMM decoding: Viterbi algorithm to get best tag sequence
¯ Context-free grammars: rules, trees, syntactic categories, start symbol
¯ Major categories of English: noun phrase, verb phrase, prepositional phrase
¯ Syntactic ambiguity: prepositional phrase attachment
¯ Parsing as search: top-down, bottom-up, use of agenda
¯ Dynamic programming: trading space for time, avoiding unnecessary repetition of work.
¯ Probabilistic parsing: how to add probabilities to context-free grammars. P(rhs|lhs)
¯ Lexicalized parsing: why to add head words to syntax trees. Increased ability to discriminate between parses at cost of explosion in grammar size.
¯ Modern statistical parsing: Consequences of grammar size explosion. Selective lexicalization. Category splitting. Beam search as an approximation to full exploration of chart. Coarse-to-fine parsing.
¯ Markov grammar: Collins, Charniak. Generating (lexicalized) syntax trees by steps smaller than a whole rule at a time.
¯ Named entities: people, places, organizations, times and dates
¯ B-I-O representation: how to turn a sequence labeling problem into a set of (related) pointwise classification problems.
¯ Sequence models: HMMs and MEMMs for named entity recognition. Why MEMMs are richer (and more appropriate) models for situations where decisions interact.
¯ Independence assumptions: building manageable statistical models of complex situations by (a) building a model using the language of conditional probabilities, (b) making assumptions about which aspects of the model influence each other most strongly, (c) assuming we can ignore the other dependencies, and (d) working out a way of computing and estimating using the new model.
¯ Apollo 13 syndrome: we have a task to do, and some available tools that we understand. How can we press the tools into service in order to get an adequate job done?
¯ Measurement: really a consequence of independence assumptions and Apollo 13 syndrome. We can't fully tell whether the idea is going to work, so we need to run experiments on data. Hence precision, recall, F-measure, etc.
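
Worked sketch for the N-gram probability and smoothing items: a bigram model (N = 2) with add-k smoothing, where k = 1 gives add-one (Laplace) smoothing. This is only one simple smoothing scheme, chosen here for brevity. The function names, the `<s>` and `</s>` boundary markers, and the toy sentences are assumptions made for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Collect history counts, bigram counts, and the vocabulary
    from a list of tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sent in sentences:
        # Hypothetical boundary markers so the first and last words have context.
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return history_counts, bigram_counts, vocab

def bigram_prob(word, prev, history_counts, bigram_counts, vocab, k=1.0):
    """Add-k estimate of P(word | prev). Unseen bigrams get a small
    nonzero probability instead of zero."""
    numerator = bigram_counts[(prev, word)] + k
    denominator = history_counts[prev] + k * len(vocab)
    return numerator / denominator

# Toy training data (illustrative only).
train = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
hist, bigrams, vocab = train_bigram(train)
seen = bigram_prob("dog", "the", hist, bigrams, vocab)    # bigram seen in training
unseen = bigram_prob("cat", "dog", hist, bigrams, vocab)  # bigram never seen
```

Because k is added for every vocabulary item, the smoothed probabilities for a given history still sum to one.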
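
Worked sketch for the HMM and HMM decoding items: a minimal Viterbi implementation over plain probabilities (a practical tagger would usually work with log probabilities to avoid underflow). The two-tag model and all its numbers are made up for illustration; only the structure (start, transition, and emission probabilities, plus backpointers) is the point.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    # back[t][s]: the previous state on that best path.
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s].get(observations[t], 0.0)
            back[t][s] = prev
    # Follow backpointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical toy model: tags are hidden states, words are observations.
states = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {
    "N": {"time": 0.5, "flies": 0.3, "arrow": 0.2},
    "V": {"time": 0.1, "flies": 0.6, "like": 0.3},
}
tags, prob = viterbi(["time", "flies"], states, start_p, trans_p, emit_p)
```

The table `best` is the dynamic programming step: each cell is computed once from the previous column rather than re-scoring every full tag sequence.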
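
Worked sketch for the probabilistic parsing item: under a probabilistic context-free grammar, the probability of a tree is the product of P(rhs|lhs) over every rule used in it. The tuple encoding of trees and the rule probabilities below are assumptions chosen for illustration.

```python
def tree_prob(tree, rule_probs):
    """Probability of a parse tree under a PCFG.

    tree: (label, child, ...) where each child is either a subtree
    tuple or a word (a plain string).
    rule_probs: maps (lhs, rhs_tuple) to P(rhs | lhs).
    """
    label, *children = tree
    # The right-hand side is the sequence of child labels (or words).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child, rule_probs)
    return p

# Hypothetical toy grammar.
rule_probs = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("dogs",)): 0.4,
    ("VP", ("bark",)): 0.5,
}
tree = ("S", ("NP", "dogs"), ("VP", "bark"))
p_tree = tree_prob(tree, rule_probs)
```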
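
Worked sketch for the B-I-O representation item: converting labeled entity spans into one tag per token, so that each token becomes its own classification decision. The span format (start index, exclusive end index, label) and the example sentence are assumptions made for illustration.

```python
def spans_to_bio(n_tokens, spans):
    """Turn entity spans into per-token B-I-O tags.

    spans: iterable of (start, end, label) with end exclusive.
    """
    tags = ["O"] * n_tokens              # outside any entity by default
    for start, end, label in spans:
        tags[start] = "B-" + label       # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label       # tokens inside the entity
    return tags

tokens = ["John", "Smith", "visited", "New", "York", "yesterday"]
bio = spans_to_bio(len(tokens), [(0, 2, "PER"), (3, 5, "LOC")])
```

The decisions stay related: an I- tag is only valid after a B- or I- tag with the same label, which is one reason sequence models are used on top of the pointwise view.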
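
Worked sketch for the measurement item: precision, recall, and balanced F-measure (F1) computed over sets of predicted and gold items. Representing each named entity as a (start, end, label) tuple is an assumption made for illustration; an item counts as correct only on an exact match.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall, and F1 over two collections of items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical system output: one entity right, one with the wrong label, one missed.
gold = {(0, 2, "PER"), (5, 6, "LOC"), (8, 10, "ORG")}
predicted = {(0, 2, "PER"), (5, 6, "ORG")}
p, r, f1 = precision_recall_f1(gold, predicted)
```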