In this lecture, we will study models for predicting sequential outputs. Specifically, we will cover three kinds of models: the widely used generative hidden Markov models, locally trained discriminative models (MEMMs), and globally trained discriminative models (CRFs and the structured perceptron). We will also see our first concrete combinatorial inference algorithm based on dynamic programming: the Viterbi algorithm.
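For concreteness, here is a minimal sketch of Viterbi decoding for an HMM, showing the dynamic program and the backtracking step. The log-probability matrices, variable names, and toy numbers below are illustrative assumptions for this sketch, not notation taken from the readings.

```python
import numpy as np

def viterbi(init, trans, emit, observations):
    """Most likely state sequence for an HMM, found by dynamic programming.

    init[s]      : log-probability of starting in state s
    trans[s, t]  : log-probability of moving from state s to state t
    emit[s, o]   : log-probability of state s emitting observation symbol o
    observations : list of observation indices
    """
    n_states = len(init)
    T = len(observations)
    # score[t, s] = best log-score of any state sequence ending in state s at time t
    score = np.full((T, n_states), -np.inf)
    backptr = np.zeros((T, n_states), dtype=int)

    score[0] = init + emit[:, observations[0]]
    for t in range(1, T):
        for s in range(n_states):
            cand = score[t - 1] + trans[:, s]          # extend every previous state
            backptr[t, s] = np.argmax(cand)
            score[t, s] = cand[backptr[t, s]] + emit[s, observations[t]]

    # Trace back the best path from the highest-scoring final state.
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return list(reversed(path))

# Toy example with 2 states and 2 observation symbols (numbers made up).
init = np.log([0.6, 0.4])
trans = np.log([[0.7, 0.3], [0.4, 0.6]])
emit = np.log([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(init, trans, emit, [0, 1, 1]))   # -> [0, 1, 1]
```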
Lecture slides
Background reading, surveys and tutorials
- Lawrence Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, 1989. (required)
- Yoshua Bengio, Markovian Models for Sequential Data
- Michael Collins’ notes on Log-Linear Models, MEMMs, and CRFs
- Charles Sutton and Andrew McCallum, An Introduction to Conditional Random Fields, Foundations and Trends in Machine Learning 4(4), 2012, pp. 267-373.
More readings
- (*) Andrew McCallum, Dayne Freitag and Fernando Pereira, Maximum Entropy Markov Models for Information Extraction and Segmentation, ICML 2000.
- (*) John Lafferty, Andrew McCallum and Fernando Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML 2001.
- (*) Michael Collins, Discriminative Training for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, EMNLP 2002.
- (*) Vasin Punyakanok and Dan Roth, The Use of Classifiers in Sequential Inference, NIPS 2001.