In this lecture, we look at decision trees and the popular ID3 heuristic for learning them. We work through an example that applies the ID3 heuristic to a small data set.
The lecture ends with practical concerns about decision tree learning and a first look at the problem of overfitting.
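
As a rough illustration of the split-selection step usually associated with ID3, the sketch below picks the feature with the highest information gain, that is, the largest reduction in label entropy. It assumes categorical features stored as dictionaries. The function names and the toy data are hypothetical and are not taken from the lecture materials.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting on `feature`."""
    total = len(labels)
    # Group the labels by the value each example takes for this feature.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    # Expected entropy after the split, weighted by partition size.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """Feature with the highest information gain (ties go to the first one)."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data: the label depends only on "outlook".
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(best_feature(rows, labels, ["outlook", "windy"]))
```

A full tree learner would apply this choice recursively to each partition until the labels are pure or no features remain. The sketch covers only a single split.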
Lectures
 Lecture slides:
 Videos:
 Decision trees: Representation
 Decision trees: The ID3 algorithm

Decision trees: Discussion 1, Decision trees: Discussion 2
 Videos from previous years:

Representation: [spring 2023], [fall 2018], [fall 2017]

Decision tree learning: [spring 2023], [fall 2018], [fall 2017]

Discussion: [spring 2023], [fall 2018], [fall 2017]

Links and Resources

Tom Mitchell’s textbook has a good overview of decision trees

Andrew Moore’s slides on decision trees and information gain

J. R. Quinlan, Induction of Decision Trees, 1986.

The first two chapters of Information Theory, Inference, and Learning Algorithms introduce the basic concepts in information theory like entropy.

Chapter 1 of A Course in Machine Learning. Available online.
Further reading

Ron Rivest, Learning Decision Lists, 1987.

Laurent Hyafil and Ron Rivest, Constructing Optimal Binary Decision Trees is NP-Complete, 1976.