COMS W4721 Machine Learning for Data Science

Columbia University, Spring 2019


Instructor: John Paisley
Location: 501 Schermerhorn Hall
Time: T/Th 7:40pm - 8:55pm
Office hours: Monday 11am-12pm @ 422 Mudd Building

TAs:
Danyang He
dh2914@columbia.edu
Friday 7-9pm @ CS TA room, Mudd 122A (1st floor)

Arjun Srivatsa
ass2186@columbia.edu
Saturday 12-2pm @ CS TA room, Mudd 122A (1st floor)

Luv Aggarwal
la2733@columbia.edu
Tuesday 3-5pm @ CS TA room, Mudd 122A (1st floor)

Sukriti Tiwari
st3177@columbia.edu
Monday 5-7pm @ CS TA room, Mudd 122A (1st floor)

Daniel Jeong
dpj2108@columbia.edu
Thursday 2-4pm @ CS TA room, Mudd 122A (1st floor)

Josh Rutta
jar2317@columbia.edu
Tuesday 10am-12pm @ EE lounge, Mudd 1301 (13th floor)

Ghazal Fazelnia
gf2293@columbia.edu
CVN student office hours via email (no fixed time)

Synopsis:   This course provides an introduction to supervised and unsupervised techniques for machine learning. We will cover both probabilistic and non-probabilistic approaches. The focus will be on classification and regression models, clustering methods, matrix factorization, and sequential models. Methods covered in class include linear and logistic regression, support vector machines, boosting, K-means clustering, mixture models, the expectation-maximization algorithm, and hidden Markov models, among others. We will cover algorithmic techniques for optimization, such as gradient and coordinate descent methods, as the need arises.

Prerequisites:   Basic linear algebra and calculus, and introductory-level courses in probability and statistics. Comfort with a programming language (e.g., Matlab) will be essential for completing the homework assignments. Not open to students who have taken COMS 4771, STATS 4400 or IEOR 4525.

Text:   There is no required text for the course. Suggested readings for each class will be given from the textbooks below. These readings are meant to be general pointers and may contain more material than we cover in class.

    T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning (ESL), Second Edition, Springer. [link]
    C. Bishop, Pattern Recognition and Machine Learning (PRML), Springer. [link]
    H. Daume, A Course in Machine Learning (CML), Draft. [link]

Grading:   4 homework assignments (50%), midterm exam (25%), final in-class exam (25%). Each homework assignment will have a programming component that will count significantly toward the final homework grade. The final in-class exam will focus on material from the second half of the course (after Spring Break).
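As a rough illustration of how the weights above combine, the sketch below computes a weighted course score. The function name, the 0-100 score scale, and the assumption that the four homework assignments count equally are all hypothetical; the syllabus does not specify them, nor how the resulting score maps to a letter grade.

```python
# Component weights taken from the grading policy above.
WEIGHTS = {"homework": 0.50, "midterm": 0.25, "final": 0.25}


def course_score(homework_scores, midterm, final):
    """Return a weighted course score on a 0-100 scale.

    Assumption (not stated in the syllabus): the four homework
    assignments contribute equally to the homework component.
    """
    if len(homework_scores) != 4:
        raise ValueError("expected 4 homework scores")
    homework_avg = sum(homework_scores) / len(homework_scores)
    return (WEIGHTS["homework"] * homework_avg
            + WEIGHTS["midterm"] * midterm
            + WEIGHTS["final"] * final)


# Example with made-up scores: homework average 85, midterm 80, final 88.
print(course_score([90, 80, 100, 70], midterm=80, final=88))
```

With these made-up inputs the homework component contributes half of the total and each exam contributes a quarter.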


         Date       Topics covered                                              Suggested readings
Week 1   1/22/2019  Introduction, maximum likelihood estimation                 ESL Ch. 1-2; PRML Ch. 2.1-2.3
         1/24/2019  linear regression, least squares, geometric view            ESL Ch. 3.1-3.2; PRML Ch. 1.1, 3.1
Week 2   1/29/2019  ridge regression, probabilistic views of linear regression  ESL Ch. 3.3-3.4; PRML Ch. 3.1-3.2
         1/31/2019  bias-variance, Bayes rule, maximum a posteriori             ESL Ch. 7.1-7.3, 7.10; PRML Ch. 2.3
Week 3   2/5/2019   Bayesian linear regression                                  PRML Ch. 3.3-3.5
         2/7/2019   sparsity, subset selection for linear regression            ESL Ch. 3.3-3.8
Week 4   2/12/2019  nearest neighbor classification, Bayes classifiers          ESL Ch. 13.3-13.5; CML Ch. 2, 7
         2/14/2019  linear classifiers, perceptron                              ESL Ch. 4.5; CML Ch. 3
Week 5   2/19/2019  logistic regression, Laplace approximation                  ESL Ch. 4.4; PRML Ch. 4.3-4.5
         2/21/2019  kernel methods, Gaussian processes                          ESL Ch. 6; PRML Ch. 6; CML Ch. 9
Week 6   2/26/2019  maximum margin, support vector machines                     ESL Ch. 12.1-12.3; PRML Ch. 7.1
         2/28/2019  trees, random forests                                       ESL Ch. 9.2, 15; CML Ch. 1
Week 7   3/5/2019   boosting                                                    ESL Ch. 10; CML Ch. 11
         3/7/2019   neural networks                                             ESL Ch. 11; PRML Ch. 5
Week 8   3/12/2019  Midterm exam (location: IAB 417)
         3/14/2019  no class
Week 9              Spring Break
Week 10  3/26/2019  no class
         3/28/2019  clustering, k-means                                         ESL Ch. 14.3; PRML Ch. 9.1; CML Ch. 13
Week 11  4/2/2019   EM algorithm, missing data                                  ESL Ch. 8.5; PRML Ch. 9.3-9.4
         4/4/2019   mixtures of Gaussians                                       PRML Ch. 9.2; CML Ch. 14
Week 12  4/9/2019   matrix factorization                                        Review article
         4/11/2019  non-negative matrix factorization                           ESL Ch. 14.6; Review article
Week 13  4/16/2019  latent factor models, PCA and variations                    ESL Ch. 14.5; PRML Ch. 12.1-12.3
         4/18/2019  Markov models                                               PRML Ch. 13.1
Week 14  4/23/2019  hidden Markov models                                        PRML Ch. 13.2
         4/25/2019  continuous state-space models                               PRML Ch. 13.3
Week 15  4/30/2019  association analysis                                        ESL Ch. 14.2; Book chapter
         5/2/2019   Final in-class exam (location: IAB 417)