Statistical Learning
Course material for BST 263
Instructor: Jeff Miller
Spring 2019
Harvard T.H. Chan School of Public Health
Department of Biostatistics
Synopsis
Statistical learning is a collection of flexible tools and techniques for using data to construct prediction algorithms and perform exploratory analysis. This course will introduce students to the theory and application of methods for supervised learning (classification and regression) and unsupervised learning (dimension reduction and clustering). Students will learn the mathematical foundations underlying the methods, as well as how and when to apply each method. Topics will include the bias-variance tradeoff, cross-validation, linear regression, logistic regression, K-nearest neighbors (KNN), linear and quadratic discriminant analysis (LDA/QDA), variable selection, penalized regression, generalized additive models, classification and regression trees (CART), random forests, gradient boosting, kernels, support vector machines (SVMs), principal components analysis (PCA), and K-means clustering. Homework will involve mathematical and programming exercises, and exams will contain conceptual and mathematical problems. Programming in R will be used throughout the course to provide hands-on training and practical examples.
General information
Lecture notes
- 1. Introduction (Course overview, Choosing among methods)
- 2. Probability and linear algebra basics
- 3. Measuring performance (K-nearest neighbors, MSE, Bias-variance, Classification error rate, Bayes optimal)
  - knn.r (R code for KNN regression, MSE, Bias-variance tradeoff)
  - knn-classifier.r (R code for KNN classifier, Error rate, Bayes optimal classifier)
- 4. Lab on KNN and measuring performance
- 5. Linear regression (Probabilistic model, Basis functions, Estimation, Uncertainty quantification)
- 6. Lab on Linear regression
- 7. Classification (Loss functions, Confusion matrix, ROC curve, Logistic regression, LDA/QDA)
- 9. Cross-validation (k-fold CV, Choosing model settings with CV, Choosing the number of folds)
  - cv.r (R code for cross-validation topics)
- 11. Penalized regression (Subset selection, Model selection, Ridge, Lasso, Elastic net)
- 12. Lab on Penalized regression
- 13. Principal components analysis (Intuition, Covariance method, SVD method, Principal components regression)
- 14. Lab on PCA
- (In progress)
Homework assignments
Exams