Instructor: Mingrui Liu
Time/Location
Office Hours:
Contact Information:
Office: Research Hall 355
This course provides a rigorous introduction to optimization methods for machine learning, with an emphasis on large-scale training of modern neural networks. The course begins with fundamental techniques in convex optimization, including gradient methods, acceleration, and proximal algorithms, and then moves to stochastic and nonconvex optimization, which form the foundation of contemporary machine learning practice.
Building on these foundations, the course examines recent advances in understanding optimization in overparameterized models, including training dynamics under large stepsizes, implicit regularization, and generalization behavior. We also study minimax optimization and its applications to generative models and robust learning.
The latter part of the course focuses on optimization challenges arising in modern deep learning systems, including transformer training, scaling behavior, distributed optimization, and communication-efficient learning. We also introduce theoretical frameworks such as tensor programs that help explain scaling laws in neural networks. Throughout the course, we aim to connect theoretical insights with practical phenomena observed in large-scale machine learning.
Recommended Textbooks:
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
Aug 27 (week 1): Introduction, Course Overview, Machine Learning Basics
Sep 3 (week 2): Linear Classifier, Multi-class Classification
Sep 10 (week 3): Neural Networks, Backpropagation, Automatic Differentiation
Sep 17 (week 4): Convolutional Neural Networks
Sep 24 (week 5): Neural Network Training, Optimization, Generalization
Oct 1 (week 6): Generative Adversarial Networks
Oct 8 (week 7): Adversarial Examples: Attack and Defense
Oct 15 (week 8): Recurrent Neural Networks
Oct 22 (week 9): Transformers
Oct 29 (week 10): Deep Reinforcement Learning
Nov 5: Election Day, No Class
Nov 12 (week 11): Convex Optimization, Convergence Rates, Nonconvex Optimization
Nov 19 (week 12): Distributed Deep Learning, Federated Learning
Nov 26 (week 13): Final Project Presentation
Dec 3 (week 14): Final Project Presentation
Grades:
A: 93 and above
A-: [90, 93)
B+: [87, 90)
B: [83, 87)
B-: [80, 83)
C+: [77, 80)
C: [73, 77)
C-: [68, 73)
D: [60, 68)
Fail: below 60
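The cutoffs above can be read as half-open bands [lower, upper). A minimal sketch of the lookup, assuming a score of exactly 93 counts as an A (the A- band [90, 93) stops just below 93); the names CUTOFFS, LETTERS, and letter_grade are illustrative only:

```python
import bisect

# Lower bound of each passing band, in ascending order.
CUTOFFS = [60, 68, 73, 77, 80, 83, 87, 90, 93]
# LETTERS[i] is the grade for scores between CUTOFFS[i-1] and CUTOFFS[i];
# LETTERS[0] covers everything below 60.
LETTERS = ["Fail", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]


def letter_grade(score: float) -> str:
    """Return the letter grade for a final score; each band is [lower, upper)."""
    # bisect_right counts how many cutoffs are <= score, which is exactly
    # the index of the band the score falls into.
    return LETTERS[bisect.bisect_right(CUTOFFS, score)]


print(letter_grade(92.9))  # A-
print(letter_grade(93))    # A
print(letter_grade(59.5))  # Fail
```

Because bisect_right places a score equal to a cutoff in the higher band, each lower bound is inclusive and each upper bound exclusive, matching the bracket notation in the list.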
Weights:
Homework: 50% (5 assignments in total)
Project proposal/presentation: 15%
Project Report: 30%
Class Participation: 5%
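The weights sum to 100%, so the final score is a weighted average of the component scores. A minimal sketch, assuming every component is graded on a 0-100 scale and that the homework component is the plain average of the five assignments (the syllabus does not say how assignments are combined); the function name and example scores are hypothetical:

```python
# Weights from the syllabus, as fractions of the final score.
WEIGHTS = {
    "homework": 0.50,               # 5 assignments in total
    "proposal_presentation": 0.15,
    "report": 0.30,
    "participation": 0.05,
}


def course_score(homework_scores: list, proposal_presentation: float,
                 report: float, participation: float) -> float:
    """Weighted course total on a 0-100 scale (hypothetical helper)."""
    # Assumption: homework is the unweighted mean of the assignment scores.
    homework = sum(homework_scores) / len(homework_scores)
    components = {
        "homework": homework,
        "proposal_presentation": proposal_presentation,
        "report": report,
        "participation": participation,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Hypothetical scores: homework averages 90.
total = course_score([90, 85, 100, 95, 80], 88, 92, 100)
print(round(total, 2))  # 90.8
```

Here 0.50 × 90 + 0.15 × 88 + 0.30 × 92 + 0.05 × 100 = 45 + 13.2 + 27.6 + 5 = 90.8.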
Late penalty:
Late submissions will be penalized regardless of the reason. For each day an assignment or project is late, 5 percentage points are deducted from its score, with a maximum tolerance of 3 days. Any partial day counts as a full day. For example, if an assignment is submitted 2 days and 1 minute after the deadline (counted as 3 days) and receives a grade of 90%, the score after the deduction will be 90% - 3 × 5% = 75%.
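The penalty arithmetic can be sketched as follows, reading the 5% as five percentage points per late day and rounding any partial day up, which is how the policy's example treats "2 days and 1 minute" as 3 days. Two points are assumptions not stated in the policy and are labeled in the code: how work beyond the 3-day tolerance is scored (this sketch returns None rather than guess), and flooring the result at 0. The names late_days and adjusted_score are hypothetical.

```python
import math
from datetime import timedelta
from typing import Optional

PENALTY_PER_DAY = 5.0  # percentage points deducted per late day
MAX_LATE_DAYS = 3      # maximum tolerance stated in the policy


def late_days(delay: timedelta) -> int:
    """Count late days, rounding any partial day up (2 days + 1 minute -> 3)."""
    if delay <= timedelta(0):
        return 0  # on time or early: no penalty
    return math.ceil(delay.total_seconds() / 86400)


def adjusted_score(raw_score: float, delay: timedelta) -> Optional[float]:
    """Apply the late penalty to a percentage score (0-100)."""
    days = late_days(delay)
    if days > MAX_LATE_DAYS:
        # Assumption: the policy does not say how work beyond the 3-day
        # tolerance is scored, so this sketch returns None instead of guessing.
        return None
    # Assumption: the adjusted score is floored at 0.
    return max(raw_score - PENALTY_PER_DAY * days, 0.0)


# The policy's example: a 90% grade submitted 2 days and 1 minute late.
print(adjusted_score(90, timedelta(days=2, minutes=1)))  # 75.0
```

Using a timedelta rather than a raw day count keeps the "partial day counts as a full day" rule in one place, inside late_days.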