(Fall 2026) CS690: Large-Scale Optimization for Machine Learning

Instructor: Dr. Mingrui Liu

Description

Time/Location

Monday 4:30 pm – 7:10 pm, Innovation Hall, Room 129

Office Hours:

Monday 2 pm – 3 pm or by appointment

Contact Information:

Office: ENGR 5322
Email: (firstname)l (at) gmu (dot) edu

Graduate Teaching Assistant (GTA):

GTA: TBD
Office hour: TBD

Recommended Textbooks:

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
Jean‑Baptiste Hiriart‑Urruty and Claude Lemaréchal. Fundamentals of Convex Analysis. Springer, 2021.
Leon Bottou, Frank Curtis and Jorge Nocedal. “Optimization Methods for Large‑Scale Machine Learning.” SIAM Review, 2018.

Course Overview

Optimization plays a central role in modern machine learning, serving as the engine that enables models to learn from data. This course studies optimization as the foundation of machine learning, beginning with the basics of convex and stochastic optimization and then moving to the training of large-scale neural networks and foundation models. We develop both classical theory and recent insights into training dynamics, scaling behavior, and optimization in overparameterized models.

The course is intended for students who want to pursue research or advanced practice in machine learning, particularly those interested in optimization theory, large-scale model training, and the design of efficient and reliable learning algorithms.

Prerequisites: Students should have a solid background in linear algebra, probability, and calculus. Prior exposure to machine learning and basic optimization (e.g., gradient descent) is expected. Familiarity with Python and PyTorch is recommended for assignments and projects.

Tentative Schedule

Foundations of Optimization
Aug 24 (week 1): Introduction, Course Overview: Optimization for Modern Machine Learning
Aug 31 (week 2): Basic Concepts in Machine Learning and Optimization
Sep 7: Labor Day, No Class
Sep 14 (week 3): Gradient Descent, Nesterov’s Accelerated Gradient Descent
Sep 21 (week 4): Lower Bounds and Complexity of First-Order Methods
Sep 28 (week 5): Proximal Methods and Composite Optimization
Stochastic and First-Order Methods
Oct 5 (week 6): Mirror Descent, Stochastic Gradient Descent (SGD)
Oct 12: Fall Break, No Class
Oct 19 (week 7): Beyond SGD: Adaptive Gradient Methods, Variance Reduction
Nonconvex and Minimax Optimization
Oct 26 (week 8): Large-Scale Training Dynamics: Instability and Large Stepsizes
Nov 2 (week 9): Nonconvex Optimization: Stationary Points and Escaping Saddle Points
Nov 9 (week 10): Minimax Optimization and Applications: Generative Adversarial Networks and Robust Learning
Optimization for Large-Scale Neural Networks
Nov 16 (week 11): Deep Network Optimization: Transformers and Non-Euclidean Geometry-Aware Methods
Nov 23 (week 12): Tensor Programs and Scaling Laws for Neural Networks
Nov 30 (week 13): Final Project Presentation
Dec 7 (week 14): Final Project Presentation

Grading

Project proposal/presentation: 10%
Project report: 20%
Homework: 50% (5 assignments in total)
Scribe notes: 20% (each student signs up to scribe 1–2 lectures)

Honor Code

Please see the Office for Academic Integrity (https://oai.gmu.edu/) for a full description of the code and the honor committee process, and the Honor Code Policies of the Department of Computer Science (https://cs.gmu.edu/resources/honor-code/) regarding the course project. GMU is an Honor Code university. The principle of academic integrity is taken seriously and violations are treated gravely. If you rely on someone else's work in an aspect of the course project, you should give full credit in the proper, accepted form. Another aspect of academic integrity is the free play of ideas. Vigorous discussion and debate are encouraged in this course, with the firm expectation that all aspects of the class will be conducted with civility and respect for differing ideas, perspectives, and traditions. When in doubt (of any kind) please ask for guidance and clarification.