Nummer ML-4311	Titel Nonconvex Optimization for Deep Learning	Lehrform(en) Vorlesung, Übung
ECTS	6
Arbeitsaufwand - Kontaktzeit - Selbststudium	Arbeitsaufwand: 180 h Kontaktzeit: 60 h / 4 SWS Selbststudium: 120 h
Veranstaltungsdauer	1 Semester
Häufigkeit des Angebots	Unregelmäßig
Unterrichtssprache	Englisch
Prüfungsform	Klausur (im Falle einer geringen Teilnehmerzahl: mündliche Prüfung)
Inhalt	Website: https://institute-tue.ellis.eu/en/lecture-deep-optimization Note: This lecture does not overlap with "Convex and Nonconvex Optimization." While students are encouraged to take "Convex and Nonconvex Optimization" to solidify their understanding of SGD and basic optimization concepts (duality, interior point methods, constraints), we will only discuss optimization in the context of training deep neural networks and often drift into discussions regarding model design and initialization. Successful training of deep learning models requires non-trivial optimization techniques. This course gives a formal introduction to the field of nonconvex optimization by discussing training of large deep models. We will start with a recap of essential optimization concepts and then proceed to convergence analysis of SGD in the general nonconvex smooth setting. Here, we will explain why a standard nonconvex optimization analysis cannot fully explain the training of neural networks. After discussing the properties of stationary points (e.g., saddle points and local minima), we will study the geometry of neural network landscapes; in particular, we will discuss the existence of "bad" local minima. Next, to gain some insight into the training dynamics of SGD in deep networks, we will explore specific and insightful nonconvex toy problems, such as deep chains and matrix factorization/decomposition/sensing. These are to be considered warm-ups (primitives) for deep learning problems. We will then examine training of standard deep neural networks and discuss the impact of initialization and (over)parametrization on optimization speed and generalization. We will also touch on the benefits of normalization and skip connections. Finally, we will analyze adaptive methods like Adam and discuss their theoretical guarantees and performance on language models. If time permits, we will touch on advanced topics such as label noise, sharpness-aware minimization, neural tangent kernel (NTK), and maximal update parametrization (muP). Prerequisites: The course requires some deep learning familiarity and basic knowledge of gradient-based optimization. Students who have already attended "Convex and Nonconvex Optimization" or any machine learning lecture that discusses gradient descent will have no problem following the lecture. In general, the semester requires good mathematical skills, roughly at the level of the lecture "Mathematics for Machine Learning." In particular, multivariate calculus and linear algebra are needed.
Qualifikationsziele	The objective is to provide the student with an understanding of modern neural network training pipelines. After the lecture, they will have known both the theoretical foundations of non-convex optimization and the main ideas behind the successful training of deep learning models.
Vergabe von Leistungspunkten/Benotung	Lehrform Status SWS LP Prüfungsform Prüfungsdauer Benotung Berechnung Modulnote (%)
Teilnahmevoraussetzungen	Es gibt keine besonderen Voraussetzungen.
Dozent/in	Orvieto
Literatur / Sonstiges	Here are a few crucial papers discussed in the lecture (math will be greatly simplified): https://arxiv.org/abs/1605.07110, https://arxiv.org/pdf/1802.06509, https://arxiv.org/abs/1812.0795, https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf,https://arxiv.org/abs/1502.01852, https://arxiv.org/pdf/2402.16788v1
Zuletzt angeboten	nicht bekannt
Geplant für	Wintersemester 2025
Zugeordnete Studienbereiche	INFO-INFO, INFO-THEO, MEDI-APPL, MEDI-INFO, ML-CS, ML-DIV

Nummer

ML-4311

Titel

Nonconvex Optimization for Deep Learning

Lehrform(en)

Vorlesung, Übung

ECTS

Arbeitsaufwand
- Kontaktzeit
- Selbststudium

Arbeitsaufwand:
180 h

Kontaktzeit:
60 h / 4 SWS

Selbststudium:
120 h

Veranstaltungsdauer

1 Semester

Häufigkeit des Angebots

Unregelmäßig

Unterrichtssprache

Englisch

Prüfungsform

Klausur (im Falle einer geringen Teilnehmerzahl: mündliche Prüfung)

Inhalt

Website: https://institute-tue.ellis.eu/en/lecture-deep-optimization

Note: This lecture does not overlap with "Convex and Nonconvex Optimization." While students are encouraged to take "Convex and Nonconvex Optimization" to solidify their understanding of SGD and basic optimization concepts (duality, interior point methods, constraints), we will only discuss optimization in the context of training deep neural networks and often drift into discussions regarding model design and initialization.

Successful training of deep learning models requires non-trivial optimization techniques. This course gives a formal introduction to the field of nonconvex optimization by discussing training of large deep models. We will start with a recap of essential optimization concepts and then proceed to convergence analysis of SGD in the general nonconvex smooth setting. Here, we will explain why a standard nonconvex optimization analysis cannot fully explain the training of neural networks. After discussing the properties of stationary points (e.g., saddle points and local minima), we will study the geometry of neural network landscapes; in particular, we will discuss the existence of "bad" local minima.

Next, to gain some insight into the training dynamics of SGD in deep networks, we will explore specific and insightful nonconvex toy problems, such as deep chains and matrix factorization/decomposition/sensing. These are to be considered warm-ups (primitives) for deep learning problems. We will then examine training of standard deep neural networks and discuss the impact of initialization and (over)parametrization on optimization speed and generalization. We will also touch on the benefits of normalization and skip connections.

Finally, we will analyze adaptive methods like Adam and discuss their theoretical guarantees and performance on language models. If time permits, we will touch on advanced topics such as label noise, sharpness-aware minimization, neural tangent kernel (NTK), and maximal update parametrization (muP).

Prerequisites:
The course requires some deep learning familiarity and basic knowledge of gradient-based optimization. Students who have already attended "Convex and Nonconvex Optimization" or any machine learning lecture that discusses gradient descent will have no problem following the lecture. In general, the semester requires good mathematical skills, roughly at the level of the lecture "Mathematics for Machine Learning." In particular, multivariate calculus and linear algebra are needed.

Qualifikationsziele

The objective is to provide the student with an understanding of modern neural network training pipelines. After the lecture, they will have known both the theoretical foundations of non-convex optimization and the main ideas behind the successful training of deep learning models.

Vergabe von Leistungspunkten/Benotung

Teilnahmevoraussetzungen

Es gibt keine besonderen Voraussetzungen.

Dozent/in

Orvieto

Literatur / Sonstiges

Here are a few crucial papers discussed in the lecture (math will be greatly simplified):
https://arxiv.org/abs/1605.07110, https://arxiv.org/pdf/1802.06509,
https://arxiv.org/abs/1812.0795,
https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf,https://arxiv.org/abs/1502.01852,
https://arxiv.org/pdf/2402.16788v1

Zuletzt angeboten

nicht bekannt

Geplant für

Wintersemester 2025

Zugeordnete Studienbereiche

INFO-INFO, INFO-THEO, MEDI-APPL, MEDI-INFO, ML-CS, ML-DIV