Module Number

ML-4331
Module Title

The Science of Machine Learning Benchmarks
Lecture Type(s)

Lecture, Tutorial
ECTS 6
Work load
Total: 180 h
Contact time: 60 h / 4 SWS
Self study: 120 h
Duration 1 Semester
Frequency Irregular
Language of instruction English
Type of Exam

Written exam

Content

Benchmarks have played a central role in the progress of machine learning research since the 1980s. Although researchers have relied on them extensively, we still know little about how and why benchmarks work. This class covers the emerging science of benchmarks. The first part lays the theoretical and empirical foundations that we build on throughout the class. The second part covers lessons about reliability and validity drawn from influential benchmarks such as ImageNet. The final part turns to benchmarking and evaluation in the era of large language models.

Students who would like to attend this course should meet the following requirements:
Comfort with undergraduate probability, statistics, and machine learning theory; proficiency with the Python machine learning ecosystem, including PyTorch, scikit-learn, and Hugging Face.

Objectives

Working from first principles, the class aims to explain why and when benchmarks work, how they fail, and how best to evaluate machine learning models. By the end of the class, students understand machine learning benchmarks and the surrounding evaluation ecosystem, can follow best practices in evaluating machine learning models, and are able to identify and avoid common pitfalls.

Allocation of credits / grading
Type of Class | Status | SWS | Credits | Type of Exam | Exam duration | Evaluation | Calculation of Module (%)
Prerequisite for participation There are no specific prerequisites.
Lecturer / Other MPI
Literature

-

Last offered unknown
Planned for Winter semester 2024
Assigned Study Areas