Module Number ML-4331	Module Title The Science of Machine Learning Benchmarks	Lecture Type(s) Lecture, Tutorial
ECTS	6
Work load - Contact time - Self study	Workload: 180 h; Class time: 60 h / 4 SWS; Self study: 120 h
Duration	1 Semester
Frequency	Irregular
Language of instruction	English
Type of Exam	Written exam
Content	Benchmarks have played a central role in the progress of machine learning research since the 1980s. Although researchers have relied on them extensively, we still know little about how and why benchmarks work. This class covers the emerging science of benchmarks. The first part lays the theoretical and empirical foundations that we build on throughout the class. The second part covers lessons about reliability and validity drawn from influential benchmarks, such as ImageNet. The final part turns to benchmarking and evaluation in the era of large language models. Students who would like to attend this course should meet the following requirements: comfort with undergraduate probability, statistics, and machine learning theory; proficiency with the Python machine learning ecosystem, including PyTorch, scikit-learn, and Hugging Face.
Objectives	Working from first principles, the class aims to better understand why and when benchmarks work, how they fail, and how best to evaluate machine learning models. At the end of the class, students have a good understanding of machine learning benchmarks and the surrounding evaluation ecosystem. They can follow best practices in the evaluation of machine learning models and are able to identify and avoid common pitfalls.
Allocation of credits / grading	Type of Class	Status	SWS	Credits	Type of Exam	Exam duration	Evaluation	Calculation of Module (%)
Prerequisite for participation	There are no specific prerequisites.
Lecturer / Other	Hardt, MPI
Literature	-
Last offered	unknown
Planned for	Winter semester 2024
Assigned Study Areas	INFO-INFO, MEDI-APPL, MEDI-INFO, ML-CS, ML-DIV
