Calibration of Multiclass Classifiers: A Wee Tutorial.

Speaker: Daniel Ramos Castro.

Abstract: Probabilistic predictions are vital for decision-making in many applications of machine learning and AI, including medicine, forensics, security, and safety. However, many multiclass classifiers produce poorly calibrated probability outputs, leading to suboptimal decisions with potentially high expected costs. This tutorial introduces the principles and practice of calibration. Starting from the basics of binary classification, we will provide an overview of the fundamentals of multiclass decision-making. We will also critically discuss key evaluation tools such as the Expected Calibration Error (ECE) and proper scoring rules, and review modern post-hoc calibration methods, including temperature scaling and direction-preserving transformations.