DiarizationLM: speaker diarization post-processing with large language models

Speaker: Laura Herrera Alarcón.

Abstract: This paper presents a framework for post-processing the outputs of speaker diarization systems with large language models (LLMs). The framework aims to improve the readability of diarized transcripts and to reduce the word diarization error rate (WDER). To this end, the diarization output is represented in a compact textual format, which is then given as input to an optionally fine-tuned LLM. The refined output of the LLM serves as the improved diarization result. Experiments on the Fisher and Callhome datasets demonstrate that a fine-tuned PaLM 2-S model can drastically reduce the WDER of typical diarization systems, such as Turn-to-diarize.
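
To make the idea of a compact textual format concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the `<spk:N>` tag syntax, the function names, and the simplified error metric are all hypothetical. The metric assumes the reference and hypothesis contain exactly the same words and use the same speaker numbering, which the full word diarization error rate does not require.

```python
from typing import List, Tuple


def to_compact_text(words: List[str], speakers: List[int]) -> str:
    """Serialize word-level speaker labels into one string.

    A tag is emitted only when the speaker changes. The "<spk:N>" syntax
    is an assumption made for this sketch.
    """
    if len(words) != len(speakers):
        raise ValueError("words and speakers must have the same length")
    parts: List[str] = []
    previous = None
    for word, speaker in zip(words, speakers):
        if speaker != previous:
            parts.append(f"<spk:{speaker}>")
            previous = speaker
        parts.append(word)
    return " ".join(parts)


def from_compact_text(text: str) -> Tuple[List[str], List[int]]:
    """Parse the compact string back into words and per-word speaker labels."""
    words: List[str] = []
    speakers: List[int] = []
    current = None
    for token in text.split():
        if token.startswith("<spk:") and token.endswith(">"):
            current = int(token[len("<spk:"):-1])
        else:
            if current is None:
                raise ValueError("text must start with a speaker tag")
            words.append(token)
            speakers.append(current)
    return words, speakers


def word_speaker_error_rate(reference: List[int], hypothesis: List[int]) -> float:
    """Fraction of words whose speaker label differs from the reference.

    Simplification (assumption): both label lists refer to the same word
    sequence and share one speaker numbering.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have the same length")
    if not reference:
        return 0.0
    wrong = sum(1 for r, h in zip(reference, hypothesis) if r != h)
    return wrong / len(reference)
```

In a pipeline of this kind, the string returned by `to_compact_text` would be placed in the LLM prompt, and the LLM's corrected string would be parsed with `from_compact_text` to recover per-word speaker labels.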