Speaker: Sara Barahona Quirós.
Abstract: Speaker diarization of medical conversations presents challenges including spontaneous speech, uneven turn-taking, and differences in speaking style between patients and doctors. Track 1 of the DISPLACE-M Challenge addresses this scenario with a dataset of Hindi–English clinical dialogues recorded in real-world conditions and containing natural code-switching. We present our submission, which ranked second among all participants, and compare three diarization systems under domain adaptation: DiaPer, Pyannote 3.1, and DiariZen. DiariZen, which combines WavLM features with a Conformer network, achieves the strongest performance. We further analyze speaker embedding extractors and semi-supervised adaptation on unlabeled domain-related data, using pseudo-label training and mean teacher, and find that their effectiveness depends on the model. System fusion via probabilistic calibration and DOVER-Lap yields consistent gains: our best single system achieves a diarization error rate (DER) of 8.57% on the development test set, and fusion reduces this to 8.48%.
