Speaker: Juan Ignacio Álvarez Trejos
Abstract:
This talk investigates how to optimize end-to-end neural diarization systems for conversational telephone speech through feature integration and model fusion. We introduce a methodological framework for analyzing and integrating acoustic features beyond traditional Mel-filterbanks as input to the End-to-End Neural Diarization with Encoder-Decoder Attractors (EEND-EDA) model, with particular focus on ECAPA-TDNN embeddings and the Geneva Minimalistic Acoustic Parameter Set (GeMAPS). Our approach combines systematic feature analysis with adaptation strategies, including speaker-count restriction and regularization techniques. Beyond feature integration, the presentation explores model fusion and calibration strategies, including the transition from multilabel to powerset approaches for two-speaker scenarios.
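
As a rough illustration of the multilabel-to-powerset transition for two speakers: a multilabel output gives one independent activity decision per speaker per frame, while a powerset output assigns each frame to exactly one of four mutually exclusive classes (silence, speaker 1 only, speaker 2 only, overlap). The sketch below shows only the label mapping. The class ordering and the function names are assumptions made for illustration and are not taken from the talk.

```python
# Minimal sketch of the two-speaker multilabel <-> powerset label mapping.
# Assumption: this class ordering is hypothetical; any fixed ordering works.
POWERSET_CLASSES = [
    (0, 0),  # class 0: silence
    (1, 0),  # class 1: speaker 1 only
    (0, 1),  # class 2: speaker 2 only
    (1, 1),  # class 3: both speakers (overlap)
]


def multilabel_to_powerset(frames):
    """Map per-frame (spk1_active, spk2_active) pairs to single class ids."""
    return [POWERSET_CLASSES.index(tuple(frame)) for frame in frames]


def powerset_to_multilabel(class_ids):
    """Map per-frame class ids back to (spk1_active, spk2_active) pairs."""
    return [POWERSET_CLASSES[class_id] for class_id in class_ids]


# Four frames: silence, speaker 1, overlap, speaker 2.
frames = [(0, 0), (1, 0), (1, 1), (0, 1)]
class_ids = multilabel_to_powerset(frames)
restored = powerset_to_multilabel(class_ids)
```

Under this mapping, a model would be trained with a single four-way classification per frame instead of two independent binary decisions, so overlapped speech becomes an explicit class and no per-speaker detection threshold is needed at decision time.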