Speaker: Alejandro Delgado Montero

Abstract: This presentation addresses the detection of audio deepfakes and the defense of speaker verification systems against voice spoofing attacks, evaluated within the ASVspoof 5 and ESDD2 international challenges. We examine the design and performance of several detection architectures, highlighting the effectiveness of Residual Networks (ResNets) enhanced with spectral data augmentation, as well as Self-Supervised Learning (SSL) models such as Wav2Vec 2.0 combined with attention mechanisms, logistic regression meta-classifiers, and model ensembles. These approaches outperformed the baseline systems in both competitions. The results also expose the severe performance degradation caused by domain shift between training and evaluation data, underscoring the ongoing need for robust domain-invariance techniques in audio security.