Speaker: William Fernando López Gavilánez

Abstract:

Advances in speech synthesis technology have enabled numerous beneficial applications. However, they also pose significant threats, especially in the realm of identity spoofing. This study explores the potential of complex spectrograms for real-time classification of synthetic speech. We use publicly available datasets, including the MLAAD dataset, to train a Convolutional Neural Network-based system designed for real-time synthetic speech classification. The network takes the complex spectrogram as input, represented as magnitude and phase. Our approach achieves competitive performance on the HABLA Spanish dataset with a network optimized for real-time applications.
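
As a rough illustration of what a magnitude-and-phase input might look like, the sketch below computes a short-time Fourier transform with NumPy and stacks magnitude and phase as two channels. It is not the speaker's pipeline: the function name, the 16 kHz sample rate, the 512-sample Hann window, and the 128-sample hop are all assumptions chosen for the example, since the abstract does not state these settings.

```python
import numpy as np

def complex_spectrogram_features(signal, frame_len=512, hop=128):
    """Return a (2, n_frames, n_bins) array holding STFT magnitude and phase.

    frame_len and hop are illustrative defaults, not values from the talk.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT per frame gives the complex spectrogram.
    spec = np.fft.rfft(frames, axis=1)
    magnitude = np.abs(spec)
    phase = np.angle(spec)  # radians, in [-pi, pi]
    # Channel 0 is magnitude, channel 1 is phase.
    return np.stack([magnitude, phase])

# Usage: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
features = complex_spectrogram_features(tone)
```

The two-channel array has the layout a convolutional network typically expects (channels first). Short windows and hops of this kind keep per-frame latency low, which is one plausible way to serve a real-time setting.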