Speaker: Alicia Lozano Diez
Abstract: In this talk, I will describe new approaches to the task of speaker diarization based on end-to-end neural networks, which offer several advantages over traditional systems based on clustering speaker embeddings. I…
Normalizing Flows for calibration of multiclass probabilistic classifiers
Speaker: Sergio Márquez
Abstract: Today's Deep Neural Networks (DNNs) achieve high accuracy, far exceeding that of the networks used ten years ago. Nevertheless, the outputs of these modern networks are less well calibrated, which has become a major problem in…
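A classifier is well calibrated when its confidence matches its empirical accuracy: among predictions made with confidence around 0.8, roughly 80% should be correct. As a hedged illustration of how miscalibration is commonly measured (this is not the normalizing-flow method presented in the talk), the sketch below computes the Expected Calibration Error (ECE) with equal-width confidence bins. The function name `expected_calibration_error` and the default of 10 bins are our own assumptions.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    n_bins: int = 10,
) -> float:
    """ECE with equal-width bins over the top-class confidence (illustrative sketch)."""
    n = len(labels)
    if n == 0 or n != len(probs):
        raise ValueError("probs and labels must be non-empty and of equal length")
    # Per-bin accumulators: number of samples, summed confidence, summed correctness.
    counts = [0] * n_bins
    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=lambda k: p[k])  # predicted class (argmax)
        conf = p[pred]                                  # confidence of that prediction
        # Map confidence in [0, 1] to a bin index; conf == 1.0 falls in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sum[b] += conf
        acc_sum[b] += 1.0 if pred == y else 0.0
    # Weighted average of |accuracy - mean confidence| over non-empty bins.
    ece = 0.0
    for b in range(n_bins):
        if counts[b]:
            gap = abs(acc_sum[b] / counts[b] - conf_sum[b] / counts[b])
            ece += (counts[b] / n) * gap
    return ece


if __name__ == "__main__":
    # Overconfident toy predictions: high confidence, only 50% accuracy.
    probs = [[0.9, 0.1], [0.85, 0.15], [0.3, 0.7], [0.65, 0.35]]
    labels = [0, 1, 1, 1]
    print(round(expected_calibration_error(probs, labels, n_bins=5), 4))
```

The number of bins is a design choice: fewer bins give smoother but coarser estimates, so ECE values are only comparable when computed with the same binning.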
Transfer Learning from computer vision to audio event detection
Speaker: Sergio Segovia
Abstract: A brief summary of my lecture: as part of my doctorate, we are exploring the idea of applying transfer learning from the computer vision domain to the task of detecting acoustic events. The…
Modeling Uncertainty with Bayesian Neural Networks
Speaker: Sergio Álvarez
Abstract: Deep Neural Networks (DNNs) have revolutionized many pattern recognition fields, such as speech recognition and object detection. There are, however, some applications, mainly sensitive ones, in which neural networks struggle to offer competitive performance. These applications…
New loss function to improve calibration with mixup
Speaker: Juan Maroñas Molano
Abstract: Deep Neural Networks (DNNs) represent the state of the art in many tasks. However, because they are overparameterized, their generalization capabilities are questionable and remain an open area of study. Consequently, DNNs can overfit and…
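For context, standard mixup trains on convex combinations of pairs of examples and of their labels, with a mixing weight λ drawn from Beta(α, α). The sketch below shows only that standard recipe together with an ordinary soft-label cross-entropy; it does not implement the new loss function proposed in the talk. The names `mixup_batch` and `soft_cross_entropy` and the default α = 0.2 are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Tuple

Vector = List[float]


def mixup_batch(
    xs: List[Vector],
    ys: List[Vector],
    alpha: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Vector], List[Vector], float]:
    """Standard mixup on a batch of feature vectors and one-hot labels (sketch).

    alpha must be > 0, as required by random.betavariate.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight lambda ~ Beta(alpha, alpha)
    perm = list(range(len(xs)))
    rng.shuffle(perm)                    # partner example j = perm[i] for each i
    mixed_x = [[lam * a + (1 - lam) * b for a, b in zip(xs[i], xs[j])]
               for i, j in enumerate(perm)]
    mixed_y = [[lam * a + (1 - lam) * b for a, b in zip(ys[i], ys[j])]
               for i, j in enumerate(perm)]
    return mixed_x, mixed_y, lam


def soft_cross_entropy(pred_probs: Vector, target: Vector, eps: float = 1e-12) -> float:
    """Cross-entropy against a soft (mixed) target; eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(pred_probs, target))


if __name__ == "__main__":
    xs = [[0.0, 0.0], [1.0, 1.0]]
    ys = [[1.0, 0.0], [0.0, 1.0]]
    mx, my, lam = mixup_batch(xs, ys, alpha=0.4, rng=random.Random(0))
    print(lam, mx, my)
    print(soft_cross_entropy([0.5, 0.5], my[0]))
```

Because the mixed targets are soft, the loss rewards less extreme predictions between classes, which is one reason mixup is often discussed in connection with calibration.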
Self-supervised deep learning approaches for speaker recognition
Speaker: Joaquín González
Abstract: In this talk, I will review the thesis "Self-supervised deep learning approaches for speaker recognition", presented by Umair Khan at the UPC (Universidad Politécnica de Cataluña) in January 2021 under the supervision of Javier Hernando. In this thesis…
Data augmentation for improved robustness against packet losses in ASR
Speaker: María Pilar Fernández Gallego
Abstract: Nowadays, a large number of companies record conversations, calls, sales, and even meetings, in many cases to comply with current legislation. Beyond the legal requirement, these recordings constitute an invaluable source of…
End-to-end Query-by-example Spoken Term Detection
Speaker: Juan Ignacio Álvarez Trejos
Abstract: Query-by-example Spoken Term Detection (QbE-STD) is a key technology for harnessing the large amount of audiovisual content that is being generated and stored nowadays. Using audio example queries for STD has several advantages, such as…
AUDIAS-UAM System for the Albayzin 2020 Speech to Text Challenge
Speaker: Beltrán Labrador Serrano
Abstract: This presentation describes the system submitted by the AUDIAS-UAM team to the Albayzin 2020 Speech to Text Challenge. Our system is an end-to-end Transformer-based system built using the ESPnet toolkit. The acoustic model is…
Multi-resolution Sound Event Detection
Speaker: Diego de Benito Gorrón
Abstract: The Sound Event Detection task aims to determine the temporal locations of acoustic events in audio clips. In recent years, this field has gained relevance due to the introduction of datasets…