Speaker: Laura Herrera. Abstract: In this Final Degree Project, different speech representations, extracted by unsupervised learning, have been used to train a speaker recognition system. In particular, Wav2Vec2.0 and WavLM features have been used as a novelty. The Wav2Vec2.0 features…
End-to-end deep learning models for air traffic control speech recognition
Speaker: Ana Belén Fernández Cordero. Abstract: For many years, Air Traffic Controllers have had to manually type the information they received and transmitted to pilots into the electronic flight strip systems. This time-consuming activity contributed to a significant increase…
Efficient Transformers for End-to-End Neural Speaker Diarization
Speaker: Sergio Izquierdo. Abstract: The recently proposed End-to-End Neural Speaker Diarization framework (EEND) handles speech overlap and speech activity detection natively. While extensions of this work have reported remarkable results in both two-speaker and multi-speaker diarization scenarios, these come at…
Sound Event Detection in a large-scale audio dataset with multi-resolution neural networks
Speaker: Sara Barahona Quirós. Abstract: Sound event detection is the task of automating the human ability to recognize sound events in the environment from their particular acoustic information. For this purpose, deep learning techniques are employed to build…
A Speaker Verification Backend with Robust Performance across Conditions
Speaker: Joaquin Gonzalez-Rodriguez. Abstract: Presentation of the paper in https://arxiv.org/abs/2102.01760: L. Ferrer et al. “A Speaker Verification Backend with Robust Performance across Conditions”, 2021. Abstract of the paper (reproduced from the preprint): In this paper, we address the problem of…
Linear-Gaussian Bayesian Network Applications to Forensic Chemistry
Speaker: Elías Hernandis Prieto. Abstract: Forensic evidence evaluation using the likelihood ratio framework requires knowledge about the probability distribution of the data. For evaluating samples of glass remains, this translates to obtaining the joint probability distribution of the relative concentrations…
Improvements in deep learning semi-supervised model selection for the optimization of different Sound Event Detection metrics
Speaker: Cristina Moratilla. Abstract: Sound Event Detection has been one of the most developed fields in the area of audio signal processing in recent decades. The objective of such detection is to locate the start and end instants of audio…
Bias analysis in speaker recognition systems based in DNN-embeddings
Speaker: Almudena Aguilera. Abstract: In this study we will evaluate the discriminatory behaviours that arise in speaker recognition systems, specifically those that verify whether two audio recordings belong to the same speaker or not. These systems work by extracting the…
MetaAudio: A Few-Shot Audio Classification Benchmark
Speaker: David Martín Gutiérrez. Abstract: Currently available benchmarks for few-shot learning (machine learning with few training examples) are limited in the domains they cover, primarily focusing on image classification. This work aims to alleviate this reliance on image-based benchmarks by…
Speaker Diarization, X-vectors with Encoder-Decoder based attractors
Speaker: Juan Ignacio Álvarez Trejos. Abstract: X-Vectors are speaker embeddings that emerged to address the speaker recognition task, surprisingly outperforming i-vectors in most speaker tasks. It is proposed to take advantage of the information contained in these embeddings by using…