Speaker: María Pilar Fernández Gallego. Abstract: Spoken language understanding (SLU) is a structure prediction task in the field of speech. Recently, many works on SLU that treat it as a sequence-to-sequence task have achieved great success. However, this method is… Read More
A Whisper-based Query-by-Example Spoken Term Detection approach for search on speech
Speaker: Javier Tejedor Noguerales. Abstract: Nowadays, in the digital era, the amount of information stored in audio repositories is undoubtedly growing. This makes it necessary to develop efficient, automatic methods for searching audio content. To address this, search… Read More
NIST 2024 Speaker Recognition Evaluation
Speaker: Sara Barahona Quirós. Abstract: In this talk we will present our participation in the NIST 2024 SRE in collaboration with Brno University of Technology, Polito, Phonexia, Omilia and CRIM. This evaluation focuses on speaker detection over conversational telephone… Read More
Foundational Models for Self-Supervised Speaker Diarization and Target Speaker ASR
Speaker: Alicia Lozano-Diez. Abstract: In this talk, I will review a few of the latest trends in speaker diarization and target speaker ASR. I will present two papers that address these two tasks respectively, and leverage the power of foundational… Read More
What can LLMs bring to the field of acoustic event detection?
Speaker: Sergio Segovia González. Abstract: The answer to this question through these two articles, “WILDDESED: An LLM-POWERED dataset for wild domestic environment sound event detection system” and “Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection” has been carried… Read More
Exploring Large Protein Language Models in Constrained Data Regimes
Speaker: Manuel Fernando Mollon Laorca. Abstract: In this study, we expand upon the FLIP benchmark—designed for evaluating protein language models (pLMs) in small, specialized prediction tasks—by assessing the performance of state-of-the-art models, including ESM-2, SaProt, and Tranception, on the FLIP… Read More
Fusion-Based Speaker Diarization: Insights from IberSpeech2024
Speaker: Juan Ignacio Álvarez Trejos. Abstract: This talk presents the results of our participation in the speaker diarization challenge at IberSpeech2024. Our approach combines the strengths of three diarization models: a custom-trained DiaPer model, Pyannote, and VBx, through an innovative… Read More
Device-robust audio classification
Speaker: Wiliam Fernando López Gavilánez. Abstract: Audio classifiers designed for deployment across diverse devices often face unforeseen conditions during inference, attributable to device-specific characteristics. These challenges stem from variations in microphone transfer functions or on-chip digital signal pre-processing, which result… Read More
Analyzing DiaPer EEND Speaker Diarization Models on the RTVE2022 Dataset
Speaker: Juan Ignacio Álvarez Trejos. Abstract: The task of speaker diarization has recently been tackled successfully with end-to-end neural diarization (EEND) models instead of modular cascaded ones. Among them, the recent EEND with Perceiver-based attractors (DiaPer) comes with a light… Read More
Analysis of Speaker Label Matching for Diarization of Long Audios on RTVE2022 Dataset
Speaker: Laura Herrera Alarcón. Abstract: This study introduces an algorithm that matches the speaker labels predicted on short audio segments and merges them into a final prediction. This involves extracting an x-vector for each speaker in each segment and applying constrained agglomerative clustering to… Read More