Speaker: Javier Tejedor Noguerales. Abstract: Spoken term detection is the task of detecting terms (sequences of words) within audio archives. This task is suitable for accessing the information stored in audio repositories. This talk will present a spoken term detection…
Speech Recognition Inequities: Analyzing Dialect, Gender, and Skin Tone Biases in ASR Models
Speaker: Pilar Fernández Gallego. Abstract: Automatic Speech Recognition (ASR) systems have exhibited notable disparities in performance across demographic groups, raising important concerns about fairness in AI technologies. Recent investigations have synthesized findings on gender, dialect, and skin tone biases within…
Use of AI in Videogames
Speaker: Adrián Aranda Márquez. Abstract: This presentation offers an overview of the use of Artificial Intelligence in the video game industry. Unlike AI in other fields, which seeks to optimize results, the main objective of AI in video games is the…
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
Speaker: Sara Barahona Quirós. Abstract: Understanding and reasoning over non-speech sounds and music are crucial for both humans and AI agents to interact effectively with their environments. In this paper, we introduce Audio Flamingo 2 (AF2), an Audio-Language Model (ALM)…
Feature Integration and Model Fusion Strategies for Neural Speaker Diarization in Conversational Telephone Speech
Speaker: Juan Ignacio Álvarez Trejos. Abstract: This talk presents a comprehensive investigation into optimizing end-to-end neural diarization systems for conversational telephone speech through advanced feature integration and model fusion techniques. We introduce a systematic methodological framework for analyzing and integrating…
Language-Based Audio Retrieval (DCASE Evaluations)
Speaker: Doroteo Torre Toledanos. Abstract: Language-based audio retrieval is the task of retrieving audio segments that contain sounds described in natural-language text. This task was first proposed in a DCASE Challenge in 2022 as a subtask of the audio…
Development of a Guardrail System for Bank Movement Assistant
Speaker: Miguel Ángel Martínez Pay. Abstract: This seminar outlines the process of creating a guardrail for a banking transactions assistant. The guardrail acts as a security system that filters user queries, determining which can be processed by the assistant and…
Neural Discrete Representation Learning Revisited: Applications of VQ-VAE
Speaker: Manuel Fernando Mollón Laorca. Abstract: Since the publication of Neural Discrete Representation Learning in 2018, Vector Quantized Variational Autoencoders (VQ-VAEs) have gained significant attention for their ability to bridge continuous and discrete representations. In particular, their integration with transformer…
You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection
Speaker: Rosa María Hornero Romera. Abstract: Presentation of the paper (https://arxiv.org/abs/2109.00962). Audio segmentation and sound event detection are essential aspects of machine listening, focusing on identifying acoustic classes and their boundaries. These tasks play a key role in applications such as…
The Expected Cost: One Performance Metric to Rule Them All
Speaker: Daniel Ramos Castro. Abstract: Based on https://openreview.net/forum?id=3mN9QNWArl. Abstract of original paper: “The expected cost (EC) is one of the main classification metrics introduced in statistical and machine learning books. It is based on the assumption that, for a given…