Seminars – AUDIAS-UAM

Adapting Speaker Diarization to Code-Switched Medical Conversations: AUDIAS-UAM at the DISPLACE-M Challenge (June 19, 2026)
Speaker: Sara Barahona Quirós. Abstract: Speaker diarization of medical conversations presents challenges including spontaneous speech, uneven turn-taking, and speaking style differences between patients and doctors. Track 1 of the DISPLACE-M Challenge addresses this scenario through a dataset of Hindi–English clinical… Read More
Responsible AI for forensic science with non-human biological findings in the Natural Traces Project (June 19, 2026)
Speaker: Manuel Fernando Mollón Laorca. Abstract: The growing adoption of AI in forensic science demands high performance, interpretability, robustness, and transparency. This research, part of the Horizon Europe Natural Traces Project (https://naturaltraces.com), advances responsible AI through two key forensic applications.… Read More
Seeing Sound: From Computer Vision to Sound Event Detection (June 12, 2026)
Speaker: Sergio Segovia González. Abstract: This talk presents the trajectory of my PhD from image- and video-based AI to its later transfer into audio and Sound Event Detection. The central idea is how visual perception methods can inspire audio event… Read More
Large-scale evaluation of P300 BCI systems on BigP3BCI (June 5, 2026)
Speaker: Álvaro Sáiz López. Abstract: P300-based brain-computer interfaces (BCIs) provide a non-muscular communication channel for patients with severe motor impairments. This work leverages BigP3BCI, a recently released dataset unifying 18 studies and ~200 subjects, to systematically compare feature extractors for… Read More
Evaluation of P300-Based Brain-Computer Interfaces in Amyotrophic Lateral Sclerosis (June 5, 2026)
Speaker: Julia Reina Boria. Abstract: In this talk, I will present the evaluation of P300-based brain-computer interfaces as an assistive communication technology for people with amyotrophic lateral sclerosis. The work analyzes EEG signals from a P300 dataset and compares different… Read More
Automatic metal subgenre recognition system (June 2, 2026)
Speaker: Alejandro André Vivas Freitas. Abstract: Music can evoke countless emotions regardless of culture or age, and although the classification of musical genres has existed for centuries, automatic classification is a relatively recent discipline (barely two and a half decades… Read More
Audio Event Processing applied to the detection of different frog species. (June 1, 2026)
Speaker: Nicolás Martín Ansorregui. Abstract: Passive Acoustic Monitoring (PAM) in tropical ecosystems faces significant challenges due to overlapping vocalizations and severe class imbalance, often leading to the ‘algorithmic invisibility’ of rare species. This talk presents a deep learning architecture that… Read More
Analysis of Deepfakes and Anti-spoofing in Speaker Verification (June 1, 2026)
Speaker: Alejandro Delgado Montero. Abstract: This presentation explores the critical challenge of detecting audio deepfakes and defending speaker verification systems against voice spoofing attacks, evaluated within the frameworks of the ASVspoof 5 and ESDD2 international challenges. We examine the design… Read More
Review of the paper “CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions” (May 29, 2026)
Speaker: Clara Adsuar Ávila. Abstract of the paper: We demonstrate that carefully adjusting the tokenizer of the Whisper speech recognition model significantly improves the precision of word-level timestamps when applying dynamic time warping to the decoder’s cross-attention scores. We finetune… Read More
TabImpute: The Quest to Improving Regression Results with Data Imputation (May 22, 2026)
Speaker: Santiago Rattenbach Paliza Bartolomé. Abstract: In this talk, an innovative approach to data imputation with TabImpute is presented, leveraging the regression capabilities of TabPFN. Specifically, the comparison between TabImpute and the previous approach to data imputation using TabPFN is… Read More
Predictive Modeling of Gene Regulation: From Basic Genetics to Multi-Omic Machine Learning (May 8, 2026)
Speaker: Alfonso Rincón Pérez. Abstract: Understanding the underlying genetic and epigenetic mechanisms that govern gene expression remains a fundamental challenge in molecular biology. This seminar presents a comprehensive data-driven approach to deciphering these regulatory layers, starting from the foundational genetics… Read More
From Clinical Prediction to Audio Analysis: Sleep Apnea Prediction and use of Audio-LLMs for speech diarization (April 24, 2026)
Speaker: Unax Murua Urizarbarrena. Abstract: This presentation covers two different projects using data and speech analysis. The first half focuses on building a machine learning model to predict sleep apnea within the INSPIRA project (Collaboration Network funded by the Comunidad… Read More
From Coding to Orchestration: Generative AI and the Changing Nature of Software Engineering Expertise (April 17, 2026)
Speaker: Pilar Fernández Gallego. Abstract: Debates about the future of generative AI increasingly focus on its capacity to reshape economic and social structures by making complex, effort-intensive tasks more accessible, automated, and scalable. As with previous technological revolutions, these shifts… Read More
DISPLACE-M 2026 Challenge and the AUDIAS-UAM Speaker Diarization Systems (April 10, 2026)
Speaker: Alicia Lozano Díez. Abstract: In this talk, I will present the main challenges faced when addressing the recent DISPLACE-M 2026 challenge. In particular, I will focus on Track 1, which addresses speaker diarization on multilingual, code-mixed conversations between… Read More
DNA Cryptography, Fish Welfare, and HIF Binding Prediction. Three Topics, No Obvious Connection (March 27, 2026)
Speaker: Elias Boudjella. Abstract: This presentation brings together intentionally distinct topics spanning computational biology and experimental systems. We first explore DNA-based cryptography through the use of a one-time pad, leveraging synthetic DNA as a medium for secure information storage and… Read More
Machine Learning for Hematological Data Modeling in Platelet-Rich Plasma (March 20, 2026)
Speaker: Pedro Martí Picó. Abstract: This presentation shares research results regarding the use of machine learning models for modeling Platelet-Rich Plasma data, conducted within the framework of the UAM-BioSmartData Chair.
Benchmarking Automatic Speech Recognition Tools for a Set of Languages (March 13, 2026)
Speaker: William Fernando López Gavilánez. Abstract: Automatic speech recognition has advanced rapidly in recent years, yet fine-grained evaluation for Iberian languages remains limited. In this work, we benchmark several open-weights speech-to-text models across multiple Iberian languages (Basque, Catalan, Galician, Portuguese,… Read More
Kolmogorov-Arnold Networks (KANs) (March 6, 2026)
Speaker: Doroteo Torre-Toledano. Abstract: The Kolmogorov-Arnold Representation Theorem, KART (1957-58), establishes that any multidimensional function can be expressed in terms of a finite set of binary additions and unidimensional functions. In practice this means that the only real multidimensional function… Read More
Distributed Acoustic Sensing and Artificial Intelligence for pipeline integrity threat detection (February 20, 2026)
Speaker: Javier Tejedor Noguerales. Abstract: This talk presents an advanced system for the continuous monitoring of potential threats in a long gas pipeline. For signal acquisition, phase-sensitive optical time domain reflectometry technology is employed. Then, pattern recognition strategies are incorporated,… Read More
TidyVoice Challenge: Cross-Lingual Speaker Verification (February 13, 2026)
Speaker: Sara Barahona Quirós. Abstract: Cross-lingual speaker verification remains a challenging problem due to variability in language, recording conditions, and speaker characteristics. In this talk, we present our participation in the new benchmark TidyVoice Challenge: Cross-Lingual Speaker Verification, associated with… Read More
An Introduction to Ethics, Risks and Safety in Artificial Intelligence (February 6, 2026)
Speaker: Joaquín González Rodríguez. Abstract: With the omnipresence of AI in the media, it has become increasingly difficult to separate relevant information from background noise, with predictions ranging from utopian promises to apocalyptic scenarios. Moreover, much of the existing scientific… Read More
Web Pentesting: How and Why Modern Applications Break (January 30, 2026)
Speaker: Pablo González Escribano. Abstract: This talk introduces web pentesting, how it is performed, and why it remains relevant in modern applications. It discusses common assumptions about web security, professional pentesting methodologies, and analyzes real-world vulnerabilities such as IDOR, SQL… Read More
Summary of the CIFE 2026 (Congreso Internacional de Fonética Experimental) (January 22, 2026)
Speaker: Manuel Otero González. Abstract: This presentation explained the development of the CIFE 2026 (Congreso Internacional de Fonética Experimental) and provided an overview of the various sessions attended. Some of these sessions explored the content presented and the lines of… Read More
A Perceptually-Optimised Hybrid Acoustic Simulation for Interactive Environments (December 11, 2025)
Speaker: Maya Tia Kanani. Abstract: Accurate simulation of sound propagation is an ongoing problem. Current approaches are computationally demanding, especially in virtual environments where it is often necessary to consider complex geometries, real-time rendering and dynamic scenes. Whilst existing hybrid… Read More
Transformer Embeddings Enable Accurate Forensic Classification of Soil DNA (November 27, 2025)
Speaker: Manuel Fernando Mollón Laorca. Abstract: In this study, we use transformer-based DNA embeddings from a pre-trained DNABERT-2 model to train a classifier that distinguishes soil evidence containing blood, urine, feces, cadaver material, or no biological material (control). We systematically… Read More
Automatic Speech Recognition and Speaker Diarization in Spanish: Dialectal Varieties and Rural Speech (November 20, 2025)
Speaker: Laura Herrera Alarcón. Abstract: Audio databases provide essential linguistic information, capturing verbal content as well as sociolinguistic features. The COSER corpus (Audible Corpus of Spoken Rural Spanish) is introduced. In addition, training and test splits have been created to… Read More
Anatomy of a Deepfake: Types of Attacks and Detection Evaluations (November 13, 2025)
Speaker: Manuel Otero González. Abstract: This talk will explore the world of deepfakes: what they are, how they work, and why they have become one of the most debated technologies in recent years. Starting with a brief personal introduction, we… Read More
Mic-e-mouse and the new AI-based Cyber Attacks (November 6, 2025)
Speaker: Juan Antonio Gordillo Gayo. Abstract: The vast amount of data generated by individuals today has fueled the rise of AI-driven technologies that make our lives more convenient and efficient. However, these same tools also create unprecedented opportunities for hackers… Read More
Calibration of Multiclass Classifiers: A Wee Tutorial. (October 30, 2025)
Speaker: Daniel Ramos Castro. Abstract: Probabilistic predictions are vital for decision-making in many applications of machine learning and AI, including medicine, forensics, security, and safety. However, many multiclass classifiers produce poorly calibrated outputs, leading to suboptimal decisions with potentially high… Read More
Titans: Learning to Memorize at Test Time (October 23, 2025)
Speaker: Adrián Aranda Márquez. Abstract: This presentation provides an in-depth analysis of the paper Titans: Learning to Memorize at Test Time, which proposes a novel neural architecture designed to enhance long-term contextual learning in sequence modeling. The authors introduce the Titans… Read More
Calibration and Fusion of End-to-End Neural Diarization Models: A Comprehensive Framework (October 16, 2025)
Speaker: Sergio Álvarez Balanya. Abstract: End-to-End Neural Diarization (EEND) systems produce frame-level probabilistic speaker activity estimates, yet the reliability of these confidence scores remains largely unexplored. Unlike hard-decision fusion approaches such as DOVER-Lap, working with continuous probability outputs enables more… Read More
YOLO-based Transfer Learning for Acoustic Event Detection using Visual Object Detection Techniques (October 9, 2025)
Speaker: Sergio Segovia González. Abstract: Traditional SED approaches are based on either specialized models or on these models in combination with general audio embedding extractors. In this article we propose to reframe SED as an object detection task in the… Read More
Auditory General Intelligence (JSALT-2025) (September 25, 2025)
Speaker: Laura Herrera Alarcón. Abstract: The emergence of Large Audio Language Models (LALMs) has expanded the ability of LLMs to understand and reason over audio. In response, new benchmarks have been introduced to measure these capabilities. Yet, most rely on… Read More
Fitting Protein Language Models (PLMs) for the prediction of protein functionality using zero-shot and few-shot techniques. (September 15, 2025)
Speaker: Juan Antonio Gordillo Gayo. Abstract: The unprecedented success of deep learning has driven rapid progress across many scientific domains, solving tasks long considered intractable with traditional methods. A remarkable example is AlphaFold, which made it possible to predict protein… Read More
Open science in the service of conservation: An accessible, user-friendly machine learning workflow for automated anuran monitoring in complex Neotropical soundscapes (September 10, 2025)
Speaker: Gabriel Bidart. Abstract: Amphibian populations worldwide are declining, particularly in biodiversity hotspots such as the Neotropics, posing urgent conservation challenges. Acoustic monitoring offers a non-invasive tool for tracking amphibian presence and activity, but large-scale audio datasets pose bottlenecks. We… Read More
Introduction to Protein Language Models: biological concepts and computational tools (July 9, 2025)
Speaker: Juan Antonio Gordillo Gayo. Abstract: Proteins are the main executors of life: they catalyze reactions, transmit signals, structure tissues, and regulate essential cellular processes. Their function is intimately determined by the sequence of amino acids that compose them, which… Read More
Optimization of a Deep Learning Model for DNA Analysis under Hypoxemic Conditions (July 2, 2025)
Speaker: Paloma Villanueva Fuster. Abstract: This study focuses on predicting hypoxia-inducible factor (HIF) binding sites, which are critical regulators of the cellular response to low oxygen levels and are implicated in various diseases, including cancer. Using DNABERT-2, a Transformer-based language… Read More
Detection and classification of plants and their condition based on ultrasound patterns generated under abiotic stress (June 25, 2025)
Speaker: Fernando David Modrego Arceo. Abstract: Plant bioacoustics is an emerging field that suggests that plants not only respond to sound stimuli but also emit detectable acoustic signals, particularly under stress conditions. Recent studies have revealed airborne ultrasonic emissions produced… Read More
Spoken term detection on COSER corpus (June 11, 2025)
Speaker: Javier Tejedor Noguerales. Abstract: Spoken term detection is the task of detecting terms (sequence of words) within audio archives. This task is suitable for accessing the information stored in audio repositories. This talk will present a spoken term detection… Read More
Speech Recognition Inequities: Analyzing Dialect, Gender, and Skin Tone Biases in ASR Models (May 28, 2025)
Speaker: Pilar Fernández Gallego. Abstract: Automatic Speech Recognition (ASR) systems have exhibited notable disparities in performance across demographic groups, raising important concerns about fairness in AI technologies. Recent investigations have synthesized findings on gender, dialect, and skin tone biases within… Read More
Use of AI in Videogames (May 21, 2025)
Speaker: Adrián Aranda Márquez. Abstract: This presentation offers an overview of the use of Artificial Intelligence in the video game industry. Unlike AI in other fields that seeks to optimize results, the main objective of AI in videogames is the… Read More
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities (May 14, 2025)
Speaker: Sara Barahona Quirós. Abstract: Understanding and reasoning over non-speech sounds and music are crucial for both humans and AI agents to interact effectively with their environments. In this paper, we introduce Audio Flamingo 2 (AF2), an Audio-Language Model (ALM)… Read More
Feature Integration and Model Fusion Strategies for Neural Speaker Diarization in Conversational Telephone Speech (May 7, 2025)
Speaker: Juan Ignacio Álvarez Trejos. Abstract: This talk presents a comprehensive investigation into optimizing end-to-end neural diarization systems for conversational telephone speech through advanced feature integration and model fusion techniques. We introduce a systematic methodological framework for analyzing and integrating… Read More
Language-Based Audio Retrieval (DCASE Evaluations) (April 30, 2025)
Speaker: Doroteo Torre Toledano. Abstract: Language-based audio retrieval is the task of retrieving audio segments containing sound described in a natural language text. This task was first proposed in a DCASE Challenge in 2022 as a subtask of the audio… Read More
Development of a Guardrail System for Bank Movement Assistant (April 22, 2025)
Speaker: Miguel Ángel Martínez Pay. Abstract: This seminar outlines the process of creating a guardrail for a banking transactions assistant. The guardrail acts as a security system that filters user queries, determining which can be processed by the assistant and… Read More
Neural Discrete Representation Learning Revisited: Applications of VQ-VAE (April 9, 2025)
Speaker: Manuel Fernando Mollón Laorca. Abstract: Since the publication of Neural Discrete Representation Learning in 2018, Vector Quantized Variational Autoencoders (VQ-VAEs) have gained significant attention for their ability to bridge continuous and discrete representations. In particular, their integration with transformer… Read More
You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection (March 26, 2025)
Speaker: Rosa María Hornero Romera. Abstract: Presentation of the paper https://arxiv.org/abs/2109.00962. Audio segmentation and sound event detection are essential aspects of machine listening, focusing on identifying acoustic classes and their boundaries. These tasks play a key role in applications such as… Read More
The Expected Cost: One Performance Metric to Rule Them All (March 12, 2025)
Speaker: Daniel Ramos Castro. Abstract: Based on https://openreview.net/forum?id=3mN9QNWArl. Abstract of original paper: “The expected cost (EC) is one of the main classification metrics introduced in statistical and machine learning books. It is based on the assumption that, for a given… Read More
Cybersecurity Today: Attackers and Defenders (March 5, 2025)
Speaker: Pablo González Escribano. Abstract: As cyber threats continue to evolve at a rapid pace, understanding the tactics, techniques, and procedures (TTPs) employed by attackers is crucial for enhancing defense strategies. In this session, we explored the current landscape of… Read More
Real-time Detection of Synthetic Speech (February 26, 2025)
Speaker: William Fernando López Gavilánez. Abstract: Advances in speech synthesis technology have facilitated numerous beneficial applications. However, they also pose significant threats, especially in the realm of identity spoofing. The study explores the potential of leveraging complex spectrograms for real-time… Read More
Past, present and ¿future? of Scaling Laws for Neural Language Models (February 19, 2025)
Speaker: Beltrán Labrador. Abstract: This presentation examines the scaling laws for neural networks that were foundational to the development of modern, large-scale language models. It revisits the 2020 OpenAI paper that established a key principle: model performance scales predictably with… Read More
Emotion Recognition Based On Speech Analysis For The EmoSPeech 2024 Challenge (February 12, 2025)
Speaker: Manuel Otero. Abstract: This master’s thesis addresses the analysis and recognition of emotions in speech, within the framework of the EmoSPeech 2024 challenge. Different approaches to the state of the art are presented, from traditional methods to current models… Read More
Deep Learning Insights Inspired by Reinforcement Learning Research (February 5, 2025)
Speaker: Tamas Endrei. Abstract: Despite deep reinforcement learning being around for more than 10 years, traditional deep learning best practices have largely avoided the field until now. This talk elaborates on deep learning techniques uncovered through RL-motivated research, touching on… Read More
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding (January 29, 2025)
Speaker: María Pilar Fernández Gallego. Abstract: Spoken language understanding (SLU) is a structure prediction task in the field of speech. Recently, many works on SLU that treat it as a sequence-to-sequence task have achieved great success. However, this method is… Read More
A Whisper-based Query-by-Example Spoken Term Detection approach for search on speech (January 22, 2025)
Speaker: Javier Tejedor Noguerales. Abstract: Nowadays, in the digital era, the amount of information stored in audio repositories is undoubtedly growing. This makes necessary the development of efficient and automatic methods to search on audio content. To address it, search… Read More
NIST 2024 Speaker Recognition Evaluation (January 15, 2025)
Speaker: Sara Barahona Quirós. Abstract: In this talk we will present our participation in the NIST 2024 SRE Evaluation in collaboration with Brno University of Technology, Polito, Phonexia, Omilia and CRIM. This evaluation focuses on speaker detection over conversational telephone… Read More
Foundational Models for Self-Supervised Speaker Diarization and Target Speaker ASR (December 18, 2024)
Speaker: Alicia Lozano-Diez. Abstract: In this talk, I will review a few of the latest trends in speaker diarization and target speaker ASR. I will present two papers that address these two tasks respectively, and leverage the power of foundational… Read More
What can LLMs bring to the field of acoustic event detection? (December 11, 2024)
Speaker: Sergio Segovia González. Abstract: The answer to this question through these two articles, “WILDDESED: An LLM-POWERED dataset for wild domestic environment sound event detection system” and “Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection” has been carried… Read More
Exploring Large Protein Language Models in Constrained Data Regimes (December 4, 2024)
Speaker: Manuel Fernando Mollón Laorca. Abstract: In this study, we expand upon the FLIP benchmark—designed for evaluating protein language models (pLMs) in small, specialized prediction tasks—by assessing the performance of state-of-the-art models, including ESM-2, SaProt, and Tranception, on the FLIP… Read More
Fusion-Based Speaker Diarization: Insights from IberSpeech2024 (November 27, 2024)
Speaker: Juan Ignacio Álvarez Trejos. Abstract: This talk presents the results of our participation in the speaker diarization challenge at IberSpeech2024. Our approach combines the strengths of three diarization models: a custom-trained Diaper model, Pyannote, and VBx, through an innovative… Read More
Device-robust audio classification (November 20, 2024)
Speaker: William Fernando López Gavilánez. Abstract: Audio classifiers designed for deployment across diverse devices often face unforeseen conditions during inference, attributable to device-specific characteristics. These challenges stem from variations in microphone transfer functions or on-chip digital signal pre-processing, which result… Read More
Analyzing DiaPer EEND Speaker Diarization Models on the RTVE2022 Dataset (November 6, 2024)
Speaker: Juan Ignacio Álvarez Trejos. Abstract: The task of speaker diarization has lately been successfully tackled with end-to-end neural diarization (EEND) models instead of modular cascaded ones. Among them, the very new EEND Perceiver-based attractors (DiaPer) comes with a light… Read More
Analysis of Speaker Label Matching for Diarization of Long Audios on RTVE2022 Dataset (November 6, 2024)
Speaker: Laura Herrera Alarcón. Abstract: This study introduces an algorithm to match predicted speaker labels from short audio segments into a final prediction. This involves extracting an x-vector for each speaker in each segment and applying constrained Agglomerative Clustering to… Read More
Towards Efficient Conformer-based Sound Event Detection (November 6, 2024)
Speaker: Sara Barahona Quirós. Abstract: The Conformer architecture has shown excellent performance in accurately classifying sound events but lacks temporal precision when predicting time boundaries. While increasing the length of the input sequences can mitigate this issue, it also increases… Read More
Automatic Speech Recognition in Dialectal Data (COSER) (October 30, 2024)
Speaker: Clara Adsuar Ávila. Abstract: In this project, we address the importance of enhancing the accessibility and usefulness of Deep Learning technologies for non-standard speakers. From a linguistic perspective, rural Spanish areas are rich in dialectal variety. However, most technology… Read More
Evaluating Posterior Probabilities: Decision Theory, Proper Scoring Rules, and Calibration (October 23, 2024)
Speaker: Daniel Ramos Castro. Abstract: Most machine learning classifiers are designed to output posterior probabilities for the classes given the input sample. These probabilities may be used to make the categorical decision on the class of the sample; provided as… Read More
One model to rule them all? Towards end-to-end joint speaker diarization and speech recognition (October 9, 2024)
Speaker: Laura Herrera Alarcón. Abstract: This paper presents a novel framework for joint speaker diarization (SD) and automatic speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented recognition). SLIDAR can process arbitrary length inputs and can handle any number of speakers, effectively… Read More
Emotion recognition in Spanish audio (October 2, 2024)
Speaker: Manuel Otero González. Abstract: This talk will explain the task of emotion recognition in Spanish audio, presenting the most advanced state-of-the-art approaches, such as Wav2Vec2 and W2V-Bert. In addition, the EmoSPeech challenge will be introduced, whose… Read More
State of the Art in Sound Event Detection and DCASE Evaluations (September 25, 2024)
Speaker: Doroteo Torre Toledano. Abstract: In this talk I will review the most recent trajectory of the AUDIAS group in the field of Sound Event Detection (SED), highlighting our participations in DCASE evaluations (Task 4) from 2020 to 2023. Then,… Read More
Large Language Models: From Theory to Practice in Text Classification (September 18, 2024)
Speaker: Miguel Ángel Martínez Pay. Abstract: This work presents a comprehensive overview of Large Language Models (LLMs), from their theoretical framework to practical applications in text classification. It compares the effectiveness of two key approaches: fine-tuning embeddings of smaller models… Read More
Integration of Emotional Information in Speaker Recognition Systems (September 11, 2024)
Speaker: Arturo Domínguez Santos. Abstract: This Master’s thesis addresses the challenge of investigating how emotions affect speaker verification and proposes a system that integrates this emotional variability to try to improve accuracy. The focus is on the speaker’s emotions, which has traditionally… Read More
Exploring Speech Foundation Models for End-to-End Speaker Diarization (September 4, 2024)
Speaker: Laura Herrera Alarcón. Abstract: In this Master’s Thesis the use of pre-trained models for the diarization task has been studied in order to exploit their ability to extract robust and discriminative features. In particular, the WavLM model has been combined with… Read More
Interpretation of fingerprint evidence with likelihood ratios (LRs – Likelihood ratios) (July 3, 2024)
Speaker: Joaquín González Rodríguez. Abstract: The forensic fingerprint identification process based on the ACE-V method, widely implemented, makes absolute identification or exclusion decisions that depend on opinions that vary from expert to expert (for example, whether we consider an observed… Read More
SGE & CCC Architecture – Introduction for Beginners (June 26, 2024)
Speaker: Adrián Aranda Márquez. Abstract: Simple technical introduction for using SGE with AUDIAS servers (Son of Grid Engine) and CCC (UAM’s Central Computing Center).
Stabilising Reinforcement Learning with Past Action-State Representation Learning (June 18, 2024)
Speaker: Tamas Endrei. Abstract: Although deep reinforcement learning (DRL) deals with sequential decision-making problems, temporal information representation is absent from state-of-the-art actor-critic algorithms. The reliance on only the current timestep information causes instability in concurrent actions. Furthermore, the over-reliance on… Read More
Study of the predictive capacity of the efficacy of platelet-rich plasma (PRP) treatments in joint injuries (June 11, 2024)
Speaker: Berta Caunedo Castro. Abstract: This Final Degree Project evaluates Platelet-Rich Plasma (PRP) therapy as an alternative to traditional treatments for knee osteoarthritis, a prevalent joint condition. PRP uses regenerative growth factors from the patient’s blood, but its variability complicates… Read More
Enhancing Sound Event Detection and Speaker Verification employing weak supervision (June 5, 2024)
Speaker: Sara Barahona Quirós. Abstract: In this seminar, we will explore approaches for training acoustic event detection and speaker verification systems employing limited labels. Specifically, for the first task, we will explain the optimization process of a system based on… Read More
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios (May 29, 2024)
Speaker: Juan Ignacio Álvarez Trejos. Abstract: This presentation covers the work presented at Odyssey 2024, focusing on speaker diarization in two-speaker scenarios. End-to-end neural speaker diarization systems are designed to handle overlapping speech while accurately distinguishing between speakers. In this… Read More
Transformers for Binding Prediction of Hypoxia-Induced Factors (May 22, 2024)
Speaker: Manuel Fernando Mollón Laorca. Abstract: Hypoxia-inducible factors (HIFs) are proteins that play a crucial role in the cellular response to low oxygen levels. Accurate prediction of the binding of these factors to their target DNA is essential for understanding… Read More
Whisper-based spoken term detection systems for search on speech ALBAYZIN evaluation challenge (May 8, 2024)
Speaker: Javier Tejedor Noguerales. Abstract: The vast amount of information stored in audio repositories makes necessary the development of efficient and automatic methods to search on audio content. In that direction, search on speech (SoS) has received much attention in… Read More
Road map for Albayzin Diarization Challenge 2024 (April 24, 2024)
Speaker: Jérémie Touati. Abstract: The diarization challenge of the 2024 Albayzin evaluation stands out for its various difficulties. The recordings, which come from databases of Spanish radio and television programs, can last up to several hours and contain an undetermined and… Read More
Introduction to the Language-Based Audio Retrieval task. (April 17, 2024)
Speaker: Manuel Otero. Abstract: Language-Based Audio Retrieval is a task of the DCASE Challenge, which is based on the retrieval of audio information from natural language descriptions. Two of the best performing approaches in the state of the art will… Read More
Data Augmentation for Respiratory Cycle ClassificationApril 10, 2024
Speaker: Miguel Ángel. Abstract: Analysing respiratory audios in order to detect and classify adventitious respiratory sounds is of vital importance for the development of continuous monitoring tools for patients with respiratory diseases. The ICBHI 2017 database is the most widely… Read More
Diarization Introduction & EEND Perceiver-based DiarizationMarch 20, 2024
Speaker: Alicia Lozano Díez. Abstract: In this talk, I will present an introduction to the speaker diarization task as well as the latest approaches based on neural networks, such as self-attention end-to-end neural diarization (EEND) with encoder-decoder attractors (EDA), as opposed… Read More
Introduction to Reinforcement Learning.March 13, 2024
Speaker: Tamas Endrei. Abstract: Reinforcement learning (RL) has emerged as one of the most fascinating fields of machine learning, providing solutions to challenging problems ranging from complex robotics behaviors to optimizing neural network architectures. Despite its immense potential, RL’s complex… Read More
GPU Parallel Computing for Deep LearningMarch 6, 2024
Speaker: Beltrán Labrador Serrano. Abstract: Large Language Models (LLMs) are transforming natural language processing and are now impacting speech processing. This talk addresses the challenge of training the massive neural networks required to follow this trend. I will present GPU… Read More
Rotary Position Embeddings (RoPE) in Transformers.February 28, 2024
Speaker: Doroteo Torre Toledano. Abstract: Since Transformers were proposed in 2017, they have dominated the state-of-the-art in several domains including language modelling, speech processing, and even image processing. Although the main ideas of the original Transformers are essentially kept, there… Read More
Large Language Models in Protein EngineeringFebruary 21, 2024
Speaker: Natalia Pinto Estéban. Abstract: The intersection of artificial intelligence and protein engineering represents an innovative frontier in scientific exploration. In this presentation, titled ‘Large Language Models in Protein Engineering,’ we delve into the field of advanced language models, focusing… Read More
Lute and vihuela in the Renaissance period: instruments and musicFebruary 14, 2024
Speaker: Joaquín González Rodríguez. Abstract: In this talk we will present an overview of two plucked musical instruments that were extremely popular in 16th-century Europe, the lute and its Spanish version, the vihuela. Sharing a common tuning and playing characteristics… Read More
DiarizationLM: speaker diarization post-processing with large language modelsFebruary 7, 2024
Speaker: Laura Herrera Alarcón. Abstract: This paper presents a framework designed to post-process the outputs of speaker diarization systems using large language models (LLM). The framework aims to enhance the readability of the diarized transcripts and reduce the WDER. For… Read More
Fairness in Modern ASR SystemsJanuary 22, 2024
Speaker: Pilar Fernández Gallego. Abstract: Nowadays ASR (Automatic Speech Recognition) systems have dramatically improved, due both to advances in deep learning and to the collection of large datasets used to train the systems. However, it has been demonstrated in studies… Read More
Explainable Machine LearningDecember 11, 2023
Speaker: Sara Barahona Quirós. Abstract: Explainable Machine Learning (XAI) refers to the development of machine learning models and algorithms that not only make accurate decisions but also provide understandable and interpretable explanations for those predictions. In traditional machine learning, particularly… Read More
Generative Artificial Intelligence: A Global OverviewDecember 4, 2023
Speaker: Diego de Benito Gorrón. Abstract: Generative Artificial Intelligence (GenAI) has made a strong impact on the technological landscape, redefining paradigms and possibilities. This talk offers a panoramic view of GenAI, with a specific focus on Large Language Models (LLMs)… Read More
Robust Wake-up Word by Two-stage Multi-resolution EnsemblesNovember 27, 2023
Speaker: William Fernando López Gavilánez. Abstract: Voice-based interfaces rely on a wake-up word mechanism to initiate communication with devices. However, achieving a robust, energy-efficient, and fast detection remains a challenge. This paper addresses these real production needs by enhancing data… Read More
Towards automatic inspection of nuclear fuel elements in spent fuel storage with AI tools.November 20, 2023
Speaker: Sergio Segovia González. Abstract: A new way to automate the inspection of nuclear fuel elements in spent fuel storage by processing video and audio signals. For the video signal, a custom database has been developed, including images from several nuclear facilities… Read More
FLIP (Fitness Landscape Inference for Proteins)November 6, 2023
Speaker: Natalia Pinto Estéban. Abstract: Machine learning is growing in significance across various research domains. One of these domains is biology, specifically focusing on protein engineering and directed evolution techniques. This presentation is grounded in the FLIP paper (Fitness Landscape… Read More
Knowledge Distillation to Compress and Accelerate Large ModelsOctober 30, 2023
Speaker: Laura Herrera Alarcón. Abstract: These papers present the idea of Knowledge Distillation, a method to compress and accelerate large models with high computational and storage cost. Thanks to this, these models can be used for real-time applications or in… Read More
An introduction to Spiking Neural Networks (SNNs) and neuromorphic computingOctober 23, 2023
Speaker: Doroteo Torre Toledano. Abstract: This talk is an overview of Spiking Neural Networks, a biologically inspired type of neural network that outputs digital spikes over continuous time in an asynchronous way, instead of continuous values at frame-by-frame synchronous times… Read More
A Systematic Study on the Use of the Log-Likelihood Ratio Cost in Forensic ScienceOctober 16, 2023
Speaker: Daniel Ramos Castro. Abstract: It is increasingly common in forensic science to report evidential findings in terms of a likelihood ratio (LR). Such analyses are often supported by (semi-)automated LR systems based on statistical methods, which allows for validation… Read More
Language Models in Protein EngineeringOctober 9, 2023
Speaker: Joaquín González Rodríguez. Abstract: The sequences of amino acids describing a protein can be efficiently handled by language models. In this talk, present and future applications of Transformer-based protein Language Models are surveyed, focusing on databases, benchmarks and models already… Read More
Automatic Wheeze Segmentation Using Harmonic-Percussive Source Separation and Empirical Mode DecompositionOctober 2, 2023
Speaker: Miguel Ángel Martínez Pay. Abstract: Based on https://ieeexplore.ieee.org/document/10051156. Wheezes, a respiratory anomaly in patients with respiratory conditions, are significant for clinical assessment, particularly in gauging bronchial obstruction. While conventional auscultation is the norm for wheeze analysis, recent years emphasize… Read More
Personalized keyword spotting detection: Research internship @ GoogleSeptember 25, 2023
Speaker: Beltrán Labrador Serrano. Abstract: Keyword spotting systems are used in a variety of applications, such as smart speakers and voice assistants. However, these systems can be challenged by diverse accents, age groups, and speaking conditions. In this talk, I will… Read More
Sound Event Detection with Conformer: the AUDIAS system for DCASE 2023September 18, 2023
Speaker: Sara Barahona Quirós. Abstract: The Conformer architecture has achieved state-of-the-art results in several tasks, including automatic speech recognition and automatic speaker verification. However, its utilization in sound event detection and in particular in the DCASE Challenge Task 4 has… Read More
Deployment of KWS models: audio features optimization and streaming modeJune 16, 2023
Speaker: William Fernando López Gavilánez. Abstract: The deployment process of Keyword Spotting (KWS) models depends on the target hardware, it normally includes merging components in a black box, binarization, quantization, and/or mobile optimization. In addition, while processing a continuous stream… Read More
Lines of research in the field of acoustic events detectionJune 9, 2023
Speaker: Sergio Segovia González. Abstract: As part of a doctoral thesis in the field of acoustic event detection, several lines of research have been pursued, such as using the… Read More
Fairness in the most popular ASR systemsJune 1, 2023
Speaker: Pilar Fernández Gallego Abstract: Nowadays ASR (Automatic Speech Recognition) systems have dramatically improved, due both to advances in deep learning and to the collection of large datasets used to train the systems. However, it has been demonstrated that some… Read More
VoxCeleb-Spain: design, acquisition and evaluation with deep neural networks of a database of Spanish celebrity voicesMay 26, 2023
Speaker: Manuel Otero González. Abstract: This work presents a new database, VoxCeleb-Spain, captured following protocols analogous to those of the well-known VoxCeleb database, but using YouTube(TM) videos of celebrities of Spanish nationality. The evaluation of the database through various experiments is also… Read More
GuitarSet: A Dataset for Guitar TranscriptionMay 19, 2023
Speaker: Diego de Benito Gorrón. Abstract: Based on https://guitarset.weebly.com/uploads/1/2/1/6/121620128/xi_ismir_2018.pdf. The guitar is a popular instrument for a variety of reasons, including its ability to produce polyphonic sound and its musical versatility. The resulting variability of sounds, however, poses significant challenges… Read More
Representing evidence for Bayesian updating: compositional evidence, privacy and calibrationMay 12, 2023
Speaker: Paul-Gauthier Noé. Abstract: Attribute privacy in multimedia technology aims to hide only one or a few personal characteristics, or attributes, of an individual rather than the full identity. To give a few examples, these attributes can be the sex,… Read More
Detection of abnormalities in electrocardiograms with 2 sensors using machine learningMay 5, 2023
Speaker: Ana Molina Conesa. Abstract: This talk is based on the Physionet Challenge 2021, in which participants aim to design and implement an algorithm capable of automatically identifying any cardiac abnormalities present in electrocardiogram (ECG) recordings with 12, 6, 4,… Read More
Anomaly detection in 12-lead electrocardiograms using machine learningApril 28, 2023
Speaker: Miguel González Rodríguez. Abstract: The Physionet Challenge 2021 is presented. The goal is to classify 27 types of cardiac anomalies from electrocardiograms using convolutional neural networks (CNN). The challenge database consists of over 30,000 patient records, making it one… Read More
Precomputed Sound Propagation for Virtual Reality & GamingApril 21, 2023
Speaker: Joaquín González Rodríguez. Abstract: This talk is based on: Parametric Wave Field Coding for Precomputed Sound Propagation (ACM Transactions on Graphics, Vol. 33, No. 4, Article 38, Publication Date: July 2014) and Parametric Directional Coding for Precomputed Sound Propagation (ACM… Read More
Breath cycle detection in respiratory audiosApril 14, 2023
Speaker: Miguel Ángel Martínez Pay. Abstract: Neural networks applied to the detection of acoustic events in respiratory audios. Introduction to the ICBHI 2017 database dedicated to the classification of respiratory cycles into “normal”, “with crackles”, “with wheezes”, “with both”. Main… Read More
PhysioNet Challenge 2016: Classification of Heart Sound RecordingsMarch 24, 2023
Speaker: Javier Galán Fernández. Abstract: Cardiovascular diseases are the leading cause of death in the world, accounting for 32% of all deaths recorded throughout the year. The 2016 PhysioNet challenge aimed to encourage the development of algorithms to classify heart… Read More
How speaker diarization evolved recently: from clustering to end-to-end approachesMarch 17, 2023
Speaker: Alicia Lozano Díez. Abstract: Speaker diarization systems aim to segment a multi-speaker audio recording according to speaker changes, providing the time stamps of the activity of each speaker, including segments where nobody speaks and those where more than one… Read More
VoxCeleb-Spain: Design, Acquisition and Preliminary EvaluationMarch 10, 2023
Speaker: Manuel Otero González. Abstract: Description of VoxCeleb and its latest Challenges (2019-2022), elaboration and capture of audio database of celebrities of Spanish nationality, and preliminary evaluation of a pre-trained system with the acquired data.
MusicLM: Generating music from textMarch 3, 2023
Speaker: Laura Herrera Alarcón Abstract: Based on https://arxiv.org/pdf/2301.11325.pdf. This paper presents a new model for generating high-fidelity music from text descriptions. It combines SoundStream, w2v-BERT and MuLan, three models that make it possible to obtain temporal coherence and high-quality audio of… Read More
Iterative pseudo-forced alignment toolFebruary 17, 2023
Speaker: W. Fernando López Gavilánez. Abstract: High-quality data labeling from specific domains is costly and time-consuming for humans. In this work, we propose an iterative pseudo-forced alignment algorithm for long audio files with low-quality transcriptions. The alignments are iteratively done by… Read More
Differentially Private Fine-Tuning for Language ModelsFebruary 10, 2023
Speaker: Beltrán Labrador Serrano. Abstract: Based on https://arxiv.org/abs/2110.06500. In this talk we will discuss the paper Differentially Private Fine-Tuning for Language Models, where the authors give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models,… Read More
Conformer Architecture for Sound Event Detection (DCASE) February 3, 2023
Speaker: Sara Barahona Quirós. Abstract: Sound Event Detection is the task of automating the human ability to recognize sound events in the environment. Over the last years, the creation of evaluations such as the Detection and Classification… Read More
MixMatch: A Holistic Approach to Semi-Supervised LearningJanuary 20, 2023
Speaker: Diego de Benito Gorrón. Abstract: This talk is an overview of a NIPS 2019 paper by David Berthelot et al. (Google Research) that proposes a novel method for Semi-supervised learning: MixMatch. “Semi-supervised learning has proven to be a powerful… Read More
Highly accurate protein structure prediction with AlphaFoldJanuary 13, 2023
Speaker: Juan Ignacio Álvarez Trejos. Abstract: Based on https://www.nature.com/articles/s41586-021-03819-2. Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort, the structures of around 100,000 unique proteins have been… Read More
Whisper: Robust Speech Recognition via Large-Scale Weak SupervisionDecember 2, 2022
Speaker: Doroteo Torre Toledano. Abstract: Very recently (in Sept 2022) OpenAI has made freely available a speech recognition neural network called Whisper. One of the main differences with respect to the current state of the art is the use of… Read More
Dynamic Bayesian Networks for Temporal Prediction of Chemical Radioisotope Levels in Nuclear Power Plant ReactorsNovember 18, 2022
Speaker: Daniel Ramos Castro. Abstract: Radiation dose in nuclear power plant reactors is known to be dominated by the presence of radioisotopes in the primary loop of the reactor. In order to strictly control it in normal operation (e.g., cleaning… Read More
Automatic adventitious respiratory sound analysis: A systematic reviewNovember 11, 2022
Speaker: Miguel Ángel Martínez Pay. Abstract: Based on https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177926. Automatic detection or classification of adventitious sounds is useful to assist physicians in diagnosing or monitoring diseases such as asthma, Chronic Obstructive Pulmonary Disease, and pneumonia. This article contains a compilation… Read More
Training Speaker Recognition Systems with Limited DataNovember 4, 2022
Speaker: Guillermo Recio. Abstract: Based on paper https://www.isca-speech.org/archive/pdfs/interspeech_2022/vaessen22_interspeech.pdf. This work considers training neural networks for speaker recognition with smaller datasets compared to contemporary work. For this purpose, they propose three subsets of the VoxCeleb2 dataset. Each of these subsets contains… Read More
Exploring sequence-to-sequence transformer-transducer models for keyword spottingOctober 14, 2022
Speaker: Beltrán Labrador Serrano. Abstract: Beltrán’s final Google research internship presentation. This presentation introduces a transformer-transducer keyword spotting system that simultaneously optimizes ASR and keyword spotting losses using a sequence to sequence RNN-T loss. Each loss is further balanced using… Read More
Perceiver: General Perception with Iterative AttentionOctober 7, 2022
Speaker: Juan Ignacio Álvarez Trejos. Abstract: Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The perception models used in deep learning on the other hand are designed for… Read More
Continual learning for recurrent neural networksSeptember 30, 2022
Speaker: Doroteo Torre Toledano Abstract: The current trend in machine learning assumes that there is a fixed distribution of incoming data, so that a fixed model can be learned to map incoming data to output classes. However, real applications in… Read More
Source Separation for Sound Event Detection in Domestic Environments Using Jointly Trained ModelsSeptember 23, 2022
Speaker: Diego de Benito Gorrón. Abstract: Sound Event Detection and Source Separation are closely related tasks: whereas the first aims to find the time boundaries of acoustic events inside a recording, the goal of the latter is to isolate each… Read More
Self-supervised Wav2Vec2 audio representations for speaker recognitionSeptember 16, 2022
Speaker: Laura Herrera. Abstract: In this Final Degree Project, different speech representations, extracted by unsupervised learning, have been used to train a speaker recognition system. In particular, Wav2Vec2.0 and WavLM features have been used as a novelty. The Wav2Vec2.0 features… Read More
End-to-end deep learning models for air traffic control speech recognitionSeptember 13, 2022
Speaker: Ana Belén Fernández Cordero. Abstract: For many years, Air Traffic Controllers have had to manually type the information they received and transmitted to pilots into the electronic flight strip systems. This time consuming activity contributed to a significant increase… Read More
Efficient Transformers for End-to-End Neural Speaker DiarizationSeptember 9, 2022
Speaker: Sergio Izquierdo. Abstract: The recently proposed End-to-End Neural speaker Diarization framework (EEND) handles speech overlap and speech activity detection natively. While extensions of this work have reported remarkable results in both two-speaker and multi-speaker diarization scenarios, these come at… Read More
Sound Event Detection in a large-scale audio dataset with multi-resolution neural networksSeptember 9, 2022
Speaker: Sara Barahona Quirós. Abstract: Sound event detection is the task that aims to automate the human ability to recognize sound events in the environment by their particular acoustic information. For this purpose, deep learning techniques are employed to build… Read More
A Speaker Verification Backend with Robust Performance across ConditionsJuly 14, 2022
Speaker: Joaquin Gonzalez-Rodriguez. Abstract: Presentation of the paper in https://arxiv.org/abs/2102.01760: L. Ferrer et al. “A Speaker Verification Backend with Robust Performance across Conditions”, 2021. Abstract of the paper (reproduced from the preprint): In this paper, we address the problem of… Read More
Linear-Gaussian Bayesian Network Applications to Forensic ChemistryJune 30, 2022
Speaker: Elías Hernandis Prieto. Abstract: Forensic evidence evaluation using the likelihood ratio framework requires knowledge about the probability distribution of the data. For evaluating samples of glass remains, this translates to obtaining the joint probability distribution of the relative concentrations… Read More
Improvements in deep learning semi-supervised model selection for the optimization of different Sound Event Detection metricsJune 23, 2022
Speaker: Cristina Moratilla. Abstract: Sound Event Detection is one of the most developed fields in the area of audio signal processing in the last decades. The objective of such detection is to locate the start and end instants of audio… Read More
Bias analysis in speaker recognition systems based in DNN-embeddingsJune 20, 2022
Speaker: Almudena Aguilera. Abstract: In this study we will evaluate the discriminatory behaviours that are generated in speaker recognition systems, specifically those that verify whether two audios belong to the same speaker or not. These systems work by extracting the… Read More
MetaAudio: A Few-Shot Audio Classification BenchmarkJune 9, 2022
Speaker: David Martín Gutiérrez. Abstract: Currently available benchmarks for few-shot learning (machine learning with few training examples) are limited in the domains they cover, primarily focusing on image classification. This work aims to alleviate this reliance on image-based benchmarks by… Read More
Speaker Diarization, X-vectors with Encoder-Decoder based attractorsJune 2, 2022
Speaker: Juan Ignacio Álvarez Trejos. Abstract: X-Vectors are speaker embeddings that emerge to address the speaker recognition task, surprisingly outperforming i-vectors in most speaker tasks. It is proposed to take advantage of the information contained in these embeddings by using… Read More
Gaussianization of LA-ICP-MS Features to Improve Calibration in Forensic Glass ComparisonMay 25, 2022
Speaker: Pablo Ramírez Hereza. Abstract: The forensic glass comparison task aims to compare a glass sample of unknown source with a control glass sample of known source. In this work, we use multielemental features from laser ablation inductively coupled… Read More
Article review: “Objectifying evidence evaluation for gunshot residue comparisons using machine learning on criminal case data”May 19, 2022
Speaker: Daniel Ramos Castro Abstract: Based on https://doi.org/10.1016/j.forsciint.2022.111293. “Comparative gunshot residue analysis addresses relevant forensic questions such as ‘did suspect X fire shot Y?’. More formally, it weighs the evidence for hypotheses of the form H1: gunshot residue particles found… Read More
Assessing Calibration in the regression settingMay 12, 2022
Speaker: Sergio Álvarez Balanya. Abstract: Calibration is a desirable property of pattern recognition systems, especially when their predictions are going to be used to make decisions. In our group, we are used to dealing with calibration in classification tasks such… Read More
Call-sign recognition and understanding for noisy air-traffic transcripts using surveillance informationMay 5, 2022
Speaker: Ana Belén Fernández Cordero. Abstract: Air traffic control (ATC) relies on communication via speech between pilot and air-traffic controller (ATCO). The call-sign, as unique identifier for each flight, is used to address a specific pilot by the ATCO. Extracting… Read More
AVASpeech-SMAD: A speech and music activity detection database with label co-occurrenceApril 28, 2022
Speaker: Guillermo Recio Martín. Abstract: AVASpeech is a publicly available dataset created in 2018 to contribute to the task of speech activity detection (SAD). This dataset contains three different types of audio segments: clean speech, speech co-occurring with music… Read More
Conformer-based sound event detection with semi-supervised learning and data augmentationApril 22, 2022
Speaker: Sara Barahona Quirós. Abstract: This paper presents a Conformer-based sound event detection (SED) method, which uses semi-supervised learning and data augmentation. The proposed method employs Conformer, a convolution-augmented Transformer that is able to exploit local features of audio data… Read More
Speaker Diarization with Region Proposal NetworkApril 7, 2022
Speaker: Sergio Izquierdo del Álamo. Abstract: Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the “who spoke when” problem. Although the standard diarization systems can achieve satisfactory results in various scenarios, they… Read More
Conversational Agents for Health CareMarch 31, 2022
Speaker: Giuliano Lazzara. Abstract: Brief that focuses on people’s perception of Conversational Agents and proposes these technologies as a tool to deal with underestimated mental issues such as depression and anxiety. Referring to experiments done with “Woebot”, an automated conversational… Read More
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and LanguageMarch 24, 2022
Speaker: Sergio Segovia. Abstract: The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such… Read More
Data Augmentation for Decoupled Calibration of Deep Neural Network ClassifiersMarch 17, 2022
Speaker: Sergio Márquez Carrero. Abstract: Modern Deep Neural Networks (DNN) have significantly outperformed those employed over a decade ago in terms of accuracy. Nonetheless, the outputs generated by these models are poorly calibrated, causing substantial issues in a variety of… Read More
Connectionist Temporal Classification (CTC) Speech SegmentationMarch 10, 2022
Speaker: W. Fernando López Gavilánez. Abstract: Motivated by the lack of high-quality labeled data for specific scenarios, such as emergencies in the home environment, we explored a CTC-segmentation method to generate a specific-purpose speech dataset. The project seeks the quality improvement of… Read More
BigSSL: Large-Scale Semi-Supervised Learning for ASRMarch 3, 2022
Speaker: Laura Herrera Abstract: This paper deals with results obtained on very large automatic speech recognition models. A large amount of labelled data is not always available and sometimes they do not generalize enough. Consequently, the authors propose to use pre-trained… Read More
Efficient Neural Approaches for Automatic Speech RecognitionFebruary 24, 2022
Speaker: Doroteo Torre Toledano Abstract: Many different end-to-end neural approaches have been proposed in the last years in the field of automatic speech recognition (ASR). However, most of the research available compares systems only in terms of accuracy (word error… Read More
Structured Output LearningFebruary 10, 2022
Speaker: María Pilar Fernández Rodríguez Abstract: Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when, the language, punctuation, capitalization… This is typically addressed by merging the outputs… Read More
Voxceleb Experiment: fairnessJanuary 27, 2022
Speaker: Almudena Aguilera Abstract: The experiment is based on the dataset from Voxceleb [1], using the two pre-trained models. The main idea of these experiments was to study the fairness problems in different demographic groups present in the database… Read More
Semi-Supervised Music Tagging TransformerDecember 16, 2021
Speaker: David Martín Abstract: Music Tagging Transformer (MTT) was recently presented at the ISMIR 2021 Conference as one of the most disruptive deep learning approaches for Music Information Retrieval. It consists of a semi-supervised approach where the model captures… Read More
Encoder-Decoder Based Attractor Calculation for End-to-End Neural DiarizationDecember 9, 2021
Speaker: Alicia Lozano Díez Abstract: In this talk, we will deeply review the algorithms behind end-to-end systems for speaker diarization based on neural networks. In particular, we will describe how the encoder-decoder part of the model calculates “attractors” that capture… Read More
Unsupervised Sound Separation Using Mixture Invariant TrainingNovember 18, 2021
Speaker: Diego de Benito Gorrón Abstract: In recent years, rapid progress has been made on the problem of single-channel sound separation using supervised training of deep neural networks. In such supervised approaches, a model is trained to predict the component… Read More
relMix: An open source software for DNA mixtures with related contributorsNovember 11, 2021
Speaker: Elías Hernández Abstract: DNA evidence has been a major advance in the judicial context and is often regarded as the definitive evidence for convicting or acquitting a defendant. The results of a DNA test… Read More
Improving Fairness in Speaker RecognitionNovember 4, 2021
Speaker: Almudena Aguilera Abstract: Speaker Recognition Systems aim to automatically recognize the identity of an individual from a recording of his/her speech or voice. Despite the progress of these systems in terms of accuracy, we must ask ourselves: “What happen… Read More
Speech Enhancement for Wake-up Word detection in Voice AssistantsOctober 28, 2021
Speaker: William Fernando López Abstract: Wake-up-word (WuW) detection is a fundamental component in voice assistants. Undesired activation of the device is often due to external noises such as background conversations, TV or music. In Telefónica we have been working on… Read More
Unsupervised pre-training for learning speech representations: Wav2Vec and Wav2Vec2.0October 21, 2021
Speaker: Laura Herrera Abstract: These papers (https://arxiv.org/pdf/1904.05862.pdf and https://arxiv.org/pdf/2006.11477.pdf) explore unsupervised learning from raw audio for speech recognition. A large amount of labelled data is not always available; consequently, wav2vec uses a causal convolutional network trained with large amounts of unlabelled… Read More
Large-scale pre-training of End-to-End Multi-Talker ASR for meeting Transcription with Single Distant MicrophoneOctober 14, 2021
Speaker: María Pilar Fernández Gallego Abstract: Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR). While various approaches have been proposed, all previous studies… Read More
Selective Kernel NetworksSeptember 16, 2021
Speaker: Sergio Segovia Abstract: It is well-known in the neuroscience community that the receptive field size of visual cortical neurons are modulated by the stimulus, which has been rarely considered in constructing CNNs. We propose a dynamic selection mechanism in… Read More
Calibration of Multiclass Probabilistic ClassifiersJune 24, 2021
Speaker: Sergio Márquez Abstract: Today’s Deep Neural Networks (DNNs) are used for numerous classification tasks, achieving high performance in terms of accuracy. In some cases, probabilistic classifiers, which assign a confidence value to each of the predictions made, are used.… Read More
Deep Learning Models with Self-Attention for the Detection of Audio EventsMay 27, 2021
Speaker: Julio González Abstract: This talk is a presentation of the BSc thesis “Modelos de aprendizaje profundo con auto-atención para detección de eventos de audio”. It describes the implementation of the Transformer and Conformer neural networks and presents the results of the test… Read More
End-to-end Speaker DiarizationMay 15, 2021
Speaker: Alicia Lozano Díez Abstract: In this talk, I will describe new approaches to the task of speaker diarization based on end-to-end neural networks, which present several advantages with respect to traditional systems based on clustering of speaker embeddings. I… Read More
Normalizing Flows for calibration of multiclass probabilistic classifiersMay 6, 2021
Speaker: Sergio Márquez Abstract: Today’s Deep Neural Networks (DNNs) achieve high accuracy, far exceeding that of the networks used ten years ago. Nevertheless, the outputs provided by these modern networks are less well calibrated, becoming a major problem in… Read More
Transfer Learning from computer vision to audio event detectionApril 29, 2021
Speaker: Sergio Segovia Abstract: A brief summary of my lecture: as part of my doctorate, we are exploring the idea of applying transfer learning from the computer vision domain to the detection of acoustic events. The… Read More
Modeling Uncertainty with Bayesian Neural NetworksApril 22, 2021
Speaker: Sergio Álvarez Abstract: Deep Neural Networks (DNNs) have revolutionized many fields in pattern recognition like speech recognition and object detection. There are, however, some applications in which Neural Networks struggle to offer competitive performance, mainly sensitive ones. These applications… Read More
New loss function to improve calibration with mixupApril 15, 2021
Speaker: Juan Maroñas Molano Abstract: Deep Neural Networks (DNNs) represent the state of the art in many tasks. However, due to their overparameterization, their generalization capabilities are in doubt and remain under study. Consequently, DNNs can overfit and… Read More
Self-supervised deep learning approaches for speaker recognitionMarch 18, 2021
Speaker: Joaquín González Abstract: In this talk, I will review the thesis “Self-supervised deep learning approaches for speaker recognition”, presented by Umair Khan at the UPC (Universidad Politécnica de Cataluña) in January 2021 and directed by Javier Hernando. In this thesis… Read More
Data augmentation for improved robustness against packet losses in ASRMarch 11, 2021
Speaker: María Pilar Fernández Gallego Abstract: Nowadays, a large number of companies record conversations, calls, sales or even meetings, in many cases to comply with current legislation. Apart from the legal need, these recordings constitute an invaluable source of… Read More
End-to-end Query-by-example Spoken Term DetectionMarch 11, 2021
Speaker: Juan Ignacio Álvarez Trejos Abstract: Query-by-example Spoken Term Detection (QbE-STD) is a key technology to harness the large amount of audiovisual content that is being stored and generated nowadays. Using audio example queries for STD has several advantages such as… Read More
AUDIAS-UAM System for the Albayzin 2020 Speech to Text ChallengeMarch 4, 2021
Speaker: Beltrán Labrador Serrano Abstract: This presentation describes the system submitted by the AUDIAS-UAM team for the Albayzin 2020 Speech to Text Challenge. Our system is an end-to-end Transformer-based system built using the ESPnet toolkit. The acoustic model is… Read More
Multi-resolution Sound Event DetectionMarch 4, 2021
Speaker: Diego de Benito Gorrón Abstract: The Sound Event Detection task aims to determine the temporal locations of acoustic events in audio clips. In recent years, this field has gained relevance due to the introduction of datasets… Read More
BUT system for the Short-duration Speaker Verification challenge 2020February 25, 2021
Speaker: Alicia Lozano Díez Abstract: In this talk, I present the Brno University of Technology (BUT) system submitted for the text-dependent task of the Short-duration Speaker Verification challenge 2020, which was the best performing system for this task. We explored… Read More
Measuring Calibration in Deep LearningFebruary 18, 2021
Speaker: Daniel Ramos Castro Abstract: In this talk, we will present the article by Nixon et al. 2020, “Measuring Calibration in Deep Learning”, published in CVPR Workshops 2020. In this paper, currently the most popular measure of calibration for deep learning,… Read More