Speaker: Jérémie Touati. Abstract: The diarization challenge of the 2024 Albayzin evaluation stands out for its various difficulties. The recordings, which come from databases of Spanish radio and television programs, can last up to several hours; they contain an undetermined and…
Introduction to the Language-Based Audio Retrieval task.
Speaker: Manuel Otero. Abstract: Language-Based Audio Retrieval is a DCASE Challenge task based on retrieving audio information from natural language descriptions. Two of the best-performing approaches in the state of the art will…
Data Augmentation for Respiratory Cycle Classification
Speaker: Miguel Ángel. Abstract: Analysing respiratory audio recordings to detect and classify adventitious respiratory sounds is vitally important for developing continuous monitoring tools for patients with respiratory diseases. The ICBHI 2017 database is the most widely…
Diarization Introduction & EEND Perceiver-based Diarization
Speaker: Alicia Lozano Díez. Abstract: In this talk, I will present an introduction to the speaker diarization task as well as the latest neural-network-based approaches, such as self-attention end-to-end neural diarization (EEND) with encoder-decoder attractors (EDA), as opposed…
Introduction to Reinforcement Learning.
Speaker: Tamas Endrei. Abstract: Reinforcement learning (RL) has emerged as one of the most fascinating fields of machine learning, providing solutions to challenging problems ranging from complex robotics behaviors to optimizing neural network architectures. Despite its immense potential, RL’s complex…
GPU Parallel Computing for Deep Learning
Speaker: Beltrán Labrador Serrano. Abstract: Large Language Models (LLMs) are transforming natural language processing and are now impacting speech processing. This talk addresses the challenge of training the massive neural networks required to follow this trend. I will present GPU…
Rotary Position Embeddings (RoPE) in Transformers.
Speaker: Doroteo Torre Toledano. Abstract: Since Transformers were proposed in 2017, they have dominated the state of the art in several domains, including language modelling, speech processing, and even image processing. Although the main ideas of the original Transformer are essentially kept, there…
Large Language Models in Protein Engineering
Speaker: Natalia Pinto Estéban. Abstract: The intersection of artificial intelligence and protein engineering represents an innovative frontier in scientific exploration. In this presentation, titled ‘Large Language Models in Protein Engineering,’ we delve into the field of advanced language models, focusing…
Lute and vihuela in the Renaissance period: instruments and music
Speaker: Joaquín González Rodríguez. Abstract: In this talk we will present an overview of two plucked instruments that were extremely popular in 16th-century Europe: the lute and its Spanish version, the vihuela. Sharing a common tuning and playing characteristics…
DiarizationLM: speaker diarization post-processing with large language models
Speaker: Laura Herrera Alarcón. Abstract: This paper presents a framework designed to post-process the outputs of speaker diarization systems using large language models (LLMs). The framework aims to enhance the readability of the diarized transcripts and reduce the word diarization error rate (WDER). For…