Speaker: Doroteo Torre Toledano. Abstract: Very recently (in September 2022), OpenAI made freely available a speech recognition neural network called Whisper. One of the main differences with respect to the current state of the art is the use of…
Dynamic Bayesian Networks for Temporal Prediction of Chemical Radioisotope Levels in Nuclear Power Plant Reactors
Speaker: Daniel Ramos Castro. Abstract: Radiation dose in nuclear power plant reactors is known to be dominated by the presence of radioisotopes in the primary loop of the reactor. In order to strictly control it in normal operation (e.g., cleaning…
Automatic adventitious respiratory sound analysis: A systematic review
Speaker: Miguel Ángel Martínez Pay. Abstract: Based on https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177926. Automatic detection or classification of adventitious sounds is useful to assist physicians in diagnosing or monitoring diseases such as asthma, Chronic Obstructive Pulmonary Disease, and pneumonia. This article contains a compilation…
Training Speaker Recognition Systems with Limited Data
Speaker: Guillermo Recio. Abstract: Based on the paper https://www.isca-speech.org/archive/pdfs/interspeech_2022/vaessen22_interspeech.pdf. This work considers training neural networks for speaker recognition on smaller datasets than those used in contemporary work. For this purpose, the authors propose three subsets of the VoxCeleb2 dataset. Each of these subsets contains…
Exploring sequence-to-sequence transformer-transducer models for keyword spotting
Speaker: Beltrán Labrador Serrano. Abstract: Beltrán’s final Google research internship presentation. This presentation introduces a transformer-transducer keyword spotting system that simultaneously optimizes ASR and keyword spotting losses using a sequence-to-sequence RNN-T loss. Each loss is further balanced using…
Perceiver: General Perception with Iterative Attention
Speaker: Juan Ignacio Álvarez Trejos. Abstract: Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The perception models used in deep learning, on the other hand, are designed for…
Continual learning for recurrent neural networks
Speaker: Doroteo Torre Toledano. Abstract: The current trend in machine learning assumes that there is a fixed distribution of incoming data, so that a fixed model can be learned to map incoming data to output classes. However, real applications in…
Source Separation for Sound Event Detection in Domestic Environments Using Jointly Trained Models
Speaker: Diego de Benito Gorrón. Abstract: Sound Event Detection and Source Separation are closely related tasks: whereas the first aims to find the time boundaries of acoustic events inside a recording, the goal of the latter is to isolate each…
Self-supervised Wav2Vec2 audio representations for speaker recognition
Speaker: Laura Herrera. Abstract: In this Final Degree Project, different speech representations extracted by unsupervised learning have been used to train a speaker recognition system. In particular, and as the main novelty of the work, Wav2Vec2.0 and WavLM features have been used. The Wav2Vec2.0 features…
End-to-end deep learning models for air traffic control speech recognition
Speaker: Ana Belén Fernández Cordero. Abstract: For many years, Air Traffic Controllers have had to manually type the information they received and transmitted to pilots into the electronic flight strip systems. This time-consuming activity contributed to a significant increase…