Diego de Benito Gorrón carried out a research stay at the prestigious Speech@FIT group of Brno University of Technology (BUT) in the Czech Republic from September to December 2021. He worked on acoustic source separation…
Semi-Supervised Music Tagging Transformer
Speaker: David Martín Abstract: Music Tagging Transformer (MTT) was recently presented at the ISMIR 2021 Conference as one of the most promising emerging deep learning approaches for Music Information Retrieval. It is a semi-supervised approach in which the model captures…
Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization
Speaker: Alicia Lozano Díez Abstract: In this talk, we will review in depth the algorithms behind neural end-to-end systems for speaker diarization. In particular, we will describe how the encoder-decoder part of the model calculates “attractors” that capture…
Unsupervised Sound Separation Using Mixture Invariant Training
Speaker: Diego de Benito Gorrón Abstract: In recent years, rapid progress has been made on the problem of single-channel sound separation using supervised training of deep neural networks. In such supervised approaches, a model is trained to predict the component…
relMix: An open source software for DNA mixtures with related contributors
Speaker: Elías Hernández Abstract: DNA evidence has been a major breakthrough in the judicial context and is often considered the definitive evidence to convict or acquit a defendant. The results of a DNA test…
Improving Fairness in Speaker Recognition
Speaker: Almudena Aguilera Abstract: Speaker Recognition Systems aim to automatically recognize the identity of an individual from a recording of his/her speech or voice. Despite the progress of these systems in terms of accuracy, we must ask ourselves: “What happen…
Speech Enhancement for Wake-up Word Detection in Voice Assistants
Speaker: William Fernando López Abstract: Wake-up-word (WuW) detection is a fundamental component of voice assistants. Undesired activations of the device are often caused by external noise such as background conversations, TV or music. At Telefónica, we have been working on…
Unsupervised pre-training for learning speech representations: Wav2Vec and Wav2Vec2.0
Speaker: Laura Herrera Abstract: These papers (https://arxiv.org/pdf/1904.05862.pdf and https://arxiv.org/pdf/2006.11477.pdf) explore unsupervised learning from raw audio for speech recognition. Since large amounts of labelled data are not always available, wav2vec uses a causal convolutional network trained with large amounts of unlabelled…
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone
Speaker: María Pilar Fernández Gallego Abstract: Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR). While various approaches have been proposed, all previous studies…
Selective Kernel Networks
Speaker: Sergio Segovia Abstract: It is well known in the neuroscience community that the receptive field size of visual cortical neurons is modulated by the stimulus, which has rarely been considered in constructing CNNs. We propose a dynamic selection mechanism in…