Speaker: Joaquín González
Abstract: In this talk I will review the thesis “Self-supervised deep learning approaches for speaker recognition”, presented by Umair Khan at the UPC (Universidad Politécnica de Cataluña) in January 2021 and supervised by Javier Hernando. In this thesis the author seeks to avoid or minimize the use of labels in order to carry out unsupervised learning based on DNNs, obtaining results competitive with the state of the art on well-known tasks such as VoxCeleb. After reusing the results obtained in a previous thesis of the same group with global speaker-adapted RBMs for speaker diarization, the author presents two main lines of work. The first is the use of autoencoders to improve the robustness of i-vectors to multisession variability (speaker and channel) through different strategies. The second is the use of Siamese networks with VGG-CNN networks, either with two branches to build an end-to-end system or with three branches trained with a triplet loss to obtain embeddings that are then tested using cosine scoring. In all cases, the author reports better results than with his i-vectors, and in the case of Siamese networks, he submitted those systems to the VoxSRC-2020 Track 3 evaluation, obtaining third place in the competition.