Speaker: David Martín

Abstract: The Music Tagging Transformer (MTT), presented at the ISMIR 2021 Conference, is one of the most notable recent deep learning approaches to Music Information Retrieval. It combines a shallow convolutional network, which captures local acoustic characteristics, with a stack of self-attention layers that produces a temporal summary of each input sequence. The MTT model was further improved with noisy student training, a semi-supervised scheme that uses both labeled and unlabeled data together with data augmentation. For these experiments, the authors used the full audio collection of the well-known Million Song Dataset after exhaustively cleaning and updating the dataset.
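The abstract describes the model only at a high level. What follows is a minimal NumPy sketch of that overall design, not the authors' implementation. Several details are assumptions: a single 1-D convolution stands in for the paper's shallow convolutional front end. The temporal summary is read from a prepended summary token. Attention is single-head and omits the feed-forward and normalization sublayers of a full Transformer. Weights are random and untrained. Gaussian noise stands in for the data augmentation used in noisy student training. All function names are hypothetical.

```python
import numpy as np


def conv_frontend(mel, kernels):
    """Shallow 1-D convolution over time plus ReLU: (n_mels, T) -> (T - k + 1, d)."""
    d, n_mels, k = kernels.shape
    frames = mel.shape[1] - k + 1
    out = np.empty((frames, d))
    for t in range(frames):
        # Contract every kernel with the spectrogram window starting at frame t.
        out[t] = np.tensordot(kernels, mel[:, t:t + k], axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = h @ wq, h @ wk, h @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return h + weights @ v


def init_params(n_mels=16, d=8, k=3, n_layers=2, n_tags=5, seed=0):
    """Random, untrained weights (assumption: toy sizes for illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "conv": rng.normal(0.0, 0.1, (d, n_mels, k)),
        "summary": rng.normal(0.0, 0.1, (1, d)),
        "layers": [tuple(rng.normal(0.0, 0.1, (d, d)) for _ in range(3))
                   for _ in range(n_layers)],
        "head": rng.normal(0.0, 0.1, (d, n_tags)),
    }


def tag_probabilities(mel, params):
    """Conv front end -> stacked self-attention -> one sigmoid score per tag."""
    h = conv_frontend(mel, params["conv"])
    h = np.vstack([params["summary"], h])  # prepend a summary token (assumption)
    for wq, wk, wv in params["layers"]:
        h = self_attention(h, wq, wk, wv)
    logits = h[0] @ params["head"]  # temporal summary lives at position 0
    return 1.0 / (1.0 + np.exp(-logits))  # multi-label tagging: independent sigmoids


def noisy_student_data(labeled, unlabeled, teacher, noise=0.05, seed=0):
    """Build one student training set: the teacher soft-labels clean unlabeled
    clips, and the student sees noise-augmented inputs for every clip."""
    rng = np.random.default_rng(seed)

    def augment(x):
        # Stand-in augmentation (assumption): additive Gaussian noise.
        return x + rng.normal(0.0, noise, x.shape)

    pseudo = [(augment(x), teacher(x)) for x in unlabeled]
    return [(augment(x), y) for x, y in labeled] + pseudo


if __name__ == "__main__":
    params = init_params()
    mel = np.random.default_rng(1).random((16, 100))  # fake mel-spectrogram
    print(tag_probabilities(mel, params).shape)  # (5,)
```

In noisy student training this step is typically repeated: a student trained on the combined set becomes the teacher for the next round, so the pseudo-labels can improve over iterations.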