Speaker: Laura Herrera
Abstract: These papers (https://arxiv.org/pdf/1904.05862.pdf and https://arxiv.org/pdf/2006.11477.pdf) explore unsupervised learning from raw audio for speech recognition.
Because large amounts of labelled data are not always available, wav2vec trains a causal convolutional network on large amounts of unlabelled audio; the resulting representations are then used to improve the training of the acoustic model.
Wav2vec 2.0 improves on this with a semi-supervised approach that uses a non-causal convolutional network and a Transformer network to add context. In both cases, the reported results exceed the previous state of the art when only limited amounts of labelled data are available.
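
The difference between a causal and a non-causal convolution can be illustrated with a small sketch. This is not code from either paper: the helper name `conv1d`, the 3-tap kernel, and the impulse input are all assumptions chosen for illustration, and the real models stack many learned layers. The sketch only shows which output positions are allowed to "see" a given input sample.

```python
def conv1d(signal, kernel, causal):
    """Same-length 1D convolution (cross-correlation) with zero padding.

    Hypothetical helper for illustration only.
    causal=True  -> output[t] depends only on signal[t-k+1 .. t]
    causal=False -> output[t] depends on a window centred on t
    """
    k = len(kernel)
    # Causal: pad only on the left, so no future sample is ever read.
    # Non-causal: split the padding so the window is centred on t.
    left = k - 1 if causal else (k - 1) // 2
    right = (k - 1) - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(signal))
    ]


# A single impulse at position 3 reveals which outputs depend on that sample.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
kernel = [1.0, 1.0, 1.0]

causal_out = conv1d(impulse, kernel, causal=True)
centred_out = conv1d(impulse, kernel, causal=False)

print(causal_out)   # non-zero only at positions 3, 4, 5 (never before the impulse)
print(centred_out)  # non-zero at positions 2, 3, 4 (position 2 sees a future sample)
```

In the causal case the sample at position 3 influences only outputs at position 3 and later, so each representation is computed from past context alone. In the non-causal case the output at position 2 already depends on the sample at position 3, which is the kind of two-sided context the abstract attributes to wav2vec 2.0.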