Speaker: Guillermo Recio Martín.

Abstract: AVASpeech is a publicly available dataset created in 2018 for the task of speech activity detection (SAD). It contains three types of audio segments: clean speech, speech co-occurring with music, and speech co-occurring with noise; however, the music and noise themselves are not labelled. In 2021, the AVASpeech-SMAD database was created to assist speech and music activity detection research. Its distinguishing feature is that segments may contain (1) only speech, (2) only music, or (3) co-occurring music and speech, with the music explicitly labelled as music; these new music segments have been manually annotated. As a publicly accessible dataset, it is well suited for training algorithms for speech and music activity detection.