Speaker: W. Fernando López Gavilanez.

Abstract: Motivated by the lack of high-quality labeled data for specific scenarios, such as emergencies in the home, we explored a CTC-segmentation method for generating a special-purpose speech dataset. The project aims first to improve the quality of utterance-level labeled data and then to extract utterance segments from speech corpora. The CTC-segmentation algorithm relies on a CTC-based end-to-end ASR model trained on the Spanish portion of CommonVoice 7.0. We successfully extracted emergency words from CommonVoice and trained a first version of a Spanish emergency keyword spotter.