Speaker: Javier Tejedor Noguerales.
Abstract:
In the digital era, the amount of information stored in audio repositories keeps growing, which makes it necessary to develop efficient, automatic methods for searching audio content. To address this need, search on speech (SoS), specifically in the form of query-by-example spoken term detection (QbE STD), has received much attention in recent decades. This paper presents a QbE STD system based on the Whisper automatic speech recognizer, which aims to retrieve the speech data in which a given acoustic query appears. Experiments were run on two Spanish speech databases that were used in previous Spanish SoS evaluations and that cover different domains and acoustic conditions. The results outperform the best published results on the same speech databases. Additionally, several analyses based on query properties (i.e., in-language versus foreign queries, and single-word versus multi-word queries) were carried out to show how well the QbE STD system retrieves queries with specific properties.