Speaker: Pilar Fernández Gallego.

Abstract: ASR (Automatic Speech Recognition) systems have improved dramatically in recent years, thanks both to advances in deep learning and to the collection of large datasets used to train them. However, studies have shown that some of the best-known systems, developed by companies such as Google, Amazon, and Microsoft, do not work equally well for all subgroups of the population, showing large differences in accuracy depending on the age, gender, race, accent, and even socio-economic status of the speakers. In short, many factors cause bias and weaknesses in ASR systems.
On the other hand, last year OpenAI released Whisper, which achieves strong results without requiring fine-tuning for a specific domain. This motivated us to analyze the behaviour and bias of this type of ASR system for certain subgroups.
This study presents the performance of multiple ASR models (Whisper and Wav2Vec) on two datasets containing native and non-native American English as well as African American speech, analyzing aspects such as age group, gender, and accent.
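The per-subgroup comparison described above boils down to computing Word Error Rate (WER) separately for each group of speakers. A minimal sketch of that computation is shown below; the subgroup names and transcript pairs are illustrative placeholders, not data from the study.

```python
# Minimal sketch: per-subgroup WER (Word Error Rate) for an ASR model.
# The (reference, hypothesis) pairs below are hypothetical examples.

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / max(len(ref), 1)

# Hypothetical transcripts grouped by speaker subgroup.
samples = {
    "native": [("the cat sat on the mat", "the cat sat on the mat")],
    "non_native": [("the cat sat on the mat", "the cat sat on a mat")],
}

for group, pairs in samples.items():
    avg = sum(wer(ref, hyp) for ref, hyp in pairs) / len(pairs)
    print(f"{group}: WER = {avg:.2f}")
```

A gap in average WER between subgroups (here, a single substitution error for the non-native group) is the kind of signal the study uses to quantify bias.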