Speaker: Beltrán Labrador Serrano.
Abstract: Keyword spotting systems are used in a variety of applications, such as smart speakers and voice assistants. However, these systems can be challenged by diverse accents, age groups, and speaking conditions.
In this talk, I will present my work on personalizing keyword spotting systems to individual speakers, which I conducted during my research internship at Google NY this summer.
Our method uses FiLM (feature-wise linear modulation) to incorporate enrolled speaker information into the keyword detection process. This yields significant accuracy improvements regardless of the speaker’s accent or age, which improves our model’s fairness. We will also discuss how to implement our method in a production environment, where it is important to achieve satisfactory quality both with and without enrollment, while staying within a limited latency and computational budget.
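
As a rough illustration of the FiLM idea mentioned in the abstract, the sketch below scales and shifts each feature channel using parameters predicted from a speaker embedding. It is not the speaker's implementation: the function names (`affine`, `film`), the single linear layer that predicts the scale and shift, and the identity fallback for the enrollment-less case are all assumptions made for this example.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[Vector]


def affine(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute weights @ x + bias using plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features: Matrix, speaker_embedding: Optional[Vector],
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Apply feature-wise linear modulation: out = gamma * h + beta.

    features: one list of channel activations per audio frame.
    speaker_embedding: hypothetical enrollment vector, or None.
    """
    num_channels = len(b_gamma)
    if speaker_embedding is None:
        # Assumed fallback when no enrollment exists: identity modulation.
        gamma = [1.0] * num_channels
        beta = [0.0] * num_channels
    else:
        # Per-channel scale and shift predicted from the speaker embedding.
        gamma = affine(w_gamma, b_gamma, speaker_embedding)
        beta = affine(w_beta, b_beta, speaker_embedding)
    # The same gamma and beta are applied to every frame.
    return [[g * h + b for g, h, b in zip(gamma, frame, beta)]
            for frame in features]
```

In this sketch the modulation adds only two small matrix-vector products per utterance, since the scale and shift depend on the speaker and not on the individual frame.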