Speaker: Clara Adsuar Ávila. Abstract of the paper: We demonstrate that carefully adjusting the tokenizer of the Whisper speech recognition model significantly improves the precision of word-level timestamps when applying dynamic time warping to the decoder’s cross-attention scores. We finetune…
