Speaker: Doroteo Torre Toledano
Abstract: Many different end-to-end neural approaches have been proposed in recent years in the field of automatic speech recognition (ASR). However, most of the available research compares systems only in terms of accuracy (word error rate or character error rate), without weighing accuracy against the computational resources required. The talk will start by presenting a paper that performs such a comparison for 8 different, well-known systems, including hybrid HMM-DNN systems. Given that none of those systems is Transformer-based, the talk continues with a review of different proposals for making Transformers computationally efficient, including the recently proposed Linear Transformers, which reduce the computational complexity of Transformers from quadratic to linear with respect to the input length.
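
As an illustration of the quadratic-to-linear reduction mentioned above, here is a minimal NumPy sketch of kernelized (non-causal) attention computed in two ways. It assumes the elu(x) + 1 feature map commonly associated with linear attention; the abstract does not say which formulation the talk covers, so the function names, shapes, and feature map are illustrative assumptions rather than the speaker's method. Note that this replaces the softmax kernel with a different similarity function; it is not an exact reformulation of softmax attention.

```python
import numpy as np

def feature_map(x):
    # Assumed feature map: elu(x) + 1, which keeps every feature positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def quadratic_attention(q, k, v):
    # Reference version: builds the full N x N similarity matrix,
    # so time and memory grow quadratically with sequence length N.
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T                      # shape (N, N)
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v                            # shape (N, d_v)

def linear_attention(q, k, v):
    # Same result, reordered with associativity: phi(Q) (phi(K)^T V).
    # The N x N matrix is never formed, so cost is linear in N.
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                              # shape (d, d_v)
    z = phi_k.sum(axis=0)                         # shape (d,)
    return (phi_q @ kv) / (phi_q @ z)[:, None]    # shape (N, d_v)

# Hypothetical sizes: N = 50 frames, d = 8 key dims, d_v = 4 value dims.
rng = np.random.default_rng(0)
q = rng.standard_normal((50, 8))
k = rng.standard_normal((50, 8))
v = rng.standard_normal((50, 4))
same = np.allclose(quadratic_attention(q, k, v), linear_attention(q, k, v))
```

Both functions compute the same output; the only difference is the order of the matrix products, which is where the saving comes from for long inputs such as speech frame sequences.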