Speaker: Beltrán Labrador

Abstract:

This presentation examines the scaling laws for neural networks that were foundational to the development of modern, large-scale language models. It revisits the 2020 OpenAI paper that established a key principle: model performance improves predictably as compute, dataset size, and the number of parameters increase. This insight guided the creation of models like GPT-3. However, this scaling paradigm is approaching its limits, primarily because the finite corpus of high-quality text data from the internet has been largely exhausted. Consequently, future progress will likely depend less on pure scale and more on innovative strategies, such as leveraging multimodal and synthetic data, improving model reasoning, or adopting novel RL-based post-training techniques.
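For reference, the predictable scaling described above takes a simple power-law form. The following is a sketch of the relations reported in the 2020 paper (Kaplan et al., "Scaling Laws for Neural Language Models"), where L is test loss, N the number of model parameters, D the dataset size in tokens, and C the training compute; the constants N_c, D_c, C_c and the exponents are empirical fits (the reported exponents are small, on the order of 0.05 to 0.1):

\[
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
\]

Because each relation holds only when the other factors are not the bottleneck, the data-exhaustion concern raised in the abstract corresponds to the L(D) term ceasing to improve once D can no longer grow.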