Speaker: Manuel Fernando Mollon Laorca.

Abstract:

In this study, we build on the FLIP benchmark, which was designed to evaluate protein language models (pLMs) on small, specialized prediction tasks, by assessing state-of-the-art models, including ESM-2, SaProt, and Tranception, on its datasets. Unlike larger, more diverse benchmarks such as ProteinGym, which cover a broad spectrum of tasks, FLIP focuses on constrained settings where data availability is limited, making it an ideal framework for evaluating model performance when task-specific data are scarce. We investigate whether recent advances in protein language models yield significant improvements in such settings. We also compare zero-shot methods, in which models are applied without task-specific fine-tuning, against traditional fine-tuning approaches to better understand how these models adapt to small, specific tasks. Our findings offer insight into how large-scale models perform on specialized protein prediction tasks and, by confirming the utility of zero-shot predictions in data-limited environments, demonstrate their potential for protein engineering applications.