Speaker: Juan Antonio Gordillo Gayo.

Abstract: The success of deep learning has driven unprecedented progress across many scientific domains, solving tasks long considered intractable with traditional methods. A remarkable example is AlphaFold, which made it possible to predict protein structures with considerable accuracy, without the need for time-consuming and costly experiments. In this lecture, we will review Protein Language Models (PLMs) and how they enable advances in protein modeling. We will focus in particular on protein fitness prediction, revisiting the key concepts about proteins and language models needed to understand how these models work under the hood. We will also discuss different protein representations (e.g., sequence-only, MSA-based, and structure-informed), and compare state-of-the-art models on the ProteinGym benchmark in tasks present in the FLIP benchmark. The comparison will cover zero-shot and few-shot settings, following the evaluation pipeline described in the paper “Exploring Large Protein Language Models in Constrained Evaluation Scenarios within the FLIP Benchmark.”