Speaker: Paloma Villanueva Fuster.

Abstract: This study focuses on predicting the binding sites of hypoxia-inducible factors (HIFs), transcription factors that are critical regulators of the cellular response to low oxygen levels and are implicated in various diseases, including cancer. DNABERT-2, a Transformer-based language model adapted for DNA sequence analysis, was fine-tuned under different hyperparameter configurations to improve predictive performance. A major challenge was class imbalance: positive sequences (actual binding sites) were underrepresented in the datasets. This was addressed through undersampling and class-weighted loss strategies. The study was conducted on the MCF-7 cell line, which is widely used in breast cancer research. It produced a model that accurately identifies HIF binding sites in DNA sequences and shows strong potential for future applications in genomic analysis.
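The abstract names two standard remedies for the shortage of positive sequences. The sketch below shows both in plain Python. It assumes binary labels (1 = HIF binding site, 0 = background) and uses the common inverse-frequency ("balanced") weighting. The function names are hypothetical, and this is not the study's actual pipeline, whose exact weighting and sampling settings are not stated here.

```python
import math
import random
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).
    Rare classes (e.g. true binding sites) receive larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def undersample(sequences, labels, seed=0):
    """Random undersampling: keep a random subset of each class equal in
    size to the minority class, then shuffle the (sequence, label) pairs."""
    rng = random.Random(seed)
    by_class = {}
    for seq, y in zip(sequences, labels):
        by_class.setdefault(y, []).append(seq)
    n_min = min(len(v) for v in by_class.values())
    balanced = [(seq, y) for y, seqs in by_class.items()
                for seq in rng.sample(seqs, n_min)]
    rng.shuffle(balanced)
    return balanced

def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted cross-entropy, averaged over examples.
    probs[i][c] is the predicted probability of class c for example i."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)

# Hypothetical toy data: 8 negatives, 2 positives.
labels = [0] * 8 + [1] * 2
seqs = [f"seq{i}" for i in range(10)]
weights = balanced_class_weights(labels)  # {0: 0.625, 1: 2.5}
balanced = undersample(seqs, labels)      # 2 negatives + 2 positives
```

Undersampling discards majority-class data, while class weighting keeps every sequence but scales up the loss on rare positives. In a PyTorch fine-tuning loop, the same weights could be passed as the `weight` argument of `torch.nn.CrossEntropyLoss`. Note that its default mean reduction normalizes by the summed weights rather than by the number of examples.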