Speaker: Manuel Fernando Mollón Laorca.
Abstract: In this study, we use transformer-based DNA embeddings from a pre-trained DNABERT-2 model to train a classifier that distinguishes soil evidence containing blood, urine, feces, cadaver material, or no biological material (control). We systematically evaluate multiple preprocessing pipelines, dataset splits, and model configurations to identify the conditions that maximize performance. Beyond soil-treatment prediction, we train models for temporal inference and obtain strong results in both tasks. Our findings highlight the effectiveness of transformer embeddings for forensic soil DNA analysis and their potential to enhance both crime-scene investigation and environmental monitoring.
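
As a rough illustration of the embed-then-classify idea described in the abstract, the sketch below pools per-token embeddings into one vector per sequence and assigns a label by nearest class centroid. It is not the speaker's pipeline: the abstract does not name the pooling strategy or the classifier, so mean pooling and the nearest-centroid rule are assumptions, as are the helper names `mean_pool`, `fit_centroids`, and `predict`. The two-dimensional toy vectors stand in for real DNABERT-2 outputs, which are not computed here.

```python
from collections import defaultdict
from math import dist


def mean_pool(token_embeddings):
    """Average per-token vectors into one fixed-length vector (assumed pooling)."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]


def fit_centroids(vectors, labels):
    """Compute one centroid per class; a stand-in for the actual classifier."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    # Averaging a list of vectors is the same operation as mean pooling.
    return {label: mean_pool(vecs) for label, vecs in groups.items()}


def predict(centroids, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Toy per-token embeddings (hypothetical values, not DNABERT-2 output).
train_tokens = [
    [[1.0, 0.0], [0.8, 0.2]],  # sequence from a blood-treated sample
    [[0.9, 0.1], [0.9, 0.3]],  # another blood-treated sample
    [[0.0, 1.0], [0.2, 0.8]],  # sequence from a control sample
]
train_labels = ["blood", "blood", "control"]

train_vectors = [mean_pool(tokens) for tokens in train_tokens]
centroids = fit_centroids(train_vectors, train_labels)

query = mean_pool([[0.9, 0.1], [0.7, 0.1]])
print(predict(centroids, query))
```

In practice the toy vectors would be replaced by embeddings extracted from the pre-trained model, the label set would cover all five classes (blood, urine, feces, cadaver material, control), and the centroid rule would be swapped for whichever classifier the evaluation favors.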
