Speaker: Nicolás Martín Ansorregui.
Abstract: Passive Acoustic Monitoring (PAM) in tropical ecosystems faces significant challenges from overlapping vocalizations and severe class imbalance, which often lead to the ‘algorithmic invisibility’ of rare species. This talk presents a deep learning architecture that addresses these issues through early fusion of the latent representations of two general-purpose audio foundation models, PaSST and BEATs. By combining these general-purpose spectro-temporal features with a multi-stage normalization pipeline, the system targets multi-label classification on the AnuraSet benchmark. The proposed model achieves a state-of-the-art macro-averaged F1-score (Macro F1) of 93.22% and a +7.7% improvement in rare-species detection over specialized bioacoustic baselines. These results suggest that combining large-scale generalist models is a promising direction for building more robust and equitable tools for biodiversity conservation.
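The headline metric, Macro F1, is what ties the result to rare species: every class contributes equally to the average, however few positive clips it has. The sketch below illustrates that point alongside a generic early-fusion step. It is an illustration under stated assumptions, not the speaker's implementation: all function names are hypothetical, a single per-model L2 normalization stands in for the unspecified multi-stage pipeline, and real PaSST and BEATs embeddings would be far longer than the toy vectors used here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def early_fuse(passt_emb, beats_emb):
    """Hypothetical early fusion: normalize each model's embedding, then concatenate.

    Assumption: one L2 normalization per model stands in for the talk's
    (unspecified) multi-stage normalization pipeline.
    """
    return l2_normalize(passt_emb) + l2_normalize(beats_emb)


def _class_counts(y_true, y_pred, c):
    """Count true positives, false positives and false negatives for class c."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
    return tp, fp, fn


def _f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every species counts equally."""
    n_classes = len(y_true[0])
    scores = [_f1(*_class_counts(y_true, y_pred, c)) for c in range(n_classes)]
    return sum(scores) / n_classes


def micro_f1(y_true, y_pred):
    """F1 over counts pooled across classes: dominated by frequent species."""
    totals = [0, 0, 0]
    for c in range(len(y_true[0])):
        for i, v in enumerate(_class_counts(y_true, y_pred, c)):
            totals[i] += v
    return _f1(*totals)


# Toy multi-label data (rows = clips, columns = species). Species 1 is rare
# and the classifier never detects it.
y_true = [[1, 0], [1, 0], [1, 0], [1, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]
print(micro_f1(y_true, y_pred))  # 8/9, about 0.889
print(macro_f1(y_true, y_pred))  # 0.5
print(early_fuse([3.0, 4.0], [0.0, 2.0]))  # [0.6, 0.8, 0.0, 1.0]
```

Missing the single rare species barely dents the micro-averaged score but halves the macro-averaged one, which is why a Macro F1 gain is a reasonable signal of better rare-species detection. For real evaluations, a vetted implementation such as scikit-learn's `f1_score` with `average="macro"` would be the safer choice.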
