2021
DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
Abstract: Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationship, which previous informatics methods often fail to capture especially in data-scarce scenarios. Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT…
Cited by 1,079 publications (1,110 citation statements)
References: 59 publications
“…Among these, six algorithms (DeepSEA, Basset, DanQ, ExplaiNN, SATORI, and Scover) were specifically designed for diverse predictive tasks. Three models were DNA foundation models, namely DNABERT2 [30,31], Nucleotide Transformer (NT) [32], and HyenaDNA [33]. Consistent with previous studies, we observed a decline in the performance of deep learning models in cell type-specific regions (Fig. S2d) [34], with models performing better in regions associated with active histone modifications, such as H3K4me3 and H3K27ac, compared to repressive modifications like H3K9me3 and H3K27me3 (Fig. S2e).…”
Section: Benchmark Pipeline (supporting)
confidence: 84%
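The quoted benchmark reports performance stratified by region category (cell type-specific regions, active vs. repressive histone marks). Below is a minimal sketch of that kind of stratified evaluation, not the cited pipeline's actual code: predictions are grouped by a hypothetical region annotation and a per-group AUROC is computed. The column names and random data are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical per-region predictions: each row is one genomic region with a
# binary label, a model score, and the histone-mark category it falls in.
df = pd.DataFrame({
    "region_type": ["H3K4me3", "H3K27ac", "H3K9me3", "H3K27me3"] * 50,
    "label":       np.random.randint(0, 2, 200),
    "score":       np.random.rand(200),
})

# AUROC stratified by region category, mirroring the active-vs-repressive
# comparison described in the quoted benchmark.
for region, group in df.groupby("region_type"):
    if group["label"].nunique() < 2:   # AUROC is undefined with a single class
        continue
    auc = roc_auc_score(group["label"], group["score"])
    print(f"{region}: AUROC = {auc:.3f}")
```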
“…Experimental validation on real-world swine genomic datasets (PIC-GD and HZA-PMB) demonstrates that our model substantially outperforms baselines, including GBLUP and a Transformer trained from scratch [9, 23]. This confirms our central scientific hypothesis that pre-training on the genomic data itself enables the model to learn intrinsic genomic structures, thereby boosting performance in the downstream task of phenotype prediction by capturing non-linear genetic signals [15, 17, 18].…”
Section: Discussion (supporting)
confidence: 73%
“…These results robustly demonstrate that (1) the Transformer architecture is inherently more capable of capturing complex genetic effects than linear models and other tested architectures [13, 14], and (2) self-supervised pre-training is the critical step that unlocks this potential by providing a powerful initialization based on general genomic knowledge [15, 16, 18]. Figure 6 visually confirms the strong agreement between predicted and true phenotypic values (R² = 0.552) for PIC-GD T5.…”
Section: Results (mentioning)
confidence: 74%
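The quote reports agreement between predicted and true phenotypes as R² = 0.552. Below is a minimal sketch of one common way that metric is computed (the coefficient of determination; some genomic-prediction studies instead report squared Pearson correlation). The arrays are hypothetical placeholders, not the PIC-GD data.

```python
import numpy as np

def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination between true and predicted phenotypes."""
    ss_res = np.sum((y_true - y_pred) ** 2)             # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical true vs. predicted phenotypic values for a handful of animals.
y_true = np.array([1.2, 0.8, -0.3, 0.5, 1.9])
y_pred = np.array([1.0, 0.9, -0.1, 0.6, 1.5])
print(f"R^2 = {r_squared(y_true, y_pred):.3f}")
```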
“…We pre-trained EBERTs with k ∈ {5, 6, 7} and DBERT with k ∈ {6, 7}. Our general findings on tokenization schemes agree with other DNA sequence embedding models, DNABERT [Ji et al., 2021] and EP2vec [Zeng et al., 2018], which found slight increases in downstream performance with increasing k up to 6, with diminishing returns as k increases past 6 up to 10. Here we show results from our best-performing EBERT: k = 7 with a stride of 7, which produces an input length L of 150 tokens.…”
Section: Genomic and Epigenetic Data (supporting)
confidence: 76%
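The quote describes k-mer tokenization with k = 7 and stride 7, i.e. non-overlapping 7-mers, so a roughly 1,050 bp sequence yields 150 tokens. Below is a minimal sketch of that tokenization; the function name and example sequence are illustrative and not taken from either cited model's code.

```python
def kmer_tokenize(sequence: str, k: int = 7, stride: int = 7) -> list[str]:
    """Split a DNA sequence into k-mer tokens taken every `stride` bases.

    stride == k gives non-overlapping k-mers (the k = 7 / stride 7 setting
    quoted above); stride == 1 gives overlapping k-mers, as in DNABERT-style
    tokenization.
    """
    return [sequence[i:i + k]
            for i in range(0, len(sequence) - k + 1, stride)]

# Illustrative example: a 1,050 bp sequence tokenized with k = 7, stride 7
# yields 1,050 / 7 = 150 tokens.
seq = "ACGT" * 262 + "AC"          # 1,050 bases (hypothetical sequence)
tokens = kmer_tokenize(seq, k=7, stride=7)
print(len(tokens), tokens[:3])     # 150 ['ACGTACG', 'TACGTAC', 'GTACGTA']
```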
