2021
DOI: 10.1093/bioinformatics/btab083

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Abstract: Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationship, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABER…


Cited by 1,079 publications

(1,110 citation statements)

References 59 publications

“…Among these, six algorithms (DeepSEA, Basset, DanQ, ExplaiNN, SATORI, and Scover) were specifically designed for diverse predictive tasks. Three models were DNA foundation models, namely DNABERT2 [30,31], Nucleotide Transformer (NT) [32], and HyenaDNA [33]. Consistent with previous studies, we observed a decline in the performance of deep learning models in cell type-specific regions (Fig. S2d) [34], with models performing better in regions associated with active histone modifications, such as H3K4me3 and H3K27ac, than in regions associated with repressive modifications such as H3K9me3 and H3K27me3 (Fig. S2e).…”
Section: Benchmark Pipeline; citation type: supporting
confidence: 84%
Exaggerated anticipatory anxiety is common in social anxiety disorder (SAD). Neuroimaging studies have revealed altered neural activity in response to social stimuli in SAD, but fewer studies have examined neural activity during anticipation of feared social stimuli in SAD. The current study examined the time course and magnitude of activity in threat processing brain regions during speech anticipation in socially anxious individuals and healthy controls (HC). Method Participants (SAD n = 58; HC n = 16) underwent functional magnetic resonance imaging (fMRI) during which they completed a 90s control anticipation task and 90s speech anticipation task.
“…Experimental validation on real-world swine genomic datasets (PIC-GD and HZA-PMB) demonstrates that our model substantially outperforms baselines, including GBLUP and a Transformer trained from scratch [9, 23]. This confirms our central scientific hypothesis that pre-training on the genomic data itself enables the model to learn intrinsic genomic structures, thereby boosting performance in the downstream task of phenotype prediction by capturing non-linear genetic signals [15, 17, 18].…”
Section: Discussion; citation type: supporting
confidence: 73%
“…These results robustly demonstrate that (1) the Transformer architecture is inherently more capable of capturing complex genetic effects than linear models and other tested architectures [13, 14], and (2) self-supervised pre-training is the critical step that unlocks this potential by providing a powerful initialization based on general genomic knowledge [15, 16, 18]. Figure 6 visually confirms the strong agreement between predicted and true phenotypic values (R² = 0.552) for PIC-GD T5.…”
Section: Results; citation type: mentioning
confidence: 74%
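The statement above reports R² = 0.552 for predicted versus true phenotypes but does not say which definition of R² was used. Genomic prediction studies may report either the coefficient of determination, 1 − SS_res / SS_tot, or the squared Pearson correlation between predictions and observations, and the two can differ substantially. The minimal Python sketch below computes both under those standard definitions; the function names `r_squared` and `squared_pearson` are illustrative, and nothing here reproduces the cited paper's evaluation code.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance")
    return 1.0 - ss_res / ss_tot


def squared_pearson(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square of the Pearson correlation between observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(y_true)
    mx = sum(y_true) / n
    my = sum(y_pred) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(y_true, y_pred))
    sxx = sum((a - mx) ** 2 for a in y_true)
    syy = sum((b - my) ** 2 for b in y_pred)
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must have non-zero variance")
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    print(r_squared(observed, [1.1, 1.9, 3.2, 3.8]))        # approximately 0.98
    print(r_squared(observed, [2.0, 4.0, 6.0, 8.0]))        # -5.0
    print(squared_pearson(observed, [2.0, 4.0, 6.0, 8.0]))  # 1.0
```

For predictions that are perfectly correlated with the observations but mis-scaled (here, doubled), the squared Pearson correlation is 1.0 while the coefficient of determination is negative, so the two numbers are not interchangeable when comparing reported results.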
“…We pre-trained EBERTs with k ∈ {5, 6, 7} and DBERT with k ∈ {6, 7}. Our general findings on tokenization schemes agree with other DNA sequence embedding models, DNABERT [Ji et al., 2021] and EP2vec [Zeng et al., 2018], that found slight increases in downstream performance with increasing k up to 6, with diminishing returns as k increases past 6 up to 10. Here we show results from our best performing EBERT: k = 7 with a stride of 7, which produces an input of length L = 150 tokens.…”
Section: Genomic and Epigenetic Data; citation type: supporting
confidence: 76%
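The statement above contrasts tokenization schemes: DNABERT splits a sequence into overlapping k-mers (stride 1), while the quoted EBERT setting uses k = 7 with a stride of 7, that is, non-overlapping 7-mers. The minimal Python sketch below illustrates both. The helpers `kmer_tokenize` and `num_tokens` are hypothetical and are not part of DNABERT's or EBERT's released code; the sketch drops any trailing bases shorter than k and does not model padding or special tokens such as [CLS]. The 1,050 bp sequence length in the example is an assumption back-calculated from the quoted 150 tokens; the source does not state it.

```python
from typing import List


def kmer_tokenize(seq: str, k: int, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mers, advancing `stride` bases per step.

    stride=1 gives overlapping k-mers (the DNABERT-style scheme);
    stride=k gives non-overlapping k-mers. A trailing remainder shorter
    than k is dropped (hypothetical helper, no padding or special tokens).
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def num_tokens(seq_len: int, k: int, stride: int) -> int:
    """Token count without padding: floor((n - k) / stride) + 1."""
    if seq_len < k:
        return 0
    return (seq_len - k) // stride + 1


if __name__ == "__main__":
    # Overlapping 6-mers: adjacent tokens share 5 bases.
    print(kmer_tokenize("ACGTACGTAC", k=6, stride=1))
    # ['ACGTAC', 'CGTACG', 'GTACGT', 'TACGTA', 'ACGTAC']
    # Non-overlapping 3-mers: the final 'C' is dropped.
    print(kmer_tokenize("ACGTACGTAC", k=3, stride=3))
    # ['ACG', 'TAC', 'GTA']
    # Assumed 1,050 bp input with k = 7, stride = 7.
    print(num_tokens(1050, k=7, stride=7))  # 150
```

With stride 1, adjacent tokens share k − 1 bases, so a sequence of n bases yields n − k + 1 tokens; with stride k the tokens are disjoint and the count drops to roughly n / k, which keeps the input length L short for longer sequences.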