A common question here on Biostars is how to find out whether a given DNA sequence contains certain motifs. This can be done with FIMO (Find Individual Motif Occurrences) from the MEME suite. In this example, we scan a stretch of DNA around the first exon of the human BCL6 gene for occurrences of all motifs listed in the JASPAR vertebrate core collection.
Coordinates of the query sequence (hg38): chr3:187744307-187746589
## Get JASPAR motifs (vertebrate non-redundant core collection) in meme format:
wget http://jaspar.genereg.net/download/CORE/JASPAR2018_CORE_vertebrates_non-redundant_pfms_meme.zip
## Unzip:
unzip JASPAR2018_CORE_vertebrates_non-redundant_pfms_meme.zip
cd JASPAR2018_CORE_vertebrates_non-redundant_pfms_meme
## Combine into one file:
find ./ -maxdepth 1 -name "*.meme" | xargs cat > combined.meme
## Install fimo (part of MEME):
conda install -c bioconda meme
## if fimo complains about libiconv libraries, also install that manually:
conda install -c conda-forge libiconv
## run fimo:
fimo --parse-genomic-coord combined.meme input.fa
The input.fa here looks like:
>chr3:187744307-187746589
(sequence...)
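If the query sequence is not at hand yet, one way to produce input.fa with a suitable header is samtools faidx, which names each extracted record after the requested region. This is a minimal sketch, assuming samtools is installed and a local copy of the genome is available. The helper name make_input_fa and the file name hg38.fa are hypothetical and not part of the workflow above.

```shell
# Hypothetical helper: extract one region from a genome FASTA.
# Usage: make_input_fa <genome.fa> <chr:start-end> <output.fa>
make_input_fa() {
    # samtools faidx uses 1-based, inclusive coordinates and writes the
    # requested region string as the FASTA header (">chr:start-end").
    # It builds the .fai index next to the genome file if it is missing.
    samtools faidx "$1" "$2" > "$3"
}
```

For the region used here, the call would be make_input_fa hg38.fa chr3:187744307-187746589 input.fa.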
When the fasta header gives the genomic coordinates of the sequence in the form chr:start-end (1-based coordinates) and fimo is run with the --parse-genomic-coord option, the resulting GFF file shows the exact genomic coordinates of each motif occurrence.
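Because the coordinate parsing depends on the header format, it can help to check the headers before running fimo. The sketch below tests every header against the pattern seen in the example header (>chr3:187744307-187746589). The function name check_headers is hypothetical, and the regular expression is an assumption that may be stricter than what fimo itself accepts.

```shell
# Hypothetical helper: succeed only if every FASTA header in the file
# looks like ">name:start-end", e.g. ">chr3:187744307-187746589".
# Usage: check_headers <file.fa>
check_headers() {
    # The second grep uses -v to select headers that do NOT match the
    # pattern; -q makes it exit 0 as soon as one such header is found.
    if grep '^>' "$1" | grep -Evq '^>[^:[:space:]]+:[0-9]+-[0-9]+$'; then
        echo "header(s) without name:start-end coordinates in $1" >&2
        return 1
    fi
}
```

A possible use is check_headers input.fa && fimo --parse-genomic-coord combined.meme input.fa, so that fimo only runs when all headers carry coordinates.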
Check output in gff format:
head fimo_out/fimo.gff
##gff-version 3
chr3 fimo nucleotide_motif 187745593 187745603 43.9 - . Name=MA0002.2_chr3-;Alias=RUNX1;ID=MA0002.2-RUNX1-1-chr3;pvalue=4.11e-05;qvalue= 0.177;sequence=TCTTGTGGCTT;
chr3 fimo nucleotide_motif 187746233 187746243 40.4 + . Name=MA0002.2_chr3+;Alias=RUNX1;ID=MA0002.2-RUNX1-2-chr3;pvalue=9.11e-05;qvalue= 0.196;sequence=GTTTGTGGTGT;
chr3 fimo nucleotide_motif 187744975 187744985 41.1 + . Name=MA0003.3_chr3+;Alias=TFAP2A;ID=MA0003.3-TFAP2A-1-chr3;pvalue=7.81e-05;qvalue= 0.323;sequence=CCCCCCAAGCA;
chr3 fimo nucleotide_motif 187745763 187745774 41.9 + . Name=MA0018.3_chr3+;Alias=CREB1;ID=MA0018.3-CREB1-1-chr3;pvalue=6.41e-05;qvalue= 0.146;sequence=TGTGACGTCGGC;
chr3 fimo nucleotide_motif 187745763 187745774 41.9 - . Name=MA0018.3_chr3-;Alias=CREB1;ID=MA0018.3-CREB1-2-chr3;pvalue=6.41e-05;qvalue= 0.146;sequence=GCCGACGTCACA;
chr3 fimo nucleotide_motif 187746240 187746250 50.7 - . Name=MA0025.1_chr3-;Alias=NFIL3;ID=MA0025.1-NFIL3-1-chr3;pvalue=8.51e-06;qvalue= 0.0387;sequence=TTACGTAACAC;
chr3 fimo nucleotide_motif 187746378 187746388 40.5 + . Name=MA0025.1_chr3+;Alias=NFIL3;ID=MA0025.1-NFIL3-2-chr3;pvalue=8.97e-05;qvalue= 0.204;sequence=ATATGTAACAA;
chr3 fimo nucleotide_motif 187745661 187745670 40.4 - . Name=MA0028.2_chr3-;Alias=ELK1;ID=MA0028.2-ELK1-1-chr3;pvalue=9.09e-05;qvalue= 0.412;sequence=ACCGGAACCT;
chr3 fimo nucleotide_motif 187745215 187745225 47.4 + . Name=MA0032.2_chr3+;Alias=FOXC1;ID=MA0032.2-FOXC1-1-chr3;pvalue=1.81e-05;qvalue= 0.0779;sequence=TAAATAAATAT;
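Most of the hits listed above have fairly high q-values, so a natural next step is to keep only those below a chosen cutoff. The following awk sketch does this. It assumes the GFF file on disk is tab-delimited, as GFF3 requires, and that the ninth column carries the Alias= and qvalue= attributes as in the output above. The function name filter_fimo_gff and the default cutoff of 0.05 are illustrative choices, not part of FIMO.

```shell
# Hypothetical helper: print alias, start, end, strand and q-value for
# hits whose q-value is below a cutoff (default 0.05).
# Usage: filter_fimo_gff <fimo.gff> [cutoff]
filter_fimo_gff() {
    awk -F'\t' -v cutoff="${2:-0.05}" '
        /^#/ { next }                      # skip GFF header/comment lines
        {
            alias = ""; q = ""
            n = split($9, attrs, ";")      # attributes are ";"-separated
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^Alias=/)  { alias = substr(attrs[i], 7) }
                if (attrs[i] ~ /^qvalue=/) {
                    q = substr(attrs[i], 8)
                    gsub(/ /, "", q)       # the value may be space-padded
                }
            }
            if (q != "" && q + 0 < cutoff + 0)
                print alias, $4, $5, $7, q
        }
    ' OFS='\t' "$1"
}
```

Called as filter_fimo_gff fimo_out/fimo.gff 0.05, this keeps only the first NFIL3 hit (q-value 0.0387) among the lines shown above.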