Question

How do you convert the motif databases provided in the MEME Suite to the FASTA format needed by FIMO?

9.7 years ago

drautuna ▴ 60

Hello all,

My question is pretty much the title. I have DNA motifs from the human genome, produced by MEME, and I want to run them through FIMO to determine which transcription factors bind within my motif sequences. I am doing all of this on a UNIX server, so I am using the command-line tools and want to feed the MEME output into FIMO.

The MEME suite download page is here and the motif database I downloaded is at the link too, but can be directly downloaded from this link (.tgz file).

After downloading the motif database onto my UNIX server, I unpacked it and found a collection of motif files in the .meme file format. FIMO requires the database in FASTA format, and I've been trying to use the included readme to find out how to convert it, but I've had little luck and have ended up spending a lot of time on something trivial.

FIMO takes in the command line as follows:

fimo [options] <motif file> <sequence file>

Any help would be appreciated! Thanks.

Ram · Answer 1 · 2014-08-23

First, I want to distinguish between MEME and FIMO. MEME discovers motifs (i.e. sequence patterns) that appear frequently in your sequence file, whereas FIMO takes a known set of motifs (sequence patterns specific to particular TFs) and searches your sequence file for occurrences of them.

As far as I know, based on experience, FIMO takes a motif file that is NOT in FASTA format; it expects the motifs in MEME format, so the .meme files from the motif database can be passed to it as they are. The FASTA file is the query, i.e. the sequences in which FIMO looks for motif occurrences. If you want TRANSFAC motifs, download matrix.dat (from BIOBASE's TRANSFAC database) or you can download it from here. You then have to run transfac2meme (which is part of the MEME Suite) as follows:

transfac2meme matrix.dat > matrix.meme

The matrix.meme file contains a letter-probability matrix (the nucleotide frequencies at each position) for each motif, like this:

Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000 

MOTIF V_MYOD_01 MyoD

letter-probability matrix: alength= 4 w= 12 nsites= 5 E= 0
  0.200000        0.400000        0.400000        0.000000      
  0.400000        0.200000        0.400000        0.000000      
  0.600000        0.000000        0.200000        0.200000      
  0.000000        1.000000        0.000000        0.000000      
  1.000000        0.000000        0.000000        0.000000      
  0.000000        0.000000        0.800000        0.200000      
  0.000000        0.200000        0.800000        0.000000      
  0.000000        0.000000        0.000000        1.000000      
  0.000000        0.000000        1.000000        0.000000      
  0.000000        0.200000        0.400000        0.400000      
  0.000000        0.400000        0.000000        0.600000      
  0.200000        0.000000        0.600000        0.200000      

MOTIF V_E47_01 E47

letter-probability matrix: alength= 4 w= 15 nsites= 11 E= 0
  0.363636        0.363636        0.272727        0.000000      
  0.181818        0.454545        0.363636        0.000000      
  0.272727        0.181818        0.363636        0.181818

(and so on for other motifs)
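To sanity-check a converted file before handing it to FIMO, something like the following minimal Python sketch could list the motifs and confirm that every matrix row sums to roughly 1. The `parse_meme` helper and the shortened `SAMPLE` text (cut down to three rows per motif, with `w=` adjusted to match) are illustrative assumptions, not part of the MEME Suite, and the parser only handles the simple layout shown in the excerpt above.

```python
# Toy excerpt in the layout shown above; shortened to 3 rows per motif
# for illustration (the real V_MYOD_01 matrix has 12 rows).
SAMPLE = """\
Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000

MOTIF V_MYOD_01 MyoD

letter-probability matrix: alength= 4 w= 3 nsites= 5 E= 0
  0.200000        0.400000        0.400000        0.000000
  0.400000        0.200000        0.400000        0.000000
  0.600000        0.000000        0.200000        0.200000

MOTIF V_E47_01 E47

letter-probability matrix: alength= 4 w= 3 nsites= 11 E= 0
  0.363636        0.363636        0.272727        0.000000
  0.181818        0.454545        0.363636        0.000000
  0.272727        0.181818        0.363636        0.181818
"""


def parse_meme(text):
    """Return {motif_id: [[pA, pC, pG, pT], ...]} for a simple MEME motif file."""
    motifs = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MOTIF":
            # The identifier is the first token after the MOTIF keyword.
            current = fields[1]
            motifs[current] = []
        elif current is not None and len(fields) == 4:
            # Matrix rows are four numbers; skip anything that is not numeric.
            try:
                motifs[current].append([float(x) for x in fields])
            except ValueError:
                continue
    return motifs


for motif_id, rows in parse_meme(SAMPLE).items():
    rows_ok = all(abs(sum(row) - 1.0) < 1e-3 for row in rows)
    print(motif_id, "width:", len(rows), "rows sum to 1:", rows_ok)
```

For a real file, read it with `open("matrix.meme").read()` and pass the text to `parse_meme`.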

Then use the matrix.meme file to check which motifs appear in your sequence file:

fimo [options] matrix.meme query.fasta
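To give a feel for what this scan does conceptually, here is a rough Python sketch that slides the V_MYOD_01 matrix from the excerpt above along a sequence and computes a log-odds score for each window. It is not FIMO's actual implementation: the pseudocount value is an arbitrary assumption, only the forward strand is scanned, and no p-values are computed, so the scores will not match FIMO's exactly. The `log_odds` and `scan` names are made up for this example.

```python
import math

# V_MYOD_01 letter-probability matrix (columns: A, C, G, T), copied from above.
MYOD = [
    (0.2, 0.4, 0.4, 0.0),
    (0.4, 0.2, 0.4, 0.0),
    (0.6, 0.0, 0.2, 0.2),
    (0.0, 1.0, 0.0, 0.0),
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 0.0, 0.8, 0.2),
    (0.0, 0.2, 0.8, 0.0),
    (0.0, 0.0, 0.0, 1.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.2, 0.4, 0.4),
    (0.0, 0.4, 0.0, 0.6),
    (0.2, 0.0, 0.6, 0.2),
]
BACKGROUND = 0.25   # uniform background, as in the file header
PSEUDOCOUNT = 0.01  # assumed value, only there to avoid log(0)
INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def log_odds(matrix, pseudo=PSEUDOCOUNT):
    """Convert probabilities to log2 odds against the uniform background."""
    norm = 1.0 + 4 * pseudo
    return [[math.log2((p + pseudo) / norm / BACKGROUND) for p in row]
            for row in matrix]


def scan(sequence, matrix):
    """Score every forward-strand window; return (start, window, score) tuples."""
    pssm = log_odds(matrix)
    width = len(pssm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in INDEX for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pssm[i][INDEX[base]] for i, base in enumerate(window))
        hits.append((start, window, score))
    return hits


# Hypothetical test sequence with one strong MyoD-like site in the middle.
hits = scan("TTTTCAACAGGTGTTGAAAA", MYOD)
best = max(hits, key=lambda hit: hit[2])
print("best window:", best[1], "at 0-based offset", best[0])
```

FIMO additionally scans the reverse strand by default and converts scores into p-values, which is why the real tool is the right choice for actual analyses.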

Output of FIMO when I ran matrix.meme against a FASTA file containing MyoD peaks:

#pattern name   sequence name   start   stop    strand  score   p-value q-value matched sequence
V_MYOD_01       chr1    6204681 6204692 -       11.2641 7.11e-05                ACTCAGGTGTCT
V_MYOD_01       chr1    6205087 6205098 -       14.0614 1.35e-05                CGTCAGGTGCTG
V_MYOD_01       chr1    6277494 6277505 +       10.6425 8.78e-05                TGACAGGTGTTG
V_MYOD_01       chr1    6810137 6810148 +       12.1965 4.79e-05                CAGCAGCTGCTG
V_MYOD_01       chr1    6810137 6810148 -       12.1965 4.79e-05                CAGCAGCTGCTG
V_MYOD_01       chr1    7196368 7196379 +       17.0917 1.15e-07                CAACAGGTGTTG
V_MYOD_01       chr1    7535485 7535496 -       12.1965 4.79e-05                GAGCAGCTGCTG
V_MYOD_01       chr1    8009701 8009712 -       12.1188 4.99e-05                AAACAGCTGTCA
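If you want to post-process this table, a small Python sketch along these lines could keep only the strongest hits. It assumes the output is tab-delimited with the column names shown above (column names may differ between MEME Suite versions, so adjust them to your header); `read_hits`, `best_hits` and the cutoff value are illustrative choices, not anything FIMO prescribes.

```python
import csv
import io

# A few rows from the output above, rebuilt as tab-delimited text.
# The q-value column is empty, as in the original run.
ROWS = [
    ["#pattern name", "sequence name", "start", "stop", "strand",
     "score", "p-value", "q-value", "matched sequence"],
    ["V_MYOD_01", "chr1", "6204681", "6204692", "-", "11.2641", "7.11e-05", "", "ACTCAGGTGTCT"],
    ["V_MYOD_01", "chr1", "6205087", "6205098", "-", "14.0614", "1.35e-05", "", "CGTCAGGTGCTG"],
    ["V_MYOD_01", "chr1", "6810137", "6810148", "+", "12.1965", "4.79e-05", "", "CAGCAGCTGCTG"],
    ["V_MYOD_01", "chr1", "7196368", "7196379", "+", "17.0917", "1.15e-07", "", "CAACAGGTGTTG"],
]
FIMO_TEXT = "\n".join("\t".join(row) for row in ROWS)


def read_hits(text):
    """Parse tab-delimited FIMO text output into a list of dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # drop the leading comment marker
    hits = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(header, row))
        hit["start"] = int(hit["start"])
        hit["stop"] = int(hit["stop"])
        hit["score"] = float(hit["score"])
        hit["p-value"] = float(hit["p-value"])
        hits.append(hit)
    return hits


def best_hits(hits, max_p):
    """Keep hits with p-value <= max_p, most significant first."""
    kept = [hit for hit in hits if hit["p-value"] <= max_p]
    return sorted(kept, key=lambda hit: hit["p-value"])


for hit in best_hits(read_hits(FIMO_TEXT), max_p=2e-5):
    print(hit["sequence name"], hit["start"], hit["strand"],
          hit["p-value"], hit["matched sequence"])
```

For a real run, replace `FIMO_TEXT` with the contents of the file FIMO wrote.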