Question: How do you convert the motif databases provided in the MEME Suite to the FASTA format needed for FIMO?
6.3 years ago by
drautuna60
United States
drautuna60 wrote:

Hello all,

My question is pretty much the title: I have DNA motifs from the human genome, output by MEME, and I want to run them through FIMO to determine which transcription factors bind within my motif sequence. I am doing all of this on a UNIX server, so I am using the command-line version and want to feed the MEME output to FIMO.

The MEME suite download page is here:

http://ebi.edu.au/ftp/software/MEME/index.html

and the motif database I downloaded is at the link too, but can be directly reached at:

http://ebi.edu.au/ftp/software/MEME/Databases/motifs/motif_databases.12.1.tgz

After downloading the motif database onto my UNIX server, I unpacked it and found a collection of motif files in .meme format. FIMO requires the database in FASTA format, and I've been trying to use the included README to find out how to convert it, but I've had little luck and have ended up spending a lot of time on something trivial.

FIMO takes in the command line as follows:

fimo [options] <motif file> <sequence file> 

 

Any help would be appreciated! Thanks.

modified 6.3 years ago by komal.rathi3.7k • written 6.3 years ago by drautuna60
6.3 years ago by
komal.rathi3.7k
Children's Hospital of Philadelphia, Philadelphia, PA
komal.rathi3.7k wrote:

Firstly, I want to differentiate between MEME & FIMO. MEME is used to find which motifs (i.e. sequence patterns) appear frequently in your sequence file, whereas FIMO uses a known set of motifs (sequence patterns specific for particular TFs) and tries to find if those motifs appear in your sequence file.

As far as I know, based on experience, FIMO takes a motif file that is NOT in FASTA format. However, FIMO is used to find which motifs appear in your query which IS a FASTA file. You have to download matrix.dat (from BIOBASE's TRANSFAC database) or you can download it from here. You then have to run transfac2meme (which is part of the MEME suite) as follows:

    transfac2meme matrix.dat > matrix.meme

The matrix.meme file contains a letter-probability matrix for each motif, like this:

Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000 

MOTIF V_MYOD_01 MyoD

letter-probability matrix: alength= 4 w= 12 nsites= 5 E= 0
  0.200000        0.400000        0.400000        0.000000      
  0.400000        0.200000        0.400000        0.000000      
  0.600000        0.000000        0.200000        0.200000      
  0.000000        1.000000        0.000000        0.000000      
  1.000000        0.000000        0.000000        0.000000      
  0.000000        0.000000        0.800000        0.200000      
  0.000000        0.200000        0.800000        0.000000      
  0.000000        0.000000        0.000000        1.000000      
  0.000000        0.000000        1.000000        0.000000      
  0.000000        0.200000        0.400000        0.400000      
  0.000000        0.400000        0.000000        0.600000      
  0.200000        0.000000        0.600000        0.200000      

MOTIF V_E47_01 E47

letter-probability matrix: alength= 4 w= 15 nsites= 11 E= 0
  0.363636        0.363636        0.272727        0.000000      
  0.181818        0.454545        0.363636        0.000000      
  0.272727        0.181818        0.363636        0.181818     

(and so on for other motifs)
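
To check what a converted file contains before handing it to FIMO, a minimal Python sketch along these lines could list each motif's identifier and width. The function name `list_motifs` is an illustrative assumption and not part of the MEME Suite. The sample text reuses the V_MYOD_01 matrix shown above and adds the `MEME version` and `ALPHABET` header lines that a complete MEME-format file normally starts with.

```python
import re

# Sample MEME-format text: header lines plus the V_MYOD_01 motif from the post.
SAMPLE = """\
MEME version 4

ALPHABET= ACGT

Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000

MOTIF V_MYOD_01 MyoD

letter-probability matrix: alength= 4 w= 12 nsites= 5 E= 0
  0.200000  0.400000  0.400000  0.000000
  0.400000  0.200000  0.400000  0.000000
  0.600000  0.000000  0.200000  0.200000
  0.000000  1.000000  0.000000  0.000000
  1.000000  0.000000  0.000000  0.000000
  0.000000  0.000000  0.800000  0.200000
  0.000000  0.200000  0.800000  0.000000
  0.000000  0.000000  0.000000  1.000000
  0.000000  0.000000  1.000000  0.000000
  0.000000  0.200000  0.400000  0.400000
  0.000000  0.400000  0.000000  0.600000
  0.200000  0.000000  0.600000  0.200000
"""


def list_motifs(text):
    """Return one dict per motif with its id, declared width, and matrix rows."""
    motifs = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MOTIF" and len(fields) > 1:
            # A new motif record starts here; the identifier is the second field.
            current = {"id": fields[1], "width": None, "rows": []}
            motifs.append(current)
        elif current is not None and line.startswith("letter-probability matrix:"):
            # The declared motif width is given as "w= <number>".
            match = re.search(r"\bw=\s*(\d+)", line)
            if match:
                current["width"] = int(match.group(1))
        elif current is not None and current["width"] is not None:
            # Lines after the matrix header are treated as rows if all fields are numeric.
            try:
                current["rows"].append([float(x) for x in fields])
            except ValueError:
                continue
    return motifs


for motif in list_motifs(SAMPLE):
    print(motif["id"], motif["width"], len(motif["rows"]))
```

If the declared width and the number of rows read disagree, the file is probably truncated or not in the format this sketch expects.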

Then use the matrix.meme file to check which motifs appear in your sequence file:

    fimo [options] matrix.meme query.fasta

Output of fimo when I ran matrix.meme on a fasta file containing MYOD peaks:

#pattern name   sequence name   start   stop    strand  score   p-value q-value matched sequence
V_MYOD_01       chr1    6204681 6204692 -       11.2641 7.11e-05                ACTCAGGTGTCT
V_MYOD_01       chr1    6205087 6205098 -       14.0614 1.35e-05                CGTCAGGTGCTG
V_MYOD_01       chr1    6277494 6277505 +       10.6425 8.78e-05                TGACAGGTGTTG
V_MYOD_01       chr1    6810137 6810148 +       12.1965 4.79e-05                CAGCAGCTGCTG
V_MYOD_01       chr1    6810137 6810148 -       12.1965 4.79e-05                CAGCAGCTGCTG
V_MYOD_01       chr1    7196368 7196379 +       17.0917 1.15e-07                CAACAGGTGTTG
V_MYOD_01       chr1    7535485 7535496 -       12.1965 4.79e-05                GAGCAGCTGCTG
V_MYOD_01       chr1    8009701 8009712 -       12.1188 4.99e-05                AAACAGCTGTCA
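
A table like this can be filtered after the fact. The following is a minimal Python sketch, assuming the output is tab-separated with an empty q-value column (the whitespace in the listing above may not reflect the exact delimiters). The function name `read_fimo_hits` and the threshold values are illustrative assumptions; the sample rows are copied from the output above.

```python
import csv
import io

# Four rows from the output above, written with explicit tabs (an assumption
# about the real delimiter) and an empty q-value field.
SAMPLE = (
    "#pattern name\tsequence name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched sequence\n"
    "V_MYOD_01\tchr1\t6204681\t6204692\t-\t11.2641\t7.11e-05\t\tACTCAGGTGTCT\n"
    "V_MYOD_01\tchr1\t6205087\t6205098\t-\t14.0614\t1.35e-05\t\tCGTCAGGTGCTG\n"
    "V_MYOD_01\tchr1\t6277494\t6277505\t+\t10.6425\t8.78e-05\t\tTGACAGGTGTTG\n"
    "V_MYOD_01\tchr1\t7196368\t7196379\t+\t17.0917\t1.15e-07\t\tCAACAGGTGTTG\n"
)


def read_fimo_hits(handle, max_p=1e-4):
    """Return hits with p-value <= max_p, sorted from most to least significant."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return []
    # The header line starts with '#', which is not part of the column name.
    header[0] = header[0].lstrip("#")
    hits = []
    for fields in reader:
        if not fields or fields[0].startswith("#"):
            continue
        row = dict(zip(header, fields))
        p_value = float(row["p-value"])
        if p_value <= max_p:
            hits.append({
                "motif": row["pattern name"],
                "sequence": row["sequence name"],
                "start": int(row["start"]),
                "stop": int(row["stop"]),
                "strand": row["strand"],
                "p_value": p_value,
                "matched": row["matched sequence"],
            })
    return sorted(hits, key=lambda hit: hit["p_value"])


for hit in read_fimo_hits(io.StringIO(SAMPLE), max_p=5e-5):
    print(hit["motif"], hit["sequence"], hit["start"], hit["strand"], hit["p_value"])
```

For a real run, pass an open file handle for the FIMO output instead of the in-memory sample.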

 

modified 6.3 years ago • written 6.3 years ago by komal.rathi3.7k

Thanks for the long and detailed reply, but I think you misinterpreted my question:

Usage:

fimo [options] <motifs> <database>

 

Input:

  • <motifs> is the name of a file containing a list of motifs, in MEME format.
  • <database> is the name of a file containing a collection of sequences in FASTA format. The character - can be used to indicate that the sequence data should be read from standard input. This can only be used if the motif file contains a single motif.

I have the <motifs> input in meme format already - note that this is supposed to be in html, txt, or xml format: http://meme.nbcr.net/meme/doc/meme-format.html

I do NOT have the <database> file in FASTA format. The database that I downloaded from the link on the MEME suite site:

http://ebi.edu.au/ftp/software/MEME/Databases/motifs/motif_databases.12.1.tgz

gives me databases in .meme format. The link and the directions you gave me also yield a database in .meme format, whereas the required format is FASTA. The command line should look like:

    fimo [options] meme.html motif_database_to_search.fasta

where meme.html can also be meme.txt or meme.xml

 

written 6.3 years ago by drautuna60

FIMO is used when you already HAVE a sequence file (in FASTA format) and you want to search it for known motifs. So FIMO cannot be used here in your case. The FASTA file (which is your query, and which you should already have, since it is your query set) is the <database>.

And as far as the confusion goes, <motifs> takes meme output as well as matrix.meme files.

modified 6.3 years ago • written 6.3 years ago by komal.rathi3.7k

Thanks for the quick reply, although I am rather confused now -- to start from the beginning, I had several sequences which were all together in FASTA format that I ran through the MEME program to find a common motif in those sequences. Now, I want to find if any known transcription factors will bind to that motif I just found. I thought FIMO fit the job, but apparently it doesn't - what other technique should I use?

written 6.3 years ago by drautuna60

So you have the sequence file in FASTA format, right? Let's name it query.fasta. That is your <database>. Now what you can do is create a matrix.meme file as I suggested in my answer, and run:

fimo matrix.meme query.fasta

After this you will get output listing which known motifs match your sequences. FIMO will give you output like this. You will get not only which motifs match your sequences but also the matched sequence for each motif occurrence. You can then cross-validate these results against the MEME output by checking whether any of the patterns obtained from MEME are found in the FIMO output too.

modified 6.3 years ago • written 6.3 years ago by komal.rathi3.7k
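
One possible reading of the cross-validation step is to check whether any sequence matched by FIMO also appears among the sites MEME reported, on either strand. The sketch below is a minimal Python illustration under that assumption. The function names and the MEME site in the example are hypothetical; the FIMO sequences are taken from the output shown in the answer.

```python
# Translation table for complementing DNA letters (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_sites(meme_sites, fimo_matches):
    """Return FIMO matched sequences that equal a MEME site on either strand."""
    known = set()
    for site in meme_sites:
        site = site.upper()
        known.add(site)
        known.add(reverse_complement(site))
    return [match for match in fimo_matches if match.upper() in known]


# Hypothetical MEME site; it is the reverse complement of one FIMO match below.
meme_sites = ["CAACACCTGTTG"]
fimo_matches = ["ACTCAGGTGTCT", "CAACAGGTGTTG"]
print(shared_sites(meme_sites, fimo_matches))
```

Exact string equality is a strict criterion. Comparing genomic coordinates of the sites instead would be another reasonable choice when both tools were run on the same sequences.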