How To Run The Meme Motif Discovery Software On A Large Dataset?
9.8 years ago
k.nirmalraman ★ 1.1k

I am currently using MEME for motif discovery, and I would like to scan about 50 to 100 bases upstream for transcription-factor binding sites (typically located around -35 and -10). I have a local installation of MEME.

I have about 30K upstream sequences, and I cannot run the algorithm: even with -maxsize set to very high values, I get:

Error: Dataset too large (-1) Rerun with larger -maxsize

How can I address this problem?
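To see how large the dataset actually is, here is a small shell sketch that sums the sequence characters in a FASTA file (header lines excluded), which as far as I understand is the quantity -maxsize limits. `fasta_size` is a hypothetical helper name, not a MEME tool. With ~30K sequences of 50 to 100 bases, this should come to roughly 1.5 to 3 million characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MEME): count sequence characters in a FASTA file.
# Header lines starting with '>' are skipped; spaces, tabs and CRs are ignored.
fasta_size() {
  awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
       END   { print total + 0 }' "$1"
}

# Example: fasta_size upstream.fasta
```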

Also, as an extension to this question:

I expect to find more than one conserved motif (e.g. at both -35 and -10) in different subsets of the 30K sequences. How can I specify a location range for a motif when running MEME? Or is there a variant of MEME designed for this?
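As far as I know, MEME has no option that strictly restricts where a motif may occur; a PSP file only biases the prior. A crude workaround is to trim every sequence to the window of interest and run MEME separately on each window. The sketch below assumes each sequence is written 5' to 3' and ends immediately before the TSS, so its last base is position -1; `upstream_window` is a hypothetical helper, not part of MEME.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MEME): keep positions -FROM..-TO of each sequence.
# ASSUMPTION: sequences are 5'->3' and end right before the TSS (last base = -1).
# Usage: upstream_window FROM TO file.fasta   (FROM >= TO, e.g. 45 25)
upstream_window() {
  awk -v from="$1" -v to="$2" '
    function emit() {
      if (name == "") return
      n = length(seq)
      start = n - from + 1          # 1-based index of position -FROM
      if (start < 1) start = 1      # clip sequences shorter than the window
      stop = n - to + 1             # 1-based index of position -TO
      if (stop >= start) print name "\n" substr(seq, start, stop - start + 1)
    }
    /^>/ { emit(); name = $0; seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }   # join multi-line sequences
    END  { emit() }
  ' "$3"
}
```

For example, `upstream_window 45 25 upstream.fasta > box35.fasta` and `upstream_window 20 1 upstream.fasta > box10.fasta` would give separate inputs for the -35 and -10 regions; these window bounds are illustrative only.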

I understand the PSP file, but what exactly does the bgfile do in MEME motif discovery?
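For context: a background file gives MEME a Markov model of non-motif sequence (plain base composition at order 0, plus dependence on preceding bases at higher orders), against which candidate motifs are scored. The sketch below estimates a 0-order model (A/C/G/T frequencies) from a FASTA file. The simple "letter frequency" line layout is my assumption about the file format, so check it against your MEME version's documentation (the suite's own `fasta-get-markov` utility builds these files). `background0` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MEME): 0-order background frequencies.
# Lowercase is folded to uppercase; non-ACGT characters such as N are ignored.
# ASSUMPTION: output layout "LETTER FREQ" per line, '#' for comments.
background0() {
  awk '!/^>/ {
         s = toupper($0)
         for (i = 1; i <= length(s); i++) {
           c = substr(s, i, 1)
           if (c ~ /[ACGT]/) { count[c]++; total++ }
         }
       }
       END {
         print "# 0-order Markov background model"
         split("A C G T", letters, " ")
         for (k = 1; k <= 4; k++)
           printf "%s %.3f\n", letters[k], (total ? count[letters[k]] / total : 0)
       }' "$1"
}
```

Computing this from, say, all upstream regions of the genome rather than only the input set is one way to make the background reflect the organism's composition.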

Thanks!


Can you post the options you are using to run MEME?

9.8 years ago

The simplest solution may be to run DREME instead. It's available from the same website (it was developed by the same group) and was designed with larger, ChIP-seq-scale data sets in mind. There are also many other motif discovery tools you can try at your data size. MEME simply doesn't scale that well.
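If you would rather stay with MEME, one common workaround is to run it on a random subset of the sequences. Here is a sketch using reservoir sampling in awk, so the file is read once and exactly N records are kept (or all of them, if there are fewer). `fasta_sample` and its arguments are hypothetical, not a MEME tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MEME): uniform random subset of N FASTA records.
# Reservoir sampling: one pass over the file, memory for N records only.
# Usage: fasta_sample N SEED file.fasta
fasta_sample() {
  awk -v n="$1" -v seed="$2" '
    BEGIN { srand(seed) }
    function keep(rec) {
      seen++
      if (seen <= n) { res[seen] = rec }           # fill the reservoir first
      else {
        j = int(rand() * seen) + 1                 # uniform in 1..seen
        if (j <= n) res[j] = rec                   # replace with probability n/seen
      }
    }
    /^>/ { if (rec != "") keep(rec); rec = $0; next }
    { rec = rec "\n" $0 }                          # append sequence lines to record
    END {
      if (rec != "") keep(rec)
      for (i = 1; i <= (seen < n ? seen : n); i++) print res[i]
    }
  ' "$3"
}
```

Running it with a few different seeds and comparing the motifs found is a cheap check that a result is not an artifact of one particular subset.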


Thank you for the suggestion. I shall try DREME!


sure!

meme ~/f1.fasta -dna -mod oops -w 8 -minw 6 -maxw 8 -nmotifs 5 -psp dna4_8.psp -revcomp -maxsize 1000000000000 -o ~/MEME/
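One thing worth checking in this command: 1000000000000 is larger than 2147483647, the maximum of a signed 32-bit integer. If MEME stores -maxsize in such an integer (an assumption; I have not checked the source), the value could overflow, which might explain the negative number in "Dataset too large (-1)". The sketch below only tests whether a value fits; `fits_int32` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a -maxsize value fits in a signed 32-bit int.
# ASSUMPTION: MEME parses -maxsize into a 32-bit int (not verified).
# Bash arithmetic is 64-bit, so the comparison itself does not overflow.
fits_int32() {
  local value=$1
  local int32_max=2147483647   # 2^31 - 1
  if (( value <= int32_max )); then
    echo "ok: $value"
  else
    echo "too large for int32: $value" >&2
    return 1
  fi
}
```

With ~30K sequences of at most 100 bases (about 3 million characters), a value such as 5000000 passes this check, while 1000000000000 does not.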


Hi @Michael Huss, cheers, +1. Do you know if DREME allows parallel processing?