MEME cannot handle 318 sequences?
3.3 years ago
Kai_Qi

Hi:

I ran MEME on a FASTA file like this:

meme input.fasta -mod anr -oc MEME_no_control_motif10 -rna -nmotifs 10 -minw 6 -maxw 10  > log4 2> error4
cat error4
Dataset too large (> 100000).  Rerun with larger -maxsize.

The FASTA file contains 318 sequences:

wc -l input.fasta 
636 input.fasta

(636 = 318 * 2, because each sequence is preceded by a header line containing its coordinates)
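Note that wc -l gives exactly twice the sequence count only when every sequence sits on a single line; wrapped (multi-line) FASTA breaks that shortcut. A small sketch that counts header lines instead (count_fasta_seqs is a hypothetical helper name, not part of MEME):

```shell
# count_fasta_seqs FILE
# Prints the number of sequences in a FASTA file by counting header
# lines (lines beginning with '>'), so wrapped sequences are handled too.
count_fasta_seqs() {
  grep -c '^>' "$1"
}
```

Usage: count_fasta_seqs input.fasta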

Is this normal for MEME? The manual says it is best to use no more than 500 primary sequences. What should I do to make it run?

Thanks,


Have you tried what the program is telling you to do?

Rerun with larger -maxsize.


Thanks for the reply. I tried -maxsize 0 and still got the same result. When I ran the command on just the first 100 lines of the file, it worked well. I guess the combined length of all my sequences exceeds MEME's limit; when I added -maxsize 200000, it started running.

3.3 years ago

Get the total (summed) length of the 318 sequences in your FASTA file, e.g. via https://www.danielecook.com/generate-fasta-sequence-lengths/

Set your -maxsize parameter to this value.

More explicitly, the output of the aforementioned one-liner can be quickly piped to a second awk statement to sum all the lengths:

$ awk '/^>/ { if (NR > 1) print c; c = 0; printf "%s\t", substr($0, 2, 100); next } { c += length($0) } END { print c }' input.fasta | awk -F'\t' '{ s += $NF } END { print s }'
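If the per-sequence table isn't needed, the total can also be computed in a single pass by summing the lengths of all non-header lines, and passed straight to -maxsize. A minimal sketch, assuming (as described above) that MEME's dataset size is the total number of sequence characters; fasta_total_length is a hypothetical helper name:

```shell
# fasta_total_length FILE
# Prints the summed length of all sequences in a FASTA file:
# header lines (starting with '>') are skipped, and Windows carriage
# returns are stripped so they are not counted as residues.
fasta_total_length() {
  awk '!/^>/ { gsub(/\r/, ""); total += length($0) } END { print total + 0 }' "$1"
}
```

Usage, reusing the options from the question:

meme input.fasta -mod anr -oc MEME_no_control_motif10 -rna -nmotifs 10 -minw 6 -maxw 10 -maxsize "$(fasta_total_length input.fasta)"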

Using -maxsize 0 works, but it will make MEME run more slowly. Speed becomes a real issue depending on the values of -minw and -maxw. Setting -maxsize correctly allocates only as much memory as needed.

From the documentation: http://meme-suite.org/doc/meme.html
