Hello everyone,
I have recently been searching for a sequence pattern in some FASTA sequences using MEME. I fed all 821897 sequences into MEME for de novo motif discovery, using MEME's default parameters apart from the options in this command:
meme -nmotifs 3 file.fa -searchsize 1520000 -oc file_meme -seed 0620 -dna -revcomp
MEME found a very strong motif, shown below (I call it strong because of the 821411/821897 ratio, although this is arguable):
MEME, using all 821897 sequences:
I naturally expected that, with such a strong motif, the result would stay largely the same if I randomly sampled a subset of the sequences. However, things got weird when I drew three random samples of 500000 sequences each, as shown below:
MEME, using three random samples of 500000 sequences:
All three motifs still seem strong, but they vary a lot. I am not sure what I did wrong, and any advice would be much appreciated.
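To make the sampling step concrete, here is a minimal Python sketch of one way to draw a reproducible random subset of FASTA records. It is not necessarily the procedure used for the samples above. The helper names `read_fasta`, `sample_records` and `write_fasta` are illustrative, and the sketch assumes the whole file fits in memory.

```python
import random


def read_fasta(lines):
    """Parse an iterable of FASTA text lines into (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sample_records(records, k, seed):
    """Draw k records without replacement; the same seed gives the same subset."""
    rng = random.Random(seed)
    return rng.sample(records, k)


def write_fasta(records, handle):
    """Write (header, sequence) tuples back out, one sequence per line."""
    for header, seq in records:
        handle.write(f">{header}\n{seq}\n")
```

Typical use would be to open the input file, pass the file object to `read_fasta`, call `sample_records(records, k=500000, seed=...)` with a different seed for each sample, and write each subset to its own file. Recording the seeds makes it possible to give exactly the same subsets to every tool being compared.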
P.S. I have added the motifs generated by Weblogo3 for comparison.
Weblogo3, using all 821897 sequences:
Weblogo3, using the same three samples of 500000 sequences:
The 500000-sequence samples given to MEME and Weblogo are exactly the same. My questions are:
1. Why is the motif generated by MEME from almost all the sequences different from Weblogo's, which also used all the sequences? I know that MEME uses an algorithm to refine the motif while Weblogo simply stacks the nucleotides at each position, but should the results differ this much?
2. Why are the three sampled results similar in Weblogo but very different in MEME?
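Regarding question 1, the sketch below shows what "simply stacking" nucleotides amounts to numerically: per-position base counts and the information content of each column. It is an illustration only, not Weblogo's actual code. It assumes the sequences are already aligned and of equal length, ignores any characters other than A, C, G and T, and omits the small-sample correction. The names `column_counts` and `information_content` are illustrative.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def column_counts(seqs):
    """Count each character at each position of equal-length, pre-aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [Counter(s[i] for s in seqs) for i in range(length)]


def information_content(counts):
    """Bits per column: log2(4) minus the Shannon entropy of the base frequencies."""
    max_bits = math.log2(len(ALPHABET))
    bits = []
    for col in counts:
        total = sum(col[b] for b in ALPHABET)
        if total == 0:
            # No A/C/G/T in this column (e.g. all N): report no information.
            bits.append(0.0)
            continue
        entropy = 0.0
        for b in ALPHABET:
            p = col[b] / total
            if p > 0:
                entropy -= p * math.log2(p)
        bits.append(max_bits - entropy)
    return bits
```

A summary like this depends only on the base frequencies at each fixed position of the input, so large random samples of the same sequence set would be expected to give nearly the same values.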
Thank you for your time!