Question: motif, e-value and number of sequences
Asked 2.9 years ago by nicolas.descostes (United States):

Hello,

I have performed a motif enrichment analysis with MEME-ChIP.

I am getting a beautiful motif with an E-value of 1.3e-357. Luckily, I also get a known motif (in the "show more" section below the logo) with an E-value of 1.1e-264.

When I click on "MEME" under the "Discovery/​Enrichment Program" section, I can see that 306 sites contributed to build the motif.

Knowing that I submitted 2359 sequences, I would tend to say that this motif is not significant.

Do you agree on that?

Edit: The known motif, which was found with CentriMo, shows 1658 in the "Region Matches" column. Does that mean that 1658 of my sequences contain the motif?

Thanks a lot

Answer, posted 2.9 years ago by nicolas.descostes (United States):

Reply from MEME Team:

No! MEME uses a greedy algorithm. As soon as it has enough evidence to estimate the statistical significance of a motif candidate, it stops looking for further evidence for that motif and starts looking for other motifs. So the 306 sites are not a count of every sequence that contains the motif. The statistical significance of a motif discovered by MEME is given by the E-value. An E-value of 1.3e-357 indicates that the motif is highly statistically significant in your sequence data.

Keep in mind that MEME is performing de novo motif discovery without any reference to databases of known motifs, while CENTRIMO is looking for enrichment of motifs both from your MEME results and from databases of known motifs. You'll want to assess whether the highly significant motif reported by MEME is really some variant of the known motif reported by CENTRIMO. You can compare the logos by eye, and then look at the TomTom results to see whether they really represent the same motif.

Regarding your second question ("The known motif which was found with centrimo, indicates 1658 in "region matches" section. does it mean that 1658 of my sequences have the motif?"):

Clicking on the help button (the red question mark) for the "Region Matches" column of the CENTRIMO results provides the following text.

The number of (positive) sequences whose best match to the motif falls in the reported region.

Note: This number may be less than the number of (positive) sequences that have a best match in the region. The reason for this is that a sequence may have many matches that score equally best. If n matches have the best score in a sequence, 1/n is added to the appropriate bin for each match.

Furthermore, CENTRIMO only considers motif matches that pass a score threshold. So, strictly speaking, CENTRIMO reporting 1658 region matches means that 1658 of your sequences contain at least one match to the motif in their central region that passes the score threshold. Other sequences might also contain instances of the motif outside the central region, or matches to the motif position weight matrix that score below the threshold.
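To make the counting rule in the help text concrete, here is a minimal Python sketch of how a tie between n equally good best matches adds 1/n to each bin, and how matches below the score threshold are ignored. It is an illustration only, not CENTRIMO's actual code: the function names, the toy sequences, the scores, and the threshold are all made up for this example, and positions are assumed to be given relative to the sequence centre.

```python
from collections import defaultdict
from fractions import Fraction

def site_counts(matches_by_seq, threshold):
    """Per-position counts of best matches (hypothetical helper).

    matches_by_seq maps a sequence name to a list of (position, score) pairs.
    Only matches with score >= threshold are considered. If n positions tie
    for the best score in a sequence, each position receives 1/n.
    """
    counts = defaultdict(Fraction)
    for matches in matches_by_seq.values():
        passing = [(pos, score) for pos, score in matches if score >= threshold]
        if not passing:
            continue  # this sequence contributes nothing
        best = max(score for _, score in passing)
        best_positions = [pos for pos, score in passing if score == best]
        for pos in best_positions:
            counts[pos] += Fraction(1, len(best_positions))
    return counts

def region_total(counts, start, end):
    """Sum the counts of all positions in the closed interval [start, end]."""
    return sum((c for pos, c in counts.items() if start <= pos <= end), Fraction(0))

# Toy data (invented): positions are offsets from the sequence centre.
example_matches = {
    "seq1": [(0, 9.0), (40, 9.0)],  # two equally good best matches -> 1/2 each
    "seq2": [(2, 12.5)],            # one best match -> 1
    "seq3": [(1, 3.0)],             # below the threshold -> ignored
}
example_counts = site_counts(example_matches, threshold=5.0)
central = region_total(example_counts, -5, 5)
overall = region_total(example_counts, -100, 100)
```

In this toy case the central region [-5, 5] receives 1/2 from seq1 and 1 from seq2, so the region total is 3/2 even though two sequences have a best match inside it. This is why the reported number can be lower than the number of sequences with a best match in the region.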

Look at the FIMO output included in the MEME-ChIP results to get a list of all motif matches in your sequence data.
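If you want a number rather than a list, a small Python sketch like the one below could count how many distinct sequences have at least one FIMO match for each motif. It rests on assumptions you should check against your own file: that the FIMO output is tab-separated with a plain header row, and that the columns are named `motif_id` and `sequence_name` (the header differs between MEME Suite versions, and some versions start the header line with `#`, which this sketch would skip). The function name and the example rows are invented for illustration.

```python
import csv
from collections import defaultdict

def sequences_with_match(fimo_tsv_text, motif_col="motif_id", seq_col="sequence_name"):
    """Count distinct sequences with at least one match, per motif.

    Column names are assumptions; adjust them to the header of your file.
    Blank lines and lines starting with '#' are skipped.
    """
    lines = [
        line for line in fimo_tsv_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(lines, delimiter="\t")
    seqs_per_motif = defaultdict(set)
    for row in reader:
        seqs_per_motif[row[motif_col]].add(row[seq_col])
    return {motif: len(names) for motif, names in seqs_per_motif.items()}

# Invented example rows, only to show the expected shape of the input.
example_tsv = "\n".join([
    "motif_id\tsequence_name\tstart\tstop\tstrand\tscore",
    "MOTIF_A\tpeak1\t10\t19\t+\t14.2",
    "MOTIF_A\tpeak1\t55\t64\t-\t11.0",
    "MOTIF_A\tpeak2\t31\t40\t+\t12.7",
    "MOTIF_B\tpeak2\t5\t12\t+\t9.8",
    "# trailing comment line",
])
example_result = sequences_with_match(example_tsv)
```

With real data you would read the file contents into a string and pass them in; dividing the count for your motif by the 2359 submitted sequences then gives the fraction of sequences with at least one match.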
