Question: Proving distribution differences in large datasets
4.9 years ago by
United Kingdom
surka0 wrote:


I have a question about analysing motif occurrence in a dataset. I have a large set of sequences that were pulled out of a larger dataset as fulfilling criteria x. I determined the most prevalent motifs in these sequences, and would now like to show that the subset has an overabundance of these motifs compared to both control samples and the big list (from which the subset was taken). I can tell by eye, but I would like to show that the difference is statistically significant.


What is the simplest and most convincing way to go about this?


I'm currently building a large contingency table, but I feel a chi-squared test would be too complicated given that the sequence (rows) × sample (columns) table would be >5000 × >50. Is there something simple I am missing?
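To make the question concrete, here is a rough sketch of one simpler alternative to the full table: treat each motif separately and ask whether the subset contains it more often than expected when drawing from the big list. Because the subset was taken from the big list, a one-sided hypergeometric test seems like a natural fit, though that is an assumption on my part and not an established method for this dataset. The function names, motif labels, and counts below are all hypothetical, and the comparison against independent control samples would need a separate test.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: sequences in the big list
    K: sequences in the big list that contain the motif
    n: sequences in the subset (drawn from the big list)
    k: sequences in the subset that contain the motif
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability of seeing k, k+1, ..., upper motif-containing sequences.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(p_values):
    """Bonferroni-adjust a dict of motif -> p-value (capped at 1.0)."""
    m = len(p_values)
    return {motif: min(1.0, p * m) for motif, p in p_values.items()}


# Hypothetical counts: motif -> (k in subset, K in big list).
N_BIG_LIST = 5000   # assumed size of the big list
N_SUBSET = 300      # assumed size of the subset
counts = {"motif_A": (40, 200), "motif_B": (12, 180)}

raw = {
    motif: hypergeom_enrichment_p(k, N_SUBSET, K, N_BIG_LIST)
    for motif, (k, K) in counts.items()
}
adjusted = bonferroni(raw)
for motif in counts:
    print(motif, raw[motif], adjusted[motif])
```

Testing one motif at a time keeps each comparison to four numbers, and the Bonferroni step is only there because many motifs are tested at once; a less conservative correction could be swapped in.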


Thank you!


sequence • 1.0k views

