Question: Proving distribution differences in large datasets
4.1 years ago
United Kingdom
surka0 wrote:


I have a question about analysing motif occurrence in a dataset. I have a large set of sequences that were pulled out of a bigger list as a subset fulfilling criterion x. I have determined the most prevalent motifs in these sequences, and I would now like to show that the subset has an overabundance of these motifs compared with both the control samples and the big list from which the subset was taken. I can see the difference by eye, but I would like to show that it is statistically significant.
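One common way to frame "overabundance in a subset drawn from a bigger list" is a one-sided hypergeometric test per motif, followed by a multiple-testing correction across motifs. The sketch below assumes you can count, for each motif, how many sequences in the big list (K of N) and in the subset (k of n) contain it; `hypergeom_sf` and `benjamini_hochberg` are hypothetical helper names, and the counts in the usage block are made up for illustration. For the comparison against a control set, the same function gives a one-sided Fisher's exact test if you pool subset and control: N = subset + control size, K = motif-containing sequences in both, n = subset size, k = motif-containing sequences in the subset.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: sequences in the big list;  K: how many of those contain the motif;
    n: sequences in the subset;    k: subset sequences containing the motif.
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("invalid hypergeometric parameters")
    # Sum the upper tail with exact integer arithmetic, then divide once.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts, for illustration only: motif -> (K in big list, k in subset)
    N, n = 1000, 50
    motif_counts = {"MOTIF_A": (100, 20), "MOTIF_B": (200, 12)}
    pvals = [hypergeom_sf(k, N, K, n) for K, k in motif_counts.values()]
    for name, p, q in zip(motif_counts, pvals, benjamini_hochberg(pvals)):
        print(f"{name}: p = {p:.3g}, BH-adjusted = {q:.3g}")
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` (SciPy's own parameter names: M = population size, n = motif-containing sequences, N = draws) and `scipy.stats.fisher_exact(table, alternative="greater")` compute the same quantities. One caveat worth keeping in mind: because the motifs were chosen as the most prevalent in this same subset, p-values computed on that subset will tend to be optimistic; testing on held-out sequences avoids that circularity.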


What is the simplest and most convincing way to go about this?


I'm currently building a large contingency table, but I feel a chi-squared test would be too complicated, given that the sequence (R) × sample (C) table would be more than 5000 × 50. Is there something simple I am missing?
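If you do go the contingency-table route, the size itself is not the obstacle: the Pearson chi-squared statistic is a single pass over the R × C cells, so roughly 250,000 cells is cheap. The minimal sketch below (the function name `chi2_statistic` is hypothetical) computes the statistic and degrees of freedom in plain Python, dropping all-zero rows and columns because their expected counts would be zero. Two hedges: the chi-squared approximation becomes unreliable when many expected counts are small (a common rule of thumb is below about 5), which is likely in a sparse 5000 × 50 table, and a significant result only says that the motif distribution differs across samples somewhere, not which motifs are overabundant in the subset.

```python
def chi2_statistic(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an R x C count table."""
    # Drop all-zero rows and columns; their expected counts would be 0.
    rows = [r for r in table if sum(r) > 0]
    if not rows:
        raise ValueError("table has no counts")
    keep = [j for j in range(len(rows[0])) if sum(r[j] for r in rows) > 0]
    rows = [[r[j] for j in keep] for r in rows]

    row_tot = [sum(r) for r in rows]
    col_tot = [sum(r[j] for r in rows) for j in range(len(keep))]
    grand = sum(row_tot)

    stat = 0.0
    for i, r in enumerate(rows):
        for j, obs in enumerate(r):
            expected = row_tot[i] * col_tot[j] / grand
            stat += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(keep) - 1)
    return stat, dof
```

With SciPy installed, `scipy.stats.chi2.sf(stat, dof)` turns the statistic into a p-value, and `scipy.stats.chi2_contingency` does the whole computation in one call.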


Thank you!


