Proving distribution differences in large datasets
9.1 years ago
surka • 0

Hello,

I have a question about analysing motif occurrence in a dataset. I have a large set of sequences that were pulled out of a larger dataset as fulfilling criteria x. I determined the most prevalent motifs in these sequences and would now like to show that the subset has an overabundance of these motifs compared to both control samples and the full list (from which the subset was taken). I can tell by eye, but I would like to show that the difference is statistically significant.

What is the simplest and most convincing way to go about this?

I'm currently building a large contingency table, but I feel a chi-squared test would be too complicated given that the table of sequences (rows) x samples (columns) would be >5000 x >50. Is there something simple I am missing?
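
To make the question concrete, here is a rough sketch of one simpler alternative to a single huge table: test each motif separately for enrichment in the subset relative to the full list, then correct for the number of motifs tested. Because the subset was drawn from the full list, a one-sided hypergeometric test seems like a natural fit, but that choice is an assumption rather than an established answer. The function name, the motif names, and all counts below are hypothetical placeholders. The comparison against separate control samples is not covered here and would likely need a different test (for example a 2 x 2 Fisher's exact test per motif).

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided p-value P(X >= k) for a hypergeometric draw.

    N: sequences in the full list
    K: sequences in the full list containing the motif
    n: sequences in the subset (drawn from the full list)
    k: sequences in the subset containing the motif
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability of seeing k or more motif-containing sequences.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: motif -> (hits in subset, hits in full list)
counts = {"motifA": (40, 200), "motifB": (12, 110)}
n_subset, n_total = 100, 1000  # hypothetical subset and full-list sizes

p_values = {
    motif: hypergeom_enrichment_p(k, n_subset, K, n_total)
    for motif, (k, K) in counts.items()
}

# Bonferroni correction for the number of motifs tested (conservative).
adjusted = {motif: min(1.0, p * len(p_values)) for motif, p in p_values.items()}

for motif in counts:
    print(motif, p_values[motif], adjusted[motif])
```

With only a handful of top motifs, Bonferroni is probably fine; with thousands of motifs, a false discovery rate procedure might be a better choice.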

Thank you!

sequence • 1.4k views