Proving distribution differences in large datasets

9.1 years ago

surka • 0

Hello,

I have a question about analysing motif occurrence in a dataset. I have a big set of sequences that were selected from a larger dataset because they fulfil criteria x. I determined the most prevalent motifs in these sequences and would now like to show that the subset has an overabundance of these motifs compared with both the control samples and the big list (from which the subset was taken). I can see the difference by eye, but I would like to show that it is statistically significant.

What is the simplest and most convincing way to go about this?

I'm currently building a large contingency table, but I feel a chi-squared test would be too complicated given that the table of sequences (rows) by samples (columns) would be larger than 5000 x 50. Is there something simple I am missing?

Thank you!
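
For the comparison against the big list, one simple option (a sketch, not something proposed in this thread) is to test each motif on its own with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on a 2 x 2 table, instead of testing the full >5000 x >50 table at once. The function names and all counts below are hypothetical, and the sketch assumes each sequence is simply scored as containing the motif or not.

```python
from math import comb


def hypergeom_enrichment_p(total, total_with_motif, subset_size, subset_with_motif):
    """One-sided enrichment p-value for a single motif.

    Returns P(X >= subset_with_motif), where X is the number of
    motif-containing sequences seen when subset_size sequences are drawn
    without replacement from total sequences, of which total_with_motif
    contain the motif.
    """
    denom = comb(total, subset_size)
    upper = min(total_with_motif, subset_size)
    tail = sum(
        comb(total_with_motif, k) * comb(total - total_with_motif, subset_size - k)
        for k in range(subset_with_motif, upper + 1)
    )
    return tail / denom


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: 20 sequences in the big list, 5 of them contain the
# motif; the subset holds 5 sequences, 4 of which contain the motif.
p_motif = hypergeom_enrichment_p(20, 5, 5, 4)

# Hypothetical second motif with no enrichment at all (0 hits in the subset).
p_other = hypergeom_enrichment_p(20, 5, 5, 0)

adjusted = bonferroni([p_motif, p_other])
print(p_motif, adjusted)
```

Because several motifs are tested, the p-values need a multiple-testing correction; Bonferroni is used here only because it is the simplest. The comparison against the control samples is a separate question, since the controls are not a superset of the subset, and would need its own two-group test.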

sequence • 1.4k views

updated 22 months ago by Ram • written 9.1 years ago by surka
