Question: How to evaluate the statistical significance of the distribution of breakpoints between two datasets
5.7 years ago by
European Union
alec_djinn340 wrote:

I am studying the distribution of breakpoints among different human genomes, looking for hotspots in the "sample" genomes that are enriched in breakpoints. To do so, I have divided each chromosome into bins of 10 kb and then counted how many breaks fall in each bin. I have done the same for some control datasets and for randomly generated datasets. At this point, what is the best statistical test I could use to determine a p-value for each bin?

The data I have looks like this:

              Sample   Control
Breaks_Bin1       10         3
Breaks_bin2       15         6
Breaks_bin3        5         3

5.7 years ago by
WCIP | Glasgow | UK
dariober11k wrote:

The way you present the problem, it looks like you want to detect differences in counts between conditions. In that case I would look at methods developed for differential gene expression from RNA-Seq (DESeq, edgeR, limma/voom). Your 10 kb windows would be the "genes" and your break counts would be the expression levels. If you don't have replicates of each condition, though, take care in how you interpret the results. You will probably also need to pre-filter your data to remove windows with very low counts in both conditions, which cuts down the number of tests.
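
As a rough illustration of the pre-filtering step, here is a minimal Python sketch using only the standard library. The function names, the dictionary layout and the `min_total=10` cutoff are assumptions made for the example, not part of any of the packages above. The Benjamini-Hochberg adjustment is included only to show why fewer tests help: each adjusted p-value scales with the number of tests that remain after filtering.

```python
def filter_low_count_bins(counts, min_total=10):
    """Keep bins whose sample + control count reaches min_total.

    counts maps a bin name to a (sample_breaks, control_breaks) tuple.
    min_total is an arbitrary illustrative cutoff, not a recommendation.
    """
    return {name: (s, c) for name, (s, c) in counts.items() if s + c >= min_total}


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Counts taken from the table in the question.
counts = {"Breaks_Bin1": (10, 3), "Breaks_bin2": (15, 6), "Breaks_bin3": (5, 3)}
kept = filter_low_count_bins(counts, min_total=10)
print(f"{len(kept)} of {len(counts)} bins kept for testing")
```

With this cutoff the third bin (8 breaks in total) is dropped before any test is run. The right cutoff depends on the data and should be chosen without looking at the test results.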


Yes, I am indeed trying to detect differences in counts between samples and controls. However, since the data come from different labs, I am looking for a proper statistical approach to validate the findings, that is, to determine whether the difference in counts is significant (p-value) or not, and I would like to do it using a scipy.stats function or something similar. I cannot figure out which approach is best, though: Chi2, Fisher, Pearson? I am getting different results from each of them and I am not sure which one fits my data best.
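
For reference, one way to set up a per-bin test on counts like those in the question is a one-sided Fisher's exact test on a 2x2 table: breaks inside the bin versus breaks in all other bins, for sample versus control. The sketch below is a plain-Python illustration under that assumption, not a recommendation over the count-based packages mentioned in the answer. The function name is made up for the example, and the totals are computed from the three example bins only, whereas real totals would come from all bins.

```python
from math import comb


def fisher_enrichment_pvalue(sample_bin, control_bin, sample_total, control_total):
    """One-sided Fisher's exact test for enrichment in the sample.

    2x2 table: rows are "this bin" and "all other bins",
    columns are sample and control break counts.
    Returns P(X >= sample_bin) under the hypergeometric null.
    """
    n_all = sample_total + control_total   # all breaks, both datasets
    n_bin = sample_bin + control_bin       # all breaks falling in this bin
    upper = min(n_bin, sample_total)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    tail = sum(
        comb(sample_total, k) * comb(control_total, n_bin - k)
        for k in range(sample_bin, upper + 1)
    )
    return tail / comb(n_all, n_bin)


# Counts taken from the table in the question.
counts = {"Breaks_Bin1": (10, 3), "Breaks_bin2": (15, 6), "Breaks_bin3": (5, 3)}
sample_total = sum(s for s, _ in counts.values())
control_total = sum(c for _, c in counts.values())

pvalues = {
    name: fisher_enrichment_pvalue(s, c, sample_total, control_total)
    for name, (s, c) in counts.items()
}
for name, p in pvalues.items():
    print(f"{name}\t{p:.3f}")
```

In scipy, `scipy.stats.fisher_exact` with `alternative="greater"` on the same 2x2 table performs this kind of test. Chi-square approximations tend to be unreliable when expected counts are small, which may be part of why the tests disagree on bins with few breaks. Testing every bin also means the p-values need a multiple-testing correction.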

written 5.7 years ago by alec_djinn340