Question: Enrichment of genomic feature in DNA breakpoints
nr2380 wrote:

I have identified a number of structural variant breakpoints across multiple tumour normal comparisons.

I want to ask the question "How enriched for a particular genomic feature is my set of breakpoints?"

For example, across all samples we find a total of 200 breakpoints, and 15 of these are found in the same class of genomic feature (e.g. exon).

If the genome is 137547960 bp long, and the fraction of the genome that is exonic is 21.8% (30095000/137547960), then I would expect to find about 43.8 of the 200 breakpoints in exons (200 * (total_exon_length / total_genome_length)) across all my samples. That we find only 15 suggests that this feature class is underrepresented in our breakpoint set.
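
The arithmetic above can be checked with a few lines of R. This is only a sketch using the numbers from the question; the variable names are made up for illustration, and the expected count assumes breakpoints land uniformly at random along the genome, which may not hold for real data (mappability, gaps, and so on).

```r
# Values taken from the question; variable names are illustrative only
genome_length <- 137547960   # total genome size in bp
exon_length   <- 30095000    # total exonic sequence in bp
n_breakpoints <- 200         # breakpoints across all samples
n_in_exons    <- 15          # breakpoints that fall in exons

# Fraction of the genome that is exonic (the null probability of a "hit")
p_exon <- exon_length / genome_length

# Expected number of exonic breakpoints if placement were uniform
expected <- n_breakpoints * p_exon

# Observed / expected ratio; a value below 1 points towards depletion
fold <- n_in_exons / expected

round(c(p_exon = p_exon, expected = expected, fold = fold), 3)
```

The observed/expected ratio is a convenient effect size to report alongside whatever p-value the test gives.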

Is this the right way of going about this sort of test? What is an appropriate statistic to use here? Chi-Squared or Fisher's Exact?

modified 2.3 years ago • written 2.3 years ago by nr2380

You could try a binomial test with probability of success in a single trial = 0.218, number of trials = 200, and number of successes = 15.

In R:

> binom.test(x = 15,n=200,p = 0.218,alternative = "less")

Exact binomial test

data:  15 and 200
number of successes = 15, number of trials = 200, p-value = 4.255e-08
alternative hypothesis: true probability of success is less than 0.218
95 percent confidence interval:
0.0000000 0.1131384
sample estimates:
probability of success
0.075
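
On the Chi-squared vs Fisher part of the question: as a rough comparison, a chi-squared goodness-of-fit test can be run on the same counts. This is a sketch, not a recommendation over the exact binomial test; it reuses the 0.218 background fraction from above and relies on the large-sample approximation. Fisher's exact test expects a 2x2 contingency table, which does not arise naturally here because the background is a fixed proportion rather than a second set of counts.

```r
obs <- 15      # breakpoints in exons
n   <- 200     # total breakpoints
p0  <- 0.218   # exonic fraction of the genome (null proportion)

# Goodness-of-fit: observed (in exon, not in exon) vs expected proportions
gof <- chisq.test(x = c(obs, n - obs), p = c(p0, 1 - p0))

gof$expected    # expected counts under the null
gof$statistic   # chi-squared statistic, 1 degree of freedom
gof$p.value     # tests for any departure from p0, in either direction
```

Note that the chi-squared test has no direction, so the sign of the effect has to be read from observed vs expected counts.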
modified 2.3 years ago • written 2.3 years ago by Nicolas Rosewick

OK thanks for the advice. How should I interpret the output from R?

If I want to test for enrichment of a particular class of feature, shouldn't I use alternative = "greater"?

To test for enrichment, you should indeed use alternative = "greater".
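
As for reading the output: the p-value is the probability, under the null of p = 0.218, of seeing a count at least as extreme as 15 in the direction given by `alternative`; the sample estimate (0.075) is just 15/200; and the confidence interval is for the true proportion. A minimal sketch comparing the three alternatives on the same numbers (variable names are illustrative):

```r
obs <- 15; n <- 200; p0 <- 0.218

# Run the exact binomial test once per alternative and collect the p-values
alts  <- c("less", "greater", "two.sided")
pvals <- sapply(alts, function(alt) {
  binom.test(x = obs, n = n, p = p0, alternative = alt)$p.value
})
pvals

# Pieces of a single result that are worth reporting
res <- binom.test(x = obs, n = n, p = p0, alternative = "less")
res$estimate   # observed proportion, 15/200
res$conf.int   # one-sided 95% interval for the true proportion
```

With 15 observed against roughly 44 expected, "less" gives a very small p-value (depletion), while "greater" gives a p-value close to 1, meaning no evidence of enrichment. If the direction is not decided before looking at the data, "two.sided" is the safer choice.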