Enrichment of genomic feature in DNA breakpoints
5.4 years ago
nanana ▴ 110

I have identified a number of structural variant breakpoints across multiple tumour-normal comparisons.

I want to ask the question "How enriched for a particular genomic feature is my set of breakpoints?"

For example, across all samples we find a total of 200 breakpoints, and 15 of these fall in the same class of genomic feature (e.g. exons).

If the genome is 137547960 bp long, and the fraction of the genome that is exonic is 21.8% (30095000/137547960), then I would expect about 43.8 of the 200 breakpoints to fall in exons (200 * total_exon_length / total_genome_length) across all my samples. That we find only 15 suggests that this feature class is underrepresented in our breakpoint set.
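The arithmetic above can be sketched in R as follows. This is only a rough illustration using the numbers quoted in this post; the variable names are made up for the example, and it assumes that under the null each breakpoint lands uniformly at random along the genome.

```r
# Values quoted in the question (variable names are illustrative only)
genome_length     <- 137547960  # total genome length in bp
exon_length       <- 30095000   # total exonic length in bp
n_breakpoints     <- 200        # breakpoints across all samples
observed_in_exons <- 15         # breakpoints that fall in exons

# Fraction of the genome covered by the feature class
exon_fraction <- exon_length / genome_length

# Expected count if breakpoints were placed uniformly at random
expected_in_exons <- n_breakpoints * exon_fraction

# Observed / expected ratio: below 1 hints at depletion, above 1 at enrichment
fold_change <- observed_in_exons / expected_in_exons

print(c(fraction = exon_fraction,
        expected = expected_in_exons,
        fold     = fold_change))
```

The ratio on its own says nothing about significance, which is what the question below is about.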

Is this the right way of going about this sort of test? What is an appropriate statistic to use here? Chi-Squared or Fisher's Exact?

genome next-gen

You could try a binomial test with the probability of success in a single trial = 0.218, number of trials = 200 and number of successes = 15.

In R :

> binom.test(x = 15, n = 200, p = 0.218, alternative = "less")

Exact binomial test

data:  15 and 200
number of successes = 15, number of trials = 200, p-value = 4.255e-08
alternative hypothesis: true probability of success is less than 0.218
95 percent confidence interval:
0.0000000 0.1131384
sample estimates:
probability of success
0.075
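In case it helps with reading that output, here is a small sketch that pulls the individual pieces out of the result object. `binom.test` returns a list of class `htest`; the variable name `res` is just for this example. With alternative = "less", the p-value is the probability of seeing 15 or fewer successes in 200 trials if the true rate really were 0.218, which is the lower binomial tail.

```r
# Same test as above, stored so the components can be inspected
res <- binom.test(x = 15, n = 200, p = 0.218, alternative = "less")

# Observed proportion of breakpoints in the feature (15 / 200)
observed_prop <- unname(res$estimate)

# One-sided 95% confidence interval for the true proportion;
# for alternative = "less" the lower bound is fixed at 0
ci <- res$conf.int

# P(X <= 15) under the null rate of 0.218
p_value <- res$p.value

# The same lower-tail probability computed directly from the binomial CDF
p_manual <- pbinom(15, size = 200, prob = 0.218)

print(list(observed = observed_prop, ci = ci,
           p_value = p_value, p_manual = p_manual))
```

Roughly: a small p-value together with an upper confidence bound below 0.218 means the observed rate (0.075) is lower than the uniform-placement expectation, i.e. the feature looks depleted rather than enriched.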


OK thanks for the advice. How should I interpret the output from R?


If I want to test for enrichment of a particular class of feature, shouldn't I use alternative = "greater"?


For enrichment you should indeed use "greater".


Yes, enrichment is "greater". "less" tests the opposite alternative, i.e. under-representation.
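To make the difference concrete, a quick sketch running the same numbers from this thread under each alternative (the variable names are just for illustration):

```r
# Depletion: is the true rate lower than 0.218?
p_under <- binom.test(15, 200, p = 0.218, alternative = "less")$p.value

# Enrichment: is the true rate higher than 0.218?
p_over <- binom.test(15, 200, p = 0.218, alternative = "greater")$p.value

# No direction assumed in advance
p_two <- binom.test(15, 200, p = 0.218, alternative = "two.sided")$p.value

print(c(less = p_under, greater = p_over, two.sided = p_two))
```

For this example the "greater" p-value comes out close to 1, because 15 observed is far below the roughly 44 expected, so there is no evidence of enrichment. If you do not know the direction beforehand, "two.sided" is probably the safer choice.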