Question

Testing for over-representation of chip-peaks

8.4 years ago

Bioradical ▴ 60

I am trying to test whether a set of overlapped ChIP-seq peaks is over-represented in constitutive exons or in alternatively spliced exons.

I have a BED file that contains my overlapped ChIP peaks, a BED file that contains constitutive exons, and a BED file that contains alternatively spliced exons (generated using a custom script provided by some authors of a paper). I am interested in running a statistical test that tells me whether my overlapped peaks are over-represented / enriched in my constitutive exon file or in my alternatively spliced exon file, each tested individually.

I thought about running a hypergeometric test using the phyper function in R. But I'm not quite sure what numbers I would use specifically.

I also tried bedtools fisher, testing my overlapped ChIP-seq peak file against my constitutive exon file and then my alternatively spliced exon file separately. Both runs returned a p-value of 0, which I guess doesn't make much sense (though I am not very math-oriented). I mostly work on wet-lab stuff as an assistant.

Any help is appreciated.

Overrepresentation ChIP-Seq R Stats • 2.7k views

updated 20 months ago by Ram 43k • written 8.4 years ago by Bioradical ▴ 60

score 4 · Accepted Answer · 2017-03-03

You can use the R/Bioconductor package regioneR for this. It implements a statistical test for the association of genomic regions (such as ChIP peaks and exons) based on random permutations.

In this case I think the best approach would be to "flip" the question: ask whether alternatively spliced (or constitutive) exons tend to be associated with the ChIP peaks, and use the "resampling" randomization strategy.

For example (untested code!):

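library(regioneR)

# Load the three BED files as GRanges objects
chip.peaks <- toGRanges("chip.peaks.bed")
alt.exons <- toGRanges("alt.exons.bed")
const.exons <- toGRanges("const.exons.bed")

# The universe to resample from: every exon, alternative or constitutive
all.exons <- c(alt.exons, const.exons)

# Compare the overlaps between alt.exons and the peaks against the overlaps
# obtained for random exon sets of the same size drawn from all.exons
pt <- permTest(A=alt.exons, B=chip.peaks, universe=all.exons,
           randomize.function = resampleRegions, evaluate.function = numOverlaps,
           ntimes = 1000)

pt        # print the p-value and z-score
plot(pt)  # observed value against the permuted distribution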

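This will create 1000 random sets of exons, each drawn from all.exons and of the same size as alt.exons, and test whether alt.exons are more associated with the peaks than one would expect by chance. To test the constitutive exons instead, run it again with A=const.exons.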
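To make the resampling idea more concrete, here is a minimal base-R sketch of the same logic on toy data. It is only an illustration of the principle, not what regioneR runs internally: the exon counts, the 10% peak rate and all variable names are made up, and it counts exons that overlap at least one peak rather than individual overlaps.

```r
# Toy illustration of a resampling permutation test (base R only).
# All numbers below are invented for illustration; they are not real data.
set.seed(42)

n.exons  <- 5000                      # hypothetical universe: alt + const exons
has.peak <- runif(n.exons) < 0.10     # hypothetical: TRUE if the exon overlaps a peak
is.alt   <- seq_len(n.exons) <= 1000  # hypothetical: the first 1000 exons are "alt"

# Observed statistic: number of alt exons that overlap a peak
n.alt    <- sum(is.alt)
observed <- sum(has.peak[is.alt])

# Null distribution: random exon sets of the same size drawn from the universe
ntimes   <- 1000
permuted <- replicate(ntimes, sum(has.peak[sample(n.exons, n.alt)]))

# One-sided empirical p-value for enrichment; the +1 keeps it above zero
p.value <- (sum(permuted >= observed) + 1) / (ntimes + 1)

# How many standard deviations the observed value lies from the permuted mean
z.score <- (observed - mean(permuted)) / sd(permuted)

print(c(observed = observed, p.value = p.value, z.score = z.score))
```

Because the smallest possible p-value here is 1 / (ntimes + 1), increasing ntimes is the way to resolve very small p-values.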

You can find more information about how to use regioneR and about permutation tests in the package vignette.
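As for the phyper idea from the question, one possible mapping of the numbers is sketched below, assuming you count exons (not peaks) and treat all exons as the background. The four counts are hypothetical placeholders; you would get the real ones by intersecting each exon file with the peaks. Unlike the permutation test, this treats every exon as equally likely to carry a peak regardless of its length.

```r
# Hypothetical counts, for illustration only (replace with your own)
n.alt           <- 1000  # alternatively spliced exons
n.const         <- 4000  # constitutive exons
alt.with.peak   <- 130   # alt exons overlapping at least one peak
const.with.peak <- 370   # const exons overlapping at least one peak

m <- alt.with.peak + const.with.peak  # "white balls": exons with a peak
n <- (n.alt + n.const) - m            # "black balls": exons without a peak
k <- n.alt                            # number of balls drawn: the alt exon set
q <- alt.with.peak                    # white balls among those drawn

# P(X >= q). With lower.tail = FALSE phyper returns P(X > q), hence q - 1
p.hyper <- phyper(q - 1, m, n, k, lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test on the 2x2 table
tab <- matrix(c(alt.with.peak,   n.alt   - alt.with.peak,
                const.with.peak, n.const - const.with.peak),
              nrow = 2,
              dimnames = list(c("peak", "no.peak"), c("alt", "const")))
p.fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(p.hyper = p.hyper, p.fisher = p.fisher))
```

To test the constitutive exons, swap the roles: use k <- n.const and q <- const.with.peak.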