Interpret Bedtools Overlap?
3.1 years ago
kstangline ▴ 80

I have a fairly simple question regarding bedtools.

I've been asked to find the intersections between two types of sample peaks (ChIP-seq peaks). The goal is to see if they're similar or not (i.e. can we use our new method if it gives similar results/peaks to our old method).

I used the following command to count the overlapping peaks:

bedtools intersect -u -a sample1.bed -b sample2.bed -wa | wc -l

I then took the intersection count and divided it by the total number of peaks in -a (sample1) to get the reproducibility rate. In other words, I'm showing the % of peaks in sample 1 that are reproduced in sample 2.
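For reference, the fraction described above can be written out explicitly. The sketch below is a pure-Python stand-in for the count that `bedtools intersect -u` produces (each `-a` interval counted once if it overlaps any `-b` interval by at least 1 bp), not a replacement for bedtools itself. The function name `fraction_overlapping` and the toy intervals are made up for illustration, and BED-style half-open coordinates are assumed.

```python
def fraction_overlapping(a_peaks, b_peaks):
    """Fraction of intervals in a_peaks that overlap at least one interval in b_peaks.

    Peaks are (chrom, start, end) tuples with half-open coordinates, as in BED.
    """
    if not a_peaks:
        raise ValueError("a_peaks is empty")
    hits = 0
    for chrom_a, start_a, end_a in a_peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(chrom_a == chrom_b and start_a < end_b and start_b < end_a
               for chrom_b, start_b, end_b in b_peaks):
            hits += 1
    return hits / len(a_peaks)


# Hypothetical toy data, only to show the call.
sample1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
sample2 = [("chr1", 150, 250), ("chr2", 300, 400)]
print(fraction_overlapping(sample1, sample2))
print(fraction_overlapping(sample2, sample1))
```

Note that the measure is asymmetric: swapping the roles of the two samples changes the denominator, so the two percentages generally differ and it may be worth reporting both.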

How would I interpret these results to a wet lab scientist if the reproducibility (overlap) is > 60%?

From my understanding, anything greater than 60% overlap is considered a good score because it's less likely to have occurred by chance?

Would I need to calculate a p-value to show that the overlap is really good?

bed • 710 views
3.1 years ago

Think about it this way: Is 50% a surprising chance to win a coin toss? How about having a 50% chance to win the lottery?

The point I am trying to make is that the value of an observation, interpreted as novel information, relates to how unlikely (aka informative) it is.

60% is only meaningful if you also know how unlikely it is to get 60% by chance alone. In a sense, that likelihood is what p-values try to capture.

In your case, you would need to quantify how likely it is that you would get 60% overlap even if the phenomenon of interest (the one you associate with overlap) were not present. Alternatively, ask what fraction would overlap if you picked ChIP-seq data from similar tissues and states, but under conditions that contradict your hypothesis. With that, you can build up a confidence level as to what is credible overlap and what are accidental, systemic similarities.
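One way to put a number on "by chance alone" is an empirical null: relocate one peak set at random many times, recompute the overlap fraction each time, and see how often the random value reaches the observed one. The following is a minimal pure-Python sketch of that idea under strong assumptions, not a definitive method. All names (`shuffle_peaks`, `empirical_p_value`), the toy chromosome size, and the toy peaks are hypothetical. On real data, `bedtools shuffle` performs this kind of random relocation. A uniform shuffle is also a naive null, since real peaks cluster in particular regions, so it will tend to overstate significance compared with the matched-sample comparison suggested above.

```python
import random


def fraction_overlapping(a_peaks, b_peaks):
    """Fraction of a_peaks overlapping at least one b_peak (half-open coordinates)."""
    hits = sum(
        any(ca == cb and sa < eb and sb < ea for cb, sb, eb in b_peaks)
        for ca, sa, ea in a_peaks
    )
    return hits / len(a_peaks)


def shuffle_peaks(peaks, chrom_sizes, rng):
    """Move each peak to a uniformly random spot on its own chromosome, keeping its length."""
    shuffled = []
    for chrom, start, end in peaks:
        length = end - start
        new_start = rng.randint(0, chrom_sizes[chrom] - length)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled


def empirical_p_value(a_peaks, b_peaks, chrom_sizes, n_shuffles=1000, seed=0):
    """Return (observed fraction, share of shuffles with a fraction at least as high)."""
    rng = random.Random(seed)
    observed = fraction_overlapping(a_peaks, b_peaks)
    at_least_as_high = 0
    for _ in range(n_shuffles):
        null = fraction_overlapping(a_peaks, shuffle_peaks(b_peaks, chrom_sizes, rng))
        if null >= observed:
            at_least_as_high += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_high + 1) / (n_shuffles + 1)


# Hypothetical toy data: 20 peaks in sample 1, 12 of them recovered in sample 2.
chrom_sizes = {"chr1": 1_000_000}
sample1 = [("chr1", i * 50_000, i * 50_000 + 200) for i in range(20)]
sample2 = [("chr1", start + 50, end + 50) for _, start, end in sample1[:12]]

observed, p_value = empirical_p_value(sample1, sample2, chrom_sizes)
print(observed, p_value)
```

The smallest p-value this can report is 1 / (n_shuffles + 1), so the number of shuffles sets the resolution of the estimate.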

Also, I would not call this "reproducibility"; that means something else in my opinion. What you observe is replication: your replicates recapitulate some, but not all, of the information.
