Question

obtaining unique peaks in DiffBind

5.8 years ago

Illinu ▴ 110

Hi,

I am using DiffBind for ChIP-Seq differential binding analysis. I want to use the occupancy analysis to extract peaks that are unique to one group and absent in the other. I run the following command, and I understand that it outputs peaks that are present in at least 70% of the samples of each group. How can I get the peaks that are 100% present in one group while being 0% present in the other?

rc <- dba.peakset(rc, consensus = DBA_TREATMENT, minOverlap = 0.7)

Thanks, Sol

DiffBind ChIP-Seq differential • 4.7k views

ADD COMMENT • link 5.6 years ago by Illinu ▴ 110

score 2 · Answer 1 · 2018-07-20

DiffBind does not really define peaks as "present" or "absent". It quantifies the peaks you input across the different samples. You can use dba.analyze() to find regions where the binding is significantly different between the groups of interest. Those are the peaks that you could consider unique to one group.

An alternative approach would be to use something like bedtools to overlap BED files from different samples (with bedtools intersect). That would give peaks that are called in one set of samples and not the other. However, there may still be signal at those loci that is just not strong enough to pass the threshold set by your peak caller. This is where DiffBind is useful, since it will actually quantify the difference.
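To make the overlap-and-exclude idea concrete, here is a minimal base R sketch on made-up coordinates. The data frames `peaks_a` and `peaks_b` and the helper `overlaps_any` are hypothetical names introduced only for this illustration. It treats intervals as closed, which differs slightly from the half-open BED convention, and it is not how bedtools or DiffBind are implemented. In practice `bedtools intersect -v` or the GenomicRanges package would be the usual tools.

```r
# Toy peak sets (hypothetical coordinates), one data frame per group.
peaks_a <- data.frame(chr   = c("chr1", "chr1", "chr2"),
                      start = c(100, 500, 100),
                      end   = c(200, 600, 200),
                      stringsAsFactors = FALSE)
peaks_b <- data.frame(chr   = c("chr1", "chr2"),
                      start = c(150, 900),
                      end   = c(250, 950),
                      stringsAsFactors = FALSE)

# TRUE for each row of x that overlaps at least one row of y
# (same chromosome, closed intervals sharing at least one position).
overlaps_any <- function(x, y) {
  vapply(seq_len(nrow(x)), function(i) {
    any(y$chr == x$chr[i] & y$start <= x$end[i] & y$end >= x$start[i])
  }, logical(1))
}

# Peaks called in A with no overlapping call in B
# (comparable in spirit to `bedtools intersect -v`).
only_a <- peaks_a[!overlaps_any(peaks_a, peaks_b), ]
print(only_a)
```

Note that a peak dropped from `only_a` here only means the peak caller reported something in B at that locus, and a peak kept only means it did not; neither says anything about how much signal is actually there.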

score 2 · Answer 2 · 2018-07-28

As @igor mentioned, you can get peaks that are 100% present in one group and 0% present in the other using bedtools intersect. However, this binarization approach might not be a good solution, since peak calling produces both false positives and false negatives. You can use multiple differential peak callers and look at the signal differences between the groups. Other differential peak callers you can use in R are QSEA, edgeR, etc.

score 2 · Answer 3 · 2018-08-07

Hi,

If you follow the DiffBind vignette, you will find commands for obtaining only unique peaks (peaks in one sample and not the other).

For example, after you run your command (rc <- dba.peakset(rc, consensus = DBA_TREATMENT, minOverlap = 0.7)), you can run the following commands to get unique peaks (they are in the DiffBind vignette).

# make Venn plots of unique and common peaks
dba.plotVenn(rc, rc$masks$Consensus)

# overlap the consensus peaksets of the two groups
rc.OL <- dba.overlap(rc, rc$masks$Consensus)

# peaks only in group A
unique_peakset_groupA <- rc.OL$onlyA

# peaks only in group B
unique_peakset_groupB <- rc.OL$onlyB

Also, the peak set that you give to dba.peakset should be the original peak set (before reading in the count data).

score 0 · Answer 4 · 2018-09-03

Hi there, and thank you all for your very helpful answers. I thought of an approach for this problem and wanted to get your input. I think the trick is using the minOverlap option. If I understood correctly, setting a minimum overlap of 70% (minOverlap = 0.7) means that peaks are considered unique when they are present in more than 70% of the samples in one group and less than 70% of the samples in the other group, so I guess that setting minOverlap to 1/n (n = number of samples per group) would give what I am looking for.

So for example, if there are four replicates per group and I set minOverlap = 0.25, the peaks that are present in at least 25% of the samples in one group (at least 1 out of 4 samples) and in less than 25% of the samples in the other group (none) would meet the criteria of uniqueness.

Am I wrong?

Thanks, Sol

peakset <- dba.peakset(ta, consensus = DBA_TREATMENT, minOverlap = 0.25)
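One way to check the reasoning above is with a toy presence/absence table in base R. Everything here is made up for illustration: the `calls` matrix, the helper `in_consensus`, and in particular the assumption that a fractional threshold means "called in at least ceiling(fraction * n) samples" (this is a sketch of the logic described in the post, not DiffBind's actual implementation).

```r
# Toy presence/absence calls: rows = candidate regions, columns = samples.
# All values are made up for illustration.
calls <- matrix(c(1, 1, 1, 1,  0, 0, 0, 0,   # region1: 4/4 in A, 0/4 in B
                  1, 0, 0, 0,  0, 0, 0, 0,   # region2: 1/4 in A, 0/4 in B
                  1, 1, 1, 1,  1, 0, 0, 0,   # region3: 4/4 in A, 1/4 in B
                  1, 1, 1, 0,  1, 1, 1, 1),  # region4: 3/4 in A, 4/4 in B
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("region", 1:4),
                                c(paste0("A", 1:4), paste0("B", 1:4))))
group <- rep(c("A", "B"), each = 4)

# Number of samples per group in which each region was called.
n_a <- rowSums(calls[, group == "A"])
n_b <- rowSums(calls[, group == "B"])

# ASSUMPTION: a fractional threshold means "called in at least
# ceiling(fraction * n_samples) samples of the group".
in_consensus <- function(n_called, n_samples, fraction) {
  n_called >= ceiling(fraction * n_samples)
}

# Same 1/n threshold in both groups: in A's consensus, not in B's.
only_a_loose <- in_consensus(n_a, 4, 0.25) & !in_consensus(n_b, 4, 0.25)

# Strict definition from the question: every A sample, no B sample.
only_a_strict <- n_a == 4 & n_b == 0

print(names(which(only_a_loose)))
print(names(which(only_a_strict)))
```

Under these assumptions, the 1/n threshold does guarantee 0% presence in the other group, but it also keeps regions called in a single sample of the first group (region2 above). Requiring 100% presence in one group would correspond to a threshold of 1 for that group and 1/n for the other, that is, two different thresholds rather than one.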