DiffBind missing peaks and FRiP
2
0
Entering edit mode
2.1 years ago
Christine ▴ 20

I noticed that when I first read in my sample sheet, I get a DBA object with 1226 sites (consensus peaks), yet after running dba.count() on this same object, it is down to 1188 sites. I did not apply any blacklists/greylists.

Also, the FRiP column is missing after I run dba.count(). Does anyone have ideas on what could have gone wrong?

There are 5 biological replicates of wildtype and 6 of mutant, but only one sequencing run was done per biological replicate (so no technical replicates). Peaks were called with MACS2 --broad. I made .bed files from the first 6 columns of the .broadPeak output.

> WS
11 Samples, 1226 sites in matrix (13233 total):
    ID Condition Replicate Intervals
1  705       Mut         1      1284
2  709        WT         1       319
3  719       Mut         1      8909
4  720       Mut         1      1994
5  721        WT         1         6
6  722        WT         1        15
7  728        WT         1        49
8  729       Mut         1        38
9  739       Mut         1         0
10 744        WT         1      2542
11 748       Mut         1         1
> plot(WS)
> WS <- dba.count(WS)
Computing summits...
Re-centering peaks...
> WS
11 Samples, 1188 sites in matrix:
    ID Condition Replicate    Reads
1  705       Mut         1 18209602
2  709        WT         1 17207846
3  719       Mut         1 14440635
4  720       Mut         1 19380340
5  721        WT         1 14392339
6  722        WT         1 16246924
7  728        WT         1 18220300
8  729       Mut         1 16485420
9  739       Mut         1 15516176
10 744        WT         1 16693032
11 748       Mut         1 15745849
DiffBind ChIP-seq • 2.9k views
ADD COMMENT
1
Entering edit mode

dba.count will filter peaks who's "cpm < 1" (dba.count(filter = 1)) by default, this may decreasing peak numbers.

ADD REPLY
1
Entering edit mode
2.1 years ago
Rory Stark ★ 2.0k

As MatthewP suggested, filtering can remove peak intervals with low overlapping read counts. Try calling dba.count() with filter=0 to compare how many peaks are filtered in this way.

Less commonly, it is possible for peaks to be merged when re-centering around summits. This happens if peaks overlap after being recentered (and having 401bp widths using the default setting of summits=200.) In this case the overlapping peaks are merged into a single peak which is them re-centered around the consensus summit in that region. You can try running dba.count() with summits=FALSE to see is any peaks are being merged in this way.

Running dba.count() with both filter=0 and summits=FALSE should result in the same number of peaks before and after counting (using the default consensus of minOverlap=2).

I have checked in a fix for the intermittent disappearing FRiP issue, which should be in effect from DiffBind_3.4.11 onwards.

ADD COMMENT
0
Entering edit mode

Thank you so much! I got the same number of peaks with the parameters filters=0 and summits=FALSE.

Regarding the FRiP issue, is there a workaround that I could use to retrieve these values until DiffBind_3.4.11 is released? It is a bit strange because I am able to get the FRiP column when running the tamoxifen vignette, so I just want to be sure it is not due to some issue in my own data.

ADD REPLY
1
Entering edit mode

DiffBind_3.4.11 should be available within the week, but here is a "backdoor" workaround in the meantime:

WS <- dba.count(WS)
WS$SN <- DiffBind:::pv.Signal2Noise(WS)

(Don't tell anyone, it's a secret)

ADD REPLY
1
Entering edit mode

That's a quick turnaround, thank you so much - looking forward to the new release!

Unfortunately I keep getting NULL when I run that workaround line, but was thinking if I do

WS.norm <- dba.normalize(WS, normalize=DBA_NORM_LIB, library=DBA_LIBSIZE_PEAKREADS, background=FALSE)
norm <- dba.normalize(WS.norm, bRetrieve = TRUE)

If I'm understanding correctly, lib.sizes should then give me the number of reads falling within the peaks, and I can divide by the total read count...?

Also, if using spikein=TRUE, does lib.sizes then include the Drosophila reads? Looks like lib.sizes is gaining reads somehow when I run with spikein=TRUE.

ADD REPLY
0
Entering edit mode

This is a clever way to get the reads in peak numbers, and a better workaround than my suggestion using an internal function!

Yes, the lib.sizes values include the spike-in reads as well as the primary reads. Note that if you use normalize=DBA_NORM_LIB with spike-in, only the spike-in library sizes will be used.

ADD REPLY
0
Entering edit mode
12 months ago
gyspace • 0

Hi, I also can not get FRiP score through dba.count. Have you solved the problem?

ADD COMMENT
1
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 2628 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6