How exactly does diffBind count reads?
Entering edit mode
6 months ago
iamjli ▴ 10


I'm working with PE ATAC-seq data and have been using diffBind to generate consensus peaksets with sizes of 401bp centered around the summits. I've noticed a discrepancy when I compare the counts that diffBind reports and counts that I compute using pysam directly on the bams. Here are the parameters that I used for diffBind (v3.0.13):

counts <- dba.count(samples, minOverlap=0.1, score=DBA_SCORE_READS, bRemoveDuplicates=TRUE, bParallel=TRUE)

And for pysam, I use the following filters to remove: unmapped reads, fail QC, is secondary, is unmappped, and is not a proper pair. I only count reads that are fully contained within the 401bp region.

DiffBind consistently reports fewer reads than what I find with my own method. In most cases, the correlations between the reported counts are quite high but not always. Does anyone know what could be causing this difference?


edit: I'm referring to raw counts in case that wasn't clear.

atacseq diffbind • 355 views
Entering edit mode
5 months ago
Rory Stark ★ 1.2k

Can you let us know which version of DiffBind you are using? The counting methods, and specifically the defaults, have changed a bit over time.

I'm a bit surprised that DiffBind reports fewer reads as it should include all reads that overlap the regions at all, not just those fully contained in the consensus intervals. I notice you didn't say you are filtering duplicates in pysam, could that account for it?

Entering edit mode

I'm using v3.0.13. I have been filtering duplicates.


Login before adding your answer.

Traffic: 1341 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6