How exactly does diffBind count reads?
Entering edit mode
24 days ago
iamjli ▴ 10


I'm working with PE ATAC-seq data and have been using diffBind to generate consensus peaksets with sizes of 401bp centered around the summits. I've noticed a discrepancy when I compare the counts that diffBind reports and counts that I compute using pysam directly on the bams. Here are the parameters that I used for diffBind (v3.0.13):

counts <- dba.count(samples, minOverlap=0.1, score=DBA_SCORE_READS, bRemoveDuplicates=TRUE, bParallel=TRUE)

And for pysam, I use the following filters to remove: unmapped reads, fail QC, is secondary, is unmappped, and is not a proper pair. I only count reads that are fully contained within the 401bp region.

DiffBind consistently reports fewer reads than what I find with my own method. In most cases, the correlations between the reported counts are quite high but not always. Does anyone know what could be causing this difference?


edit: I'm referring to raw counts in case that wasn't clear.

atacseq diffbind • 95 views

Login before adding your answer.

Traffic: 1674 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6