How exactly does diffBind count reads?
1
0
Entering edit mode
6 months ago
iamjli ▴ 10

Hello!

I'm working with PE ATAC-seq data and have been using diffBind to generate consensus peaksets with sizes of 401bp centered around the summits. I've noticed a discrepancy when I compare the counts that diffBind reports and counts that I compute using pysam directly on the bams. Here are the parameters that I used for diffBind (v3.0.13):

counts <- dba.count(samples, minOverlap=0.1, score=DBA_SCORE_READS, bRemoveDuplicates=TRUE, bParallel=TRUE)

And for pysam, I use the following filters to remove: unmapped reads, fail QC, is secondary, is unmappped, and is not a proper pair. I only count reads that are fully contained within the 401bp region.

DiffBind consistently reports fewer reads than what I find with my own method. In most cases, the correlations between the reported counts are quite high but not always. Does anyone know what could be causing this difference?

Thanks!

edit: I'm referring to raw counts in case that wasn't clear.

atacseq diffbind • 355 views
0
Entering edit mode
5 months ago
Rory Stark ★ 1.2k

Can you let us know which version of DiffBind you are using? The counting methods, and specifically the defaults, have changed a bit over time.

I'm a bit surprised that DiffBind reports fewer reads as it should include all reads that overlap the regions at all, not just those fully contained in the consensus intervals. I notice you didn't say you are filtering duplicates in pysam, could that account for it?

0
Entering edit mode

I'm using v3.0.13. I have been filtering duplicates.