How exactly does diffBind count reads?
1
0
Entering edit mode
3.0 years ago
iamjli ▴ 10

Hello!

I'm working with PE ATAC-seq data and have been using diffBind to generate consensus peaksets with sizes of 401bp centered around the summits. I've noticed a discrepancy when I compare the counts that diffBind reports and counts that I compute using pysam directly on the bams. Here are the parameters that I used for diffBind (v3.0.13):

counts <- dba.count(samples, minOverlap=0.1, score=DBA_SCORE_READS, bRemoveDuplicates=TRUE, bParallel=TRUE)

And for pysam, I use the following filters to remove: unmapped reads, fail QC, is secondary, is unmappped, and is not a proper pair. I only count reads that are fully contained within the 401bp region.

DiffBind consistently reports fewer reads than what I find with my own method. In most cases, the correlations between the reported counts are quite high but not always. Does anyone know what could be causing this difference?

Thanks!

edit: I'm referring to raw counts in case that wasn't clear.

atac-seq diffbind • 1.2k views
ADD COMMENT
0
Entering edit mode
3.0 years ago
Rory Stark ★ 2.0k

Can you let us know which version of DiffBind you are using? The counting methods, and specifically the defaults, have changed a bit over time.

I'm a bit surprised that DiffBind reports fewer reads as it should include all reads that overlap the regions at all, not just those fully contained in the consensus intervals. I notice you didn't say you are filtering duplicates in pysam, could that account for it?

ADD COMMENT
0
Entering edit mode

I'm using v3.0.13. I have been filtering duplicates.

ADD REPLY

Login before adding your answer.

Traffic: 2770 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6