Inconsistency between number of reads in Samtools flagstat and Macs2 total tags.
1
0
Entering edit mode
5.7 years ago

I am trying to call peaks on my ATAC data using MACS2 and realized that there is a huge inconsistency within the number of reads in my bam file and the total tags reported by MACS2.

I have seen previous posts that mention using "--keep-dup 'all'" option but I have deduplicated my bam using picard before macs. Below is the command am using and the outputs for macs and samtools flagstat.

macs2 callpeak -t PrimaryKeratinocytes.suf.bam -g hs -n test --nomodel --shift -51 --extsize 102 --q 0.05 --keep-dup 'all'

macs2

INFO  @ Fri, 17 Aug 2018 14:53:58: #1 tag size is determined as 51 bps 
INFO  @ Fri, 17 Aug 2018 14:53:58: #1 tag size = 51 
INFO  @ Fri, 17 Aug 2018 14:53:58: #1  total tags in treatment: 37171037

flagstat

52164544 + 0 in total (QC-passed reads + QC-failed reads)
14993507 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
52164544 + 0 mapped (100.00% : N/A)

Why is macs using only 37 million reads? How can I include all the reads for peak calling?

Thank you

ChIP-Seq macs2 ATAC peak calling • 1.8k views
ADD COMMENT
0
Entering edit mode

Thank you for the editing @genomax

ADD REPLY
0
Entering edit mode
5.7 years ago

How many reads did you actually have after deduping?

Did you observe that 52164544-1499307 = 37171037

I'm guessing you really only have 37171037 reads, and 14993507 aligned to multiple places, so are in your file multiple times

ADD COMMENT
0
Entering edit mode

The dedup bam file has 52164544 reads which are all mapped. I preprocess the bam files to keep only the unique reads that are mapped to the chromosome.

The flagstat shows this "52164544 + 0 mapped (100.00% : N/A)"

The way I preprocess the bam I know for sure it has 52 million reads unique mapped reads but the macs is only seeing 37171037.

ADD REPLY
0
Entering edit mode

samtools view myfile.bam | cut -f 1 | sort | uniq -c | wc -l Do you get 37M or 52M?

ADD REPLY
0
Entering edit mode

I get 41121155 reads. Still a few million more than what macs2 is reporting

ADD REPLY
0
Entering edit mode

I checked another bam and still, the number of reads are very inconsistent:

flagstat: 9613009

samtools view myfile.bam | cut -f 1 | sort | uniq -c | wc -l: 5987026

MAC2 total tags: 3252338

Is there a way to keep all the undup tags in my analysis?

ADD REPLY

Login before adding your answer.

Traffic: 2605 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6