Question: Inconsistency between number of reads in Samtools flagstat and Macs2 total tags.
0
gravatar for ahmadammarmalik
15 months ago by
ahmadammarmalik0 wrote:

I am trying to call peaks on my ATAC data using MACS2 and realized that there is a huge inconsistency within the number of reads in my bam file and the total tags reported by MACS2.

I have seen previous posts that mention using "--keep-dup 'all'" option but I have deduplicated my bam using picard before macs. Below is the command am using and the outputs for macs and samtools flagstat.

macs2 callpeak -t PrimaryKeratinocytes.suf.bam -g hs -n test --nomodel --shift -51 --extsize 102 --q 0.05 --keep-dup 'all'

macs2

INFO  @ Fri, 17 Aug 2018 14:53:58: #1 tag size is determined as 51 bps 
INFO  @ Fri, 17 Aug 2018 14:53:58: #1 tag size = 51 
INFO  @ Fri, 17 Aug 2018 14:53:58: #1  total tags in treatment: 37171037

flagstat

52164544 + 0 in total (QC-passed reads + QC-failed reads)
14993507 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
52164544 + 0 mapped (100.00% : N/A)

Why is macs using only 37 million reads? How can I include all the reads for peak calling?

Thank you

ADD COMMENTlink modified 15 months ago by genomax74k • written 15 months ago by ahmadammarmalik0

Thank you for the editing @genomax

ADD REPLYlink written 15 months ago by ahmadammarmalik0
0
gravatar for swbarnes2
15 months ago by
swbarnes26.9k
United States
swbarnes26.9k wrote:

How many reads did you actually have after deduping?

Did you observe that 52164544-1499307 = 37171037

I'm guessing you really only have 37171037 reads, and 14993507 aligned to multiple places, so are in your file multiple times

ADD COMMENTlink written 15 months ago by swbarnes26.9k

The dedup bam file has 52164544 reads which are all mapped. I preprocess the bam files to keep only the unique reads that are mapped to the chromosome.

The flagstat shows this "52164544 + 0 mapped (100.00% : N/A)"

The way I preprocess the bam I know for sure it has 52 million reads unique mapped reads but the macs is only seeing 37171037.

ADD REPLYlink written 15 months ago by ahmadammarmalik0

samtools view myfile.bam | cut -f 1 | sort | uniq -c | wc -l Do you get 37M or 52M?

ADD REPLYlink written 15 months ago by swbarnes26.9k

I get 41121155 reads. Still a few million more than what macs2 is reporting

ADD REPLYlink written 14 months ago by ahmadammarmalik0

I checked another bam and still, the number of reads are very inconsistent:

flagstat: 9613009

samtools view myfile.bam | cut -f 1 | sort | uniq -c | wc -l: 5987026

MAC2 total tags: 3252338

Is there a way to keep all the undup tags in my analysis?

ADD REPLYlink modified 14 months ago • written 14 months ago by ahmadammarmalik0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1838 users visited in the last hour