How many reads are removed by MarkDuplicates in general?
5 weeks ago
Sashu ▴ 10

Hello! Now I'm dealing with bulk DNA-seq data.

First, I aligned it to the human genome with bwa-mem.

Before running Picard MarkDuplicates with only the required options (I, O, M), the mean depth over the whole region was 34.

However, after MarkDuplicates, the mean depth over the whole region is only 18! Moreover, the BAM file after MarkDuplicates is only 1G smaller. Why has the depth decreased so much? The depth was calculated with mosdepth (mosdepth -t 4 -n).
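To put the two numbers in proportion, here is a minimal sketch of the arithmetic. It assumes the whole drop comes from reads that are no longer counted and that those reads cover about as many bases as the rest; the variable names are only illustrative.

```python
# Mean depths reported in the post (whole region, from mosdepth).
depth_before = 34.0
depth_after = 18.0

# Fraction of the previously counted coverage that is no longer counted.
lost_fraction = (depth_before - depth_after) / depth_before

print(f"Coverage no longer counted: {lost_fraction:.1%}")
# prints "Coverage no longer counted: 47.1%"
```

So under these assumptions roughly 47% of the coverage would have to be flagged as duplicate, which is worth checking against the duplicate count itself.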

MarkDuplicates • 493 views

Why has the depth decreased so much?

Run samtools flagstat to count the number of marked and total reads.
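Once the flagstat report is saved as text, the duplicate fraction can be worked out with a few lines of Python. This is only a sketch: the example report below is invented for illustration, the function name is hypothetical, and the parsing assumes the usual "N + M label" line layout, so check it against what your samtools version prints.

```python
import re

# Hypothetical flagstat report; the counts are made up for illustration.
EXAMPLE_FLAGSTAT = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
470 + 0 duplicates
990 + 0 mapped (99.00% : N/A)
"""

LINE_PATTERN = re.compile(r"^(\d+) \+ (\d+) (.*)$")


def duplicate_fraction(flagstat_text: str) -> float:
    """Return duplicates / total using the QC-passed counts of a flagstat report."""
    total = None
    duplicates = None
    for line in flagstat_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue
        qc_passed = int(match.group(1))
        label = match.group(3)
        if label.startswith("in total"):
            total = qc_passed
        elif label == "duplicates":
            duplicates = qc_passed
    if not total or duplicates is None:
        raise ValueError("could not find usable total and duplicate counts")
    return duplicates / total


print(f"Duplicate fraction: {duplicate_fraction(EXAMPLE_FLAGSTAT):.1%}")
```

Note that the total line also counts secondary and supplementary alignments, so the result is an approximation of the duplication rate rather than an exact figure.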


Did you simply mark the duplicates or actually remove them? The BAM file could also have shrunk if the reads simply got rearranged.


or the compression level has changed ...


I realize that I only identified the duplicates.
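Marking only sets the duplicate bit (0x400, decimal 1024) in each read's SAM flag; the read itself stays in the file. A depth tool that excludes that bit will still ignore the read, which would explain a lower depth with an almost unchanged file size. The sketch below illustrates this with an exclude mask of 1796, which I believe is the mosdepth default for -F (please confirm with mosdepth --help). The example flags and the function name are hypothetical.

```python
# SAM flag bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400

# Assumption: 1796 is the default exclude mask of the depth tool.
EXCLUDE_MASK = 1796


def is_counted(flag: int, exclude_mask: int = EXCLUDE_MASK) -> bool:
    """A read contributes to depth only if none of the excluded bits is set."""
    return (flag & exclude_mask) == 0


# Hypothetical read: paired, properly paired, mate on reverse strand, first in pair.
flag_before_marking = 99
# The same read after being marked as a duplicate.
flag_after_marking = flag_before_marking | DUPLICATE

print(EXCLUDE_MASK == UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE)  # prints True
print(is_counted(flag_before_marking))  # prints True
print(is_counted(flag_after_marking))   # prints False
```

Passing a mask without the duplicate bit (for example 1796 - 1024 = 772) would count marked duplicates again, which is one way to check whether they account for the whole difference.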


If I forget to remove the duplicates, will it influence the SNPs called by GATK HaplotypeCaller?


will it influence the SNPs called by GATK HaplotypeCaller?

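HaplotypeCaller's default read filters, which include NotDuplicateReadFilter, are applied automatically by the engine before the data is processed, so reads that are only marked as duplicates are still ignored.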


Returning to the original question, do you think it's strange that the depth of bulk DNA-seq data decreases this much after MarkDuplicates?