Does MarkDuplicates remove duplicate sequences in the library?
4.6 years ago
Shixiang

Dear all,

I have a question about sequencing and marking duplicates in BQSR, which may be a stupid one.

One key step in the GATK workflow is MarkDuplicates, which removes duplicates such as PCR duplicates. My question is: are duplicate segments created when the library is constructed? If so, does MarkDuplicates remove such duplicates, and why?

Best, Shixiang

sequencing WES GATK
4.6 years ago

Hello Shixiang ,

you are mixing things up. BQSR (Base Quality Score Recalibration) and MarkDuplicates are two different things.

1 BQSR (Base Quality Score Recalibration):

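The quality score that the sequencing machine assigns to each base in a read is replaced with a recalibrated value. The recalibration builds an empirical error model from covariates such as the read group, the reported quality score, the machine cycle and the sequence context, skipping known variant sites, so the new values are meant to reflect the true probability that a base call is wrong more accurately.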
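To make the idea concrete, here is a minimal Python sketch of the core of recalibration under heavy simplifying assumptions: it bins bases by (read group, reported quality) only, assumes known variant sites have already been filtered out, and uses simple add-one smoothing instead of GATK's actual model. The input format and the function names (`empirical_qualities`, `recalibrate`) are hypothetical and not part of GATK.

```python
import math
from collections import defaultdict


def phred(error_rate):
    """Convert an error probability to a Phred-scaled quality."""
    return -10.0 * math.log10(error_rate)


def empirical_qualities(bases):
    """bases: iterable of (read_group, reported_q, is_mismatch) tuples,
    one per aligned base, with known variant sites already excluded
    (hypothetical input format).
    Returns {(read_group, reported_q): empirical_q}."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, observations]
    for read_group, reported_q, is_mismatch in bases:
        c = counts[(read_group, reported_q)]
        c[0] += int(is_mismatch)
        c[1] += 1

    table = {}
    for key, (mismatches, observations) in counts.items():
        # Add-one smoothing so a bin with zero mismatches
        # does not get an infinite quality.
        error_rate = (mismatches + 1) / (observations + 2)
        table[key] = round(phred(error_rate))
    return table


def recalibrate(read_group, reported_qualities, table):
    """Replace each reported quality with its empirical value;
    bins never seen during training keep their reported quality."""
    return [table.get((read_group, q), q) for q in reported_qualities]
```

For example, 1000 bases reported as Q30 in read group `rg1`, of which 2 mismatch the reference, give an estimated error rate of 3/1002 and are recalibrated down to Q25. The real tool additionally uses cycle and context covariates, so treat this only as an illustration of the principle.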

More information about it:

2 MarkDuplicates:

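During library preparation, PCR steps produce fragments that are copies of one and the same original DNA molecule. MarkDuplicates recognizes these duplicates by their 5' mapping position (and strand) and keeps only one of them as the representative. The others are flagged with the SAM duplicate flag (0x400) rather than deleted, unless you set REMOVE_DUPLICATES=true; downstream tools such as the GATK variant callers ignore flagged reads by default. The reason for discarding such duplicates is to avoid bias: if one original molecule is overrepresented because it happened to be amplified more efficiently, its reads (including any PCR errors they carry) would be counted several times as if they were independent evidence. Note: if your library prep is amplicon based, meaning you use PCR to capture your target regions, do not remove duplicates, because virtually all of your reads start at the same positions and would look like duplicates.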
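A simplified Python sketch of the marking logic, assuming single-end reads whose unclipped 5' position and strand are already known: reads sharing (chromosome, 5' position, strand) form a duplicate group, the one with the highest summed base quality is kept (an approximation of Picard's default scoring, not its exact rule), and the rest are flagged rather than deleted. The `Read` class and `mark_duplicates` function are hypothetical; Picard's real implementation also handles read pairs, libraries and optical duplicates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical, minimal single-end read record.
    name: str
    chrom: str
    unclipped_5prime: int  # 5' end on the reference, counting soft-clipped bases
    reverse: bool          # True if the read maps to the reverse strand
    base_qualities: list   # Phred base qualities
    is_duplicate: bool = False


def mark_duplicates(reads):
    """Flag all but one read per (chrom, 5' position, strand) group."""
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.unclipped_5prime, read.reverse)].append(read)

    for group in groups.values():
        # Keep the read with the highest summed base quality;
        # on ties, max() returns the first one encountered.
        best = max(group, key=lambda r: sum(r.base_qualities))
        for read in group:
            read.is_duplicate = read is not best
    return reads
```

Feeding it amplicon-style data, where every read starts at the same primer position, flags all reads but one. This illustrates why duplicate removal is skipped for amplicon libraries.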

fin swimmer

Thanks, I have corrected my question. My data are WES (tumor and normal) downloaded from NCBI, and I use them for mutation calling and copy-number calling. Should I remove duplicates?


