Question: Samtools Remove Duplicates
shilpy wrote (6.7 years ago):

Why do we remove duplicates from BAM files when using samtools? With paired-end data, duplicates can be removed either as fragments or as pairs. How do these two methods differ?


There is a SEQanswers thread that may interest you: "Samtools's rmdup vs. Picard's MarkDuplicates":

http://seqanswers.com/forums/showthread.php?t=5424

Written 6.7 years ago by ff.cc.cc
Ashutosh Pandey (Philadelphia) wrote (6.7 years ago):

I would personally recommend using Picard (MarkDuplicates) for marking or removing duplicates. If you have paired-end data, both reads of a pair are used to identify duplicates: if another pair has both of its reads aligned at exactly the same positions as this pair, one of the two pairs will be marked as a duplicate. For fragment (single-end) reads, only the position of that one read is used to mark duplicates.

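To make the pair-versus-fragment difference concrete, here is a minimal sketch in Python. It is a simplified model, not Picard's actual implementation: it assumes each read end is summarized by a hypothetical `End` record (chromosome, 5' coordinate, strand) and ignores details such as soft clipping and library grouping. Two pairs that share the same read-1 position but whose mates land in different places are duplicates under fragment comparison, but not under pair comparison.

```python
from typing import NamedTuple, Tuple


class End(NamedTuple):
    # Hypothetical, simplified summary of one aligned read end.
    chrom: str
    pos5: int      # 5' alignment coordinate (simplification)
    reverse: bool  # True if aligned to the reverse strand


class Pair(NamedTuple):
    name: str
    read1: End
    read2: End


def fragment_key(end: End) -> Tuple:
    """Fragment mode: only one read's position and strand are compared."""
    return (end.chrom, end.pos5, end.reverse)


def pair_key(pair: Pair) -> Tuple:
    """Pair mode: both ends must match; sorting makes the key order-independent."""
    return tuple(sorted([pair.read1, pair.read2]))


def is_duplicate(a: Pair, b: Pair, paired: bool) -> bool:
    if paired:
        return pair_key(a) == pair_key(b)
    return fragment_key(a.read1) == fragment_key(b.read1)


# Same read-1 position, different mate positions.
p1 = Pair("p1", End("chr1", 1000, False), End("chr1", 1300, True))
p2 = Pair("p2", End("chr1", 1000, False), End("chr1", 1450, True))
# Same two ends as p1, listed in the opposite order.
p3 = Pair("p3", End("chr1", 1300, True), End("chr1", 1000, False))

print(is_duplicate(p1, p2, paired=False))  # True: read 1 positions match
print(is_duplicate(p1, p2, paired=True))   # False: mates differ
print(is_duplicate(p1, p3, paired=True))   # True: same fragment ends
```

Comparing whole pairs is stricter, so it usually flags fewer reads as duplicates than comparing single reads, which is why pair-aware marking is generally preferred for paired-end data.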

So you mean that in both cases only one will be marked as a duplicate? Sorry, I have only just started using samtools for next-generation sequencing.

Written 6.7 years ago by shilpy

Here is an example. Suppose that, for fragment data, 5 reads align at exactly the same location. 4 of them will be marked as duplicates and 1 will be kept for further use. The tool chooses which read to keep, so you don't need to worry about it: by default Picard MarkDuplicates keeps the read with the highest sum of base qualities, while samtools rmdup keeps the one with the highest mapping quality. Also, duplicate marking is done at the library level, so if you have two libraries, their duplicates are marked separately: reads from two different libraries that align at the same position won't be marked as duplicates of each other.

Written 6.7 years ago by Ashutosh Pandey
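The 5-read example and the per-library rule can be sketched as follows. This is a simplified illustration under stated assumptions, not how Picard or samtools are implemented: the `Read` record and its `score` field are hypothetical (the score stands in for whatever ranking the tool uses, such as the sum of base qualities), and reads are grouped on library, chromosome, 5' position, and strand. Within each group, the highest-scoring read is kept and the rest are marked.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Read(NamedTuple):
    # Hypothetical, simplified single-end read record.
    name: str
    library: str
    chrom: str
    pos5: int      # 5' alignment coordinate (simplification)
    reverse: bool  # strand
    score: int     # hypothetical ranking score, e.g. sum of base qualities


def mark_duplicates(reads: List[Read]) -> Set[str]:
    """Return the names of reads that would be marked as duplicates."""
    groups: Dict[Tuple, List[Read]] = defaultdict(list)
    for r in reads:
        # The library is part of the key, so reads from different
        # libraries never end up in the same duplicate group.
        groups[(r.library, r.chrom, r.pos5, r.reverse)].append(r)

    duplicates: Set[str] = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.score)  # keep the best-scoring read
        duplicates.update(r.name for r in group if r.name != best.name)
    return duplicates


# Five libA reads at the same position, plus one libB read at that position.
reads = [
    Read(f"r{i}", "libA", "chr1", 500, False, s)
    for i, s in enumerate([30, 35, 28, 40, 33], start=1)
]
reads.append(Read("b1", "libB", "chr1", 500, False, 20))

dups = mark_duplicates(reads)
print(sorted(dups))  # ['r1', 'r2', 'r3', 'r5']
```

Here r4 (score 40) is kept, the other four libA reads are marked, and b1 is left alone because it is the only read in its library at that position.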

Thank you very much for the clarification!

Written 6.7 years ago by shilpy
Powered by Biostar version 2.3.0