Samtools Remove Duplicates Question
11.1 years ago
shilpy ▴ 20

Why do we remove duplicates from BAM files when using samtools? With paired-end data, we can remove duplicates either as fragments or as pairs. How do these two methods differ?

samtools • 17k views

There is a thread on SEQanswers that may be of interest, "Samtools's rmdup vs. Picard's MarkDuplicates":

http://seqanswers.com/forums/showthread.php?t=5424

11.1 years ago

I would personally recommend using Picard for marking or removing duplicates. If you have paired-end data, both reads of a pair are used to identify duplicates: if another pair has both of its reads aligning at exactly the same locations as this pair, one of the two pairs is marked as a duplicate. For fragment (single-end) reads, only the location of the one read is used to mark duplicates.
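
To make the difference concrete, here is a minimal Python sketch of the two comparison keys. It is not how Picard or samtools are implemented; the dictionary fields (`chrom`, `pos`, `strand`) and the helper names are made up for illustration, and real tools work from 5' coordinates read out of the BAM records rather than from hand-written dictionaries.

```python
from collections import defaultdict

def fragment_key(read):
    # A fragment (single-end) read is identified by its own alignment only.
    return (read["chrom"], read["pos"], read["strand"])

def pair_key(read, mate):
    # A pair is identified by the alignments of both of its reads.
    return (fragment_key(read), fragment_key(mate))

def count_duplicates(keys):
    # Every key seen n times contributes n - 1 duplicates.
    seen = defaultdict(int)
    for key in keys:
        seen[key] += 1
    return sum(n - 1 for n in seen.values())

# Two pairs whose first reads align identically but whose mates do not.
r1 = {"chrom": "chr1", "pos": 100, "strand": "+"}
m1 = {"chrom": "chr1", "pos": 300, "strand": "-"}
r2 = {"chrom": "chr1", "pos": 100, "strand": "+"}
m2 = {"chrom": "chr1", "pos": 350, "strand": "-"}

fragment_dups = count_duplicates([fragment_key(r1), fragment_key(r2)])
pair_dups = count_duplicates([pair_key(r1, m1), pair_key(r2, m2)])
print(fragment_dups, pair_dups)
```

Treated as fragments, the two first reads collide and one is a duplicate; treated as pairs, the differing mate positions keep both pairs.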


Do you mean that in both cases only one read will be marked as a duplicate? I am sorry, I have only just started using samtools for next-generation sequencing.


Here is an example. Assume that, for fragment data, 5 reads align at exactly the same location. 4 of them will be marked as duplicates and 1 will be kept for further use. The tool chooses which read to keep, so you don't need to worry about it: by default Picard MarkDuplicates keeps the read with the highest sum of base qualities, and samtools rmdup keeps the one with the highest mapping quality. Also, duplicates are marked at the library level, so if you have two libraries their duplicates will be marked separately. Reads from two different libraries that align at the same position won't be marked as duplicates of each other.
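
The 5-read example can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not the actual Picard or samtools code; the `score` field is a hypothetical stand-in for whatever quality measure the tool uses, and the read and library names are invented.

```python
from collections import defaultdict

def mark_duplicates(reads):
    # Group reads by library and alignment position, so reads from
    # different libraries never end up in the same group.
    groups = defaultdict(list)
    for read in reads:
        key = (read["library"], read["chrom"], read["pos"], read["strand"])
        groups[key].append(read)

    # Within each group, keep the highest-scoring read and flag the rest.
    # On a tie, max() keeps the first read it encountered.
    for group in groups.values():
        best = max(group, key=lambda r: r["score"])
        for read in group:
            read["duplicate"] = read is not best
    return reads

# Five reads from library libA at the same position, with hypothetical scores.
reads = [
    {"name": f"read{i}", "library": "libA", "chrom": "chr1",
     "pos": 100, "strand": "+", "score": score}
    for i, score in enumerate([30, 42, 37, 42, 25], start=1)
]
# One read from a second library at the same position.
reads.append({"name": "read6", "library": "libB", "chrom": "chr1",
              "pos": 100, "strand": "+", "score": 20})

marked = mark_duplicates(reads)
kept = [r["name"] for r in marked if not r["duplicate"]]
n_duplicates = sum(r["duplicate"] for r in marked)
print(kept, n_duplicates)
```

Four of the five libA reads are flagged, and the libB read is kept even though it aligns at the same position, because it belongs to a different library.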


Thank you very much for the clarification!
