Is there a tool that removes PCR duplicates from a BAM file only when they exceed the background level of the region? I don't mean removing all PCR duplicates by position alone; sometimes two reads with identical positions legitimately originate from two different molecules. I mean cases where there is an extreme pileup at one position that sticks out greatly over the surrounding positions and should therefore be removed (either completely or reduced to the "average" coverage of the region). Is there a tool that does this efficiently for BAM files? If I understand samtools rmdup correctly, it treats any reads that start at the same position as duplicates and removes them, which is not the behavior I'm looking for.
Question: Removing PCR Duplicates From BAM Without Collapsing Sequences By Position?
user • 870 wrote:
Pavel Senin • 1.9k wrote:
I think that Picard's MarkDuplicates will help: http://picard.sourceforge.net/command-line-overview.shtml#MarkDuplicates. Sometimes including the VALIDATION_STRINGENCY=LENIENT (or SILENT) option helps to get things done.
William • 4.7k wrote:
Picard removes all but one of the reads that have the exact same alignment start and stop and the same CIGAR string (mismatches, indels).
I don't know of any tools that try to be more sophisticated about removing duplicates. Removing duplicates the way Picard does should not cause you any problems as long as your sample has good complexity (i.e., all your positions of interest on the reference are covered by multiple reads starting or ending at different positions).
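For what it's worth, the behavior asked about in the question (thin out only the positions that pile up far above the local background) is simple to prototype. Below is a rough sketch in plain Python, not an existing tool. Everything in it is an assumption made for illustration: the function name `cap_pileups`, the `fold` parameter, the read representation as `(name, chrom, start, strand)` tuples, and the choice of "background" as the median number of reads per occupied start position in the region.

```python
from collections import defaultdict
from statistics import median


def cap_pileups(reads, fold=3.0):
    """Thin out start positions whose read count far exceeds the background.

    reads: iterable of (name, chrom, start, strand) tuples from one region
           (a hypothetical, simplified stand-in for BAM records).
    fold:  how many times the background a position may reach before
           it is capped (assumed tuning knob, not from any real tool).
    """
    # Group reads that share chromosome, start position and strand.
    groups = defaultdict(list)
    for read in reads:
        _name, chrom, start, strand = read
        groups[(chrom, start, strand)].append(read)

    if not groups:
        return []

    # Assumption: background = median reads per occupied start position.
    background = median(len(group) for group in groups.values())
    # Always keep at least one read per position.
    cap = max(1, int(fold * background))

    kept = []
    for key in sorted(groups):
        # Keep the first `cap` reads in input order; which reads to keep
        # is an arbitrary choice in this sketch.
        kept.extend(groups[key][:cap])
    return kept


if __name__ == "__main__":
    # One position with 50 identical-start reads, ten positions with 2 each.
    demo = [("dup%d" % i, "chr1", 100, "+") for i in range(50)]
    for pos in range(101, 111):
        demo += [("r%d_%d" % (pos, j), "chr1", pos, "+") for j in range(2)]
    print(len(demo), "reads in,", len(cap_pileups(demo)), "reads kept")
```

With a small enough `fold` the cap drops to 1, which gives the plain one-read-per-position collapse; larger values leave ordinary positions untouched and only trim the outliers. Reading real records from a BAM file would need a library such as pysam, which is left out here on purpose.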