Removing PCR Duplicates From BAM Without Collapsing Sequences By Position?
11.1 years ago
user ▴ 940

Is there a tool that removes PCR duplicates from a BAM file only when they exceed the background level of the region? I don't mean removing all PCR duplicates just by position; sometimes two reads with identical positions legitimately originate from two different molecules. I mean cases where there is an extreme pileup at one position that sticks out greatly over the surrounding positions and should therefore be removed (either completely, or reduced to the "average" coverage of the region). Is there a tool that does this efficiently for BAM files? If I understand samtools rmdup correctly, it treats any reads that start at the same position as duplicates and removes them, which is not the behavior I'm looking for.

bam samtools next-gen rna-seq pcr alignment • 4.6k views
11.1 years ago
Pavel Senin ★ 1.9k

I think that Picard's MarkDuplicates will help: http://picard.sourceforge.net/command-line-overview.shtml#MarkDuplicates. Sometimes including the "VALIDATION_STRINGENCY=LENIENT" (or SILENT) option helps to get things done.
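
For illustration, here is a minimal sketch of how such a call could be assembled from Python. The helper name `build_markduplicates_cmd`, the file names, and the `picard.jar` path are hypothetical; the sketch assumes a Picard build that is invoked as `java -jar picard.jar MarkDuplicates` with the classic `KEY=value` arguments (older releases shipped a separate `MarkDuplicates.jar`), so adjust it to your installation.

```python
def build_markduplicates_cmd(in_bam, out_bam, metrics,
                             remove=False, stringency="LENIENT",
                             picard_jar="picard.jar"):
    """Build the argument list for a Picard MarkDuplicates run.

    All paths are placeholders; picard_jar is an assumed location.
    """
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        f"METRICS_FILE={metrics}",
        # false: duplicates are only flagged; true: they are dropped from the output
        f"REMOVE_DUPLICATES={'true' if remove else 'false'}",
        f"VALIDATION_STRINGENCY={stringency}",
    ]


cmd = build_markduplicates_cmd("in.bam", "dedup.bam", "dup_metrics.txt", remove=True)
print(" ".join(cmd))
# To actually run it (requires Java and Picard to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

With `remove=False` the duplicates stay in the output BAM and are only flagged, which leaves the filtering decision to a later step.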

11.1 years ago
William ★ 5.3k

Picard removes all but one of the reads that have the exact same start and stop of the alignment and cigar string (mismatches, indels).

I don't know of any tools that try to be more sophisticated about removing duplicates. Removing duplicates the way Picard does should not cause you any problems as long as you have a sample of good complexity (i.e. all your positions of interest on the reference are covered by multiple reads starting or ending at different positions).
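
To make the idea from the question concrete, here is a rough sketch of background-aware capping. It is not an existing tool and not how Picard or samtools work. It assumes the reads have already been extracted from the BAM as `(chrom, start, strand, name)` tuples for a single region, takes the median number of reads per occupied start position as the "background", and keeps at most `factor` times that many reads at any one position. The function name `cap_pileups` and the parameters `factor` and `min_cap` are made up for this example.

```python
from collections import defaultdict
from statistics import median


def cap_pileups(reads, factor=5.0, min_cap=1):
    """Return the names of reads kept after capping extreme pileups.

    reads: iterable of (chrom, start, strand, name) tuples from one region.
    Reads sharing (chrom, start, strand) are treated as one pileup.
    """
    groups = defaultdict(list)
    for chrom, start, strand, name in reads:
        groups[(chrom, start, strand)].append(name)
    if not groups:
        return []

    # Background = median number of reads per occupied start position.
    background = median(len(names) for names in groups.values())
    cap = max(min_cap, int(round(background * factor)))

    kept = []
    for key in sorted(groups):
        # Keep the first `cap` reads; one could instead pick the best by quality.
        kept.extend(groups[key][:cap])
    return kept


# Ten positions with 2 reads each, plus one position with 100 reads.
reads = [("chr1", pos, "+", f"r{pos}_{i}") for pos in range(100, 110) for i in range(2)]
reads += [("chr1", 500, "+", f"pile_{i}") for i in range(100)]
kept = cap_pileups(reads, factor=2.0)
print(len(reads), "->", len(kept))
```

Positions at or below the cap are left untouched, so two reads that legitimately share a start position both survive; only the outlier pileup is trimmed. A real implementation would compute the background per window rather than once per region.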
