Question: Removing PCR Duplicates From BAM Without Collapsing Sequences By Position?
6.2 years ago by
user790
United States
user790 wrote:

Is there a tool that removes PCR duplicates from a BAM file only when they exceed the background level of the surrounding region? I don't mean removing all duplicates purely by position; sometimes two reads with identical positions legitimately originate from two different molecules. I mean cases where one position has an extreme pileup that stands out greatly over the neighboring positions and should therefore be removed (either completely, or reduced to the "average" coverage of the region). Is there a tool that does this efficiently for BAM files? If I understand samtools rmdup correctly, it treats any reads that start at the same position as duplicates and removes them, which is not the behavior I'm looking for.
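To make the requested behavior concrete, here is a minimal Python sketch of the idea. It is not an existing tool. It assumes reads have already been reduced to (name, start) tuples for a single chromosome and strand, and the function name cap_pileups, the window size, and the fold threshold are all hypothetical choices made for illustration. A start position is treated as a pileup when its read count exceeds fold times the median count of the other occupied start positions within the window, and it is then downsampled to roughly that median.

```python
import random
from bisect import bisect_left, bisect_right
from collections import defaultdict
from statistics import median


def cap_pileups(reads, window=100, fold=5.0, seed=0):
    """Downsample start positions whose depth towers over their neighbours.

    reads  : iterable of (name, start) tuples (one chromosome, one strand)
    window : neighbours are occupied start positions within +/- window bp
    fold   : a position is an outlier if count > fold * neighbour median
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible

    # Group reads by alignment start.
    by_start = defaultdict(list)
    for name, start in reads:
        by_start[start].append((name, start))

    positions = sorted(by_start)
    kept = []
    for p in positions:
        group = by_start[p]
        # Locate neighbouring start positions with binary search.
        lo = bisect_left(positions, p - window)
        hi = bisect_right(positions, p + window)
        neighbours = [len(by_start[q]) for q in positions[lo:hi] if q != p]

        if not neighbours:
            # No local background to compare against: keep everything.
            kept.extend(group)
            continue

        background = median(neighbours)
        if len(group) > fold * background:
            # Outlier pileup: keep about as many reads as the background.
            target = min(len(group), max(1, int(round(background))))
            kept.extend(rng.sample(group, target))
        else:
            kept.extend(group)
    return kept
```

With real data the tuples would come from a BAM parser, and chromosome, strand, and mate position would have to be part of the grouping key. The median is used instead of the mean so that the pileup itself does not inflate the background estimate of its neighbors.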

Modified 6.2 years ago by William • written 6.2 years ago by user790
6.2 years ago by
Pavel Senin1.9k
Los Alamos, NM
Pavel Senin1.9k wrote:

I think that Picard's MarkDuplicates will help: http://picard.sourceforge.net/command-line-overview.shtml#MarkDuplicates. Sometimes including the "VALIDATION_STRINGENCY=LENIENT" (or SILENT) option helps to get things done.
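As a rough illustration of the suggestion above, the following Python sketch assembles a MarkDuplicates command line that includes the VALIDATION_STRINGENCY option. The file names and the helper mark_duplicates_cmd are hypothetical, and the sketch assumes a Picard distribution that ships as a single picard.jar in the working directory (older releases used one jar per tool). It only builds and prints the command; running it is left to the reader.

```python
import shlex


def mark_duplicates_cmd(in_bam, out_bam, metrics,
                        picard_jar="picard.jar",  # assumed jar location
                        remove=False, stringency="LENIENT"):
    """Return the argument list for a Picard MarkDuplicates run."""
    if stringency not in {"STRICT", "LENIENT", "SILENT"}:
        raise ValueError(f"unknown validation stringency: {stringency}")
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        f"METRICS_FILE={metrics}",
        # false: duplicates are only flagged; true: they are dropped.
        f"REMOVE_DUPLICATES={'true' if remove else 'false'}",
        f"VALIDATION_STRINGENCY={stringency}",
    ]


# Hypothetical file names, for illustration only.
cmd = mark_duplicates_cmd("sample.bam", "sample.dedup.bam", "sample.metrics.txt")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Leaving REMOVE_DUPLICATES at false keeps every read in the output and only sets the duplicate flag, which makes it possible to inspect the flagged reads before deciding what to discard.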

6.2 years ago by
William4.4k
Europe
William4.4k wrote:

Picard removes all but one of the reads that share the exact same alignment start and end and the same CIGAR string (mismatches, indels).

I don't know of any tools that try to be more sophisticated about removing duplicates. Removing duplicates the way Picard does should not cause you any problems as long as you have a sample of good complexity (i.e. all your positions of interest on the reference are covered by multiple reads starting or ending at different positions).
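The position-based collapsing described in this answer, and a crude check of sample complexity, can be sketched in a few lines of Python. This illustrates the description only and is not Picard's actual implementation. The Read record, its fields, and both function names are hypothetical, and the sketch keeps the first read seen per group, whereas real tools typically choose the survivor by quality.

```python
from collections import namedtuple

# Hypothetical minimal alignment record, for illustration only.
Read = namedtuple("Read", ["name", "chrom", "start", "end", "cigar"])


def collapse_by_alignment(reads):
    """Keep one read per (chrom, start, end, cigar) group.

    The first read seen in each group survives.
    """
    survivors = {}
    for read in reads:
        key = (read.chrom, read.start, read.end, read.cigar)
        survivors.setdefault(key, read)  # later duplicates are ignored
    return list(survivors.values())


def distinct_fraction(reads):
    """Fraction of reads that survive collapsing; low values hint at
    low library complexity or heavy PCR duplication."""
    reads = list(reads)
    if not reads:
        return 0.0
    return len(collapse_by_alignment(reads)) / len(reads)


reads = [
    Read("r1", "chr1", 100, 150, "50M"),
    Read("r2", "chr1", 100, 150, "50M"),       # same key as r1: dropped
    Read("r3", "chr1", 100, 150, "25M1I24M"),  # different CIGAR: kept
    Read("r4", "chr1", 120, 170, "50M"),
]
print([r.name for r in collapse_by_alignment(reads)])
print(distinct_fraction(reads))
```

A fraction close to 1 means most reads have distinct alignments, which is the situation in which position-based removal loses little real signal.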
