I ran GATK's MarkDuplicates on a BAM file I obtained after alignment, which produced another file, marked_duplicates.bam. Should I proceed with this marked_duplicates.bam file for analysis (converting to VCF), or is it just a file containing the duplicates? In the latter case, is it possible to obtain a BAM file with all the duplicates removed?
Question: MarkDuplicates output file in GATK pipeline
gprashant17 • 70 wrote:
modified 20 months ago by Pierre Lindenbaum ♦ 134k • written 20 months ago by gprashant17 • 70
Pierre Lindenbaum ♦ 134k wrote:
So should I proceed with this marked_duplicates.bam file for analysis (converting to VCF),
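Yes. As a check, run samtools flagstat on both files: the total read counts should match, because by default MarkDuplicates only flags duplicates and does not drop them.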
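To make that comparison concrete, here is a minimal Python sketch that pulls the total and duplicate counts out of the text printed by samtools flagstat. It assumes the usual "N + M label" line layout; the two sample reports are made-up illustrative numbers, not output from the poster's files.

```python
import re

def parse_flagstat(text):
    """Return (total_reads, duplicate_reads) from `samtools flagstat` text output."""
    counts = {}
    for line in text.splitlines():
        # Each line looks like "<QC-passed> + <QC-failed> <label ...>".
        m = re.match(r"(\d+) \+ (\d+) (in total|duplicates)\b", line.strip())
        if m:
            counts[m.group(3)] = int(m.group(1)) + int(m.group(2))
    return counts["in total"], counts["duplicates"]

# Illustrative (invented) reports for the input BAM and for marked_duplicates.bam.
BEFORE = """1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
990 + 0 mapped (99.00% : N/A)"""

AFTER = """1000 + 0 in total (QC-passed reads + QC-failed reads)
120 + 0 duplicates
990 + 0 mapped (99.00% : N/A)"""

total_before, dups_before = parse_flagstat(BEFORE)
total_after, dups_after = parse_flagstat(AFTER)

# Same number of reads in both files; only the duplicate count changes.
print(total_before == total_after, dups_before, dups_after)
```

If the totals are equal and only the duplicates line differs, the marked file still contains every read and is the one to carry forward.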
is it possible to obtain a BAM file, with all the duplicates removed?
Yes, see the manual: https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.4.0/picard_sam_markduplicates_MarkDuplicates.php#--REMOVE_DUPLICATES
--REMOVE_DUPLICATES / NA
If true do not write duplicates to the output file instead of writing them with appropriate flags set.
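As a rough sketch of how that option might be passed, the Python snippet below assembles a MarkDuplicates command line. The file names are hypothetical, and the long argument names (--INPUT, --OUTPUT, --METRICS_FILE) are assumed from the GATK4 tool documentation, so check them against your installed version before running anything.

```python
import shlex

def markduplicates_command(input_bam, output_bam, metrics_file, remove_duplicates=False):
    """Build an argument list for GATK MarkDuplicates (assumed GATK4-style arguments)."""
    cmd = [
        "gatk", "MarkDuplicates",
        "--INPUT", input_bam,
        "--OUTPUT", output_bam,
        "--METRICS_FILE", metrics_file,
    ]
    if remove_duplicates:
        # Drop duplicates from the output instead of only flagging them.
        cmd += ["--REMOVE_DUPLICATES", "true"]
    return cmd

# Hypothetical file names, for illustration only.
cmd = markduplicates_command("aligned.bam", "dedup.bam", "dup_metrics.txt",
                             remove_duplicates=True)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the paths are real. Leaving remove_duplicates at False gives the default behaviour of flagging only.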
So if I did not use --REMOVE_DUPLICATES, the duplicate reads will still be present in marked_duplicates.bam, but they will have been flagged as duplicates, right?
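That matches the option description quoted above. In the SAM format the duplicate mark is bit 0x400 (decimal 1024) of the FLAG field, so a quick way to see it is to test that bit. The Python sketch below does this on a few invented SAM records (read names and coordinates are hypothetical); on a real BAM, samtools view -F 1024 applies the same filter.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate" (decimal 1024)

def is_duplicate(flag):
    """True if the SAM FLAG value has the duplicate bit set."""
    return bool(flag & DUPLICATE_FLAG)

def drop_duplicates(sam_lines):
    """Keep header lines and every alignment record not flagged as a duplicate."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines pass through unchanged
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        if not is_duplicate(flag):
            kept.append(line)
    return kept

# Invented records for illustration; only the FLAG column matters here.
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF",
    "read2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF",  # 99 + 1024: marked duplicate
]

filtered = drop_duplicates(sam)
print(len(sam), len(filtered))
```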