Question: MarkDuplicates output file in GATK pipeline
20 months ago, gprashant17 wrote:

I ran GATK's MarkDuplicates on a BAM file I obtained after alignment, which produced another file, marked_duplicates.bam. So should I proceed with this marked_duplicates.bam file for analysis (converting to VCF), or is this just a file containing the duplicates? In the latter case, is it possible to obtain a BAM file, with all the duplicates removed?

20 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

> So should I proceed with this marked_duplicates.bam file for analysis (converting to VCF),

Yes. As a check, run samtools flagstat on both files and compare the duplicate counts.
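To make that comparison concrete, here is a minimal Python sketch that pulls the duplicate count out of `samtools flagstat` text. It assumes the usual `N + M duplicates` line layout (QC-passed + QC-failed counts); the helper names and the sample strings are hypothetical, and the numbers are made up for illustration only.

```python
import re
import subprocess


def run_flagstat(bam_path):
    """Run `samtools flagstat` on a BAM file and return its text report.

    Assumes samtools is installed and on PATH.
    """
    result = subprocess.run(
        ["samtools", "flagstat", bam_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def duplicate_count(flagstat_text):
    """Return the QC-passed duplicate count from a flagstat report, or None."""
    # Anchored at line start so a "primary duplicates" line is not matched.
    match = re.search(r"^(\d+) \+ (\d+) duplicates\b", flagstat_text,
                      flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Made-up report excerpts, for illustration only.
before = "1000 + 0 in total (QC-passed reads + QC-failed reads)\n0 + 0 duplicates\n"
after = "1000 + 0 in total (QC-passed reads + QC-failed reads)\n120 + 0 duplicates\n"

print(duplicate_count(before), duplicate_count(after))
```

On real data you would call `duplicate_count(run_flagstat("marked_duplicates.bam"))` and do the same for the input BAM. If the duplicates were only marked, the total read count should be the same in both files while the duplicate count goes up.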

> is it possible to obtain a BAM file, with all the duplicates removed?

Yes, see the manual: https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.4.0/picard_sam_markduplicates_MarkDuplicates.php#--REMOVE_DUPLICATES

--REMOVE_DUPLICATES / NA

If true do not write duplicates to the output file instead of writing them with appropriate flags set.
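As a rough sketch of how the option fits into a command line, the Python helper below assembles a GATK4-style `MarkDuplicates` invocation with or without `--REMOVE_DUPLICATES true`. The function name and the file names (`aligned.bam`, `dedup.bam`, `dup_metrics.txt`) are hypothetical; only the tool name and the `-I`, `-O`, `-M` and `--REMOVE_DUPLICATES` arguments come from the tool itself.

```python
import shlex


def markduplicates_cmd(input_bam, output_bam, metrics_txt, remove_duplicates=False):
    """Build the argument list for a GATK4 MarkDuplicates run.

    With remove_duplicates=False, duplicates stay in the output and are only flagged.
    With remove_duplicates=True, duplicates are not written to the output at all.
    """
    cmd = ["gatk", "MarkDuplicates",
           "-I", input_bam,
           "-O", output_bam,
           "-M", metrics_txt]
    if remove_duplicates:
        cmd += ["--REMOVE_DUPLICATES", "true"]
    return cmd


# Hypothetical file names, printed as a shell-ready command line.
print(shlex.join(markduplicates_cmd("aligned.bam", "dedup.bam",
                                    "dup_metrics.txt", remove_duplicates=True)))
```

The list can be passed to `subprocess.run(cmd, check=True)` if `gatk` is on your PATH. Building it as a list rather than a single string avoids shell quoting problems with unusual file names.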

So if I did not use --REMOVE_DUPLICATES, the duplicate reads will still be present in marked_duplicates.bam, but they will have been flagged as duplicates, right?

written 20 months ago by gprashant17

> So if I did not use --REMOVE_DUPLICATES, the duplicate reads will still be present in the marked_duplicates.bam but they would have been flagged as duplicates right?

yes

written 20 months ago by Pierre Lindenbaum
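For readers wondering what "flagged" means in practice: in the SAM/BAM format, a duplicate is a read whose FLAG field has bit 0x400 (decimal 1024) set. The snippet below is a small illustrative check of that bit in Python; the function name and the example flag values are chosen for illustration and do not come from the thread.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate" (decimal 1024)


def is_duplicate(flag):
    """Return True if the SAM FLAG value has the duplicate bit set."""
    return bool(flag & DUPLICATE_FLAG)


# 99 = paired, properly paired, mate on reverse strand, first in pair.
# 1123 is the same read with the duplicate bit added (99 + 1024).
for flag in (99, 1123):
    print(flag, is_duplicate(flag))
```

The same bit is what `samtools view -F 1024` filters on, so that command is another way to write out a BAM without the marked duplicates.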