Question: MarkDuplicates output file in GATK pipeline
gprashant17 wrote (11 months ago):

I ran GATK's MarkDuplicates on a BAM file obtained after alignment, which produced another file, marked_duplicates.bam. So should I proceed with this marked_duplicates.bam file for analysis (converting to VCF), or is this just a file containing the duplicates? In the latter case, is it possible to obtain a BAM file, with all the duplicates removed?

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (11 months ago):

So should I proceed with this marked_duplicates.bam file for analysis (converting to VCF),

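Yes. To check, run samtools flagstat on both files: the total read count should be the same, and the duplicates line shows how many reads are now flagged in marked_duplicates.bam.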

is it possible to obtain a BAM file, with all the duplicates removed?

Yes, see the manual: https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.4.0/picard_sam_markduplicates_MarkDuplicates.php#--REMOVE_DUPLICATES

--REMOVE_DUPLICATES / NA

If true do not write duplicates to the output file instead of writing them with appropriate flags set.
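
To make the difference between the two modes concrete, here is a minimal Python sketch. It is a toy model of the behaviour described in the manual text above, not GATK's actual code; the function name select_output and the three reads are made up for illustration. The only fact taken from the SAM format is that the duplicate bit in the FLAG field is 0x400 (1024).

```python
# Toy model of marking vs. removing duplicates; not GATK's implementation.
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def select_output(records, remove_duplicates=False):
    """Return the (read_name, flag) records that would be written out.

    records: list of (read_name, flag) tuples whose duplicate bit
             has already been set where appropriate.
    """
    if not remove_duplicates:
        # Default mode: every read is kept, duplicates only carry the flag.
        return list(records)
    # Removal mode: reads with the duplicate bit set are not written.
    return [(name, flag) for name, flag in records
            if not flag & DUPLICATE_FLAG]


# Hypothetical reads: r2 carries the duplicate bit (99 + 1024 = 1123).
reads = [("r1", 99), ("r2", 1123), ("r3", 147)]

marked = select_output(reads)
removed = select_output(reads, remove_duplicates=True)
print(len(marked), len(removed))
```

In the default mode nothing is lost, so the output can be used directly for downstream analysis; in removal mode the flagged reads are simply absent from the output.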

So if I did not use --REMOVE_DUPLICATES, the duplicate reads will still be present in marked_duplicates.bam, but they will have been flagged as duplicates, right?

written 11 months ago by gprashant17

So if I did not use --REMOVE_DUPLICATES, the duplicate reads will still be present in marked_duplicates.bam, but they will have been flagged as duplicates, right?

yes

written 11 months ago by Pierre Lindenbaum
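
For anyone who wants to see what "flagged" means at the record level, the sketch below counts duplicate-flagged reads in SAM text, similar in spirit to the duplicates line that samtools flagstat reports. It is only an illustration: the helper names and the SAM records are invented, and it assumes the input is plain SAM text (for example from samtools view -h), not a binary BAM.

```python
# Count reads whose FLAG has the duplicate bit set, from SAM text lines.
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def flag_of(sam_line):
    """Return the integer FLAG (second tab-separated column) of a SAM record."""
    return int(sam_line.split("\t")[1])


def count_duplicates(sam_lines):
    """Return (total_reads, duplicate_reads), skipping '@' header lines."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        total += 1
        if flag_of(line) & DUPLICATE_FLAG:
            duplicates += 1
    return total, duplicates


# Made-up records: readB has FLAG 1123 = 99 + 1024, i.e. it is marked as a
# duplicate but is still present in the file.
sam_lines = [
    "@HD\tVN:1.6\tSO:coordinate",
    "readA\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "readB\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "readC\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tTTGA\tIIII",
]

total, duplicates = count_duplicates(sam_lines)
print(total, duplicates)
```

The flagged read is counted in the total, which is the point of the answer above: marking changes the FLAG, not the number of records.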