Question: MarkDuplicates output file in GATK pipeline
gprashant17 wrote (4 weeks ago):

I ran GATK's MarkDuplicates on a BAM file I obtained after alignment, which produced another file, marked_duplicates.bam. So should I proceed with this marked_duplicates.bam file for analysis (converting to VCF), or is this just a file containing the duplicates? In the latter case, is it possible to obtain a BAM file, with all the duplicates removed?

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (4 weeks ago):

> So should I proceed with this marked_duplicates.bam file for analysis (converting to VCF),

Yes. To verify, run samtools flagstat on both files and compare the duplicates counts.
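
To illustrate what the duplicates count reflects, here is a minimal Python sketch. It assumes SAM text as input (for example the output of `samtools view -h`) and counts records whose FLAG field has bit 0x400 (decimal 1024) set, which the SAM specification defines as "PCR or optical duplicate". The function name and the sample records are made up for illustration and are not taken from any real file.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate" (decimal 1024)


def count_duplicates(sam_lines):
    """Return (total_records, duplicate_records) for an iterable of SAM text lines."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        total += 1
        if flag & DUPLICATE_FLAG:
            duplicates += 1
    return total, duplicates


# Made-up records for illustration only: the second one carries the duplicate bit.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "read2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
]
print(count_duplicates(example))  # prints (2, 1)
```

On a file produced without duplicate removal, the total should match the input BAM while the duplicate count is expected to be non-zero.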

> is it possible to obtain a BAM file, with all the duplicates removed?

Yes, see the manual: https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.4.0/picard_sam_markduplicates_MarkDuplicates.php#--REMOVE_DUPLICATES

--REMOVE_DUPLICATES / NA

If true do not write duplicates to the output file instead of writing them with appropriate flags set.
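
As a sketch of how this option could be passed, the Python snippet below only assembles the command line and does not run GATK. It assumes the GATK4 `gatk` wrapper is on the PATH and that the Picard-style short arguments `-I`, `-O`, and `-M` are accepted by your version. The file names `input.bam`, `dedup.bam`, and `dup_metrics.txt`, as well as the function name, are hypothetical.

```python
import shlex


def markduplicates_command(input_bam, output_bam, metrics_file, remove_duplicates=False):
    """Build (but do not run) a GATK MarkDuplicates command as a list of arguments."""
    cmd = [
        "gatk", "MarkDuplicates",
        "-I", input_bam,     # aligned input BAM
        "-O", output_bam,    # output BAM (duplicates flagged, or dropped if requested)
        "-M", metrics_file,  # duplication metrics report
    ]
    if remove_duplicates:
        # Without this option, duplicates stay in the output and are only flagged.
        cmd += ["--REMOVE_DUPLICATES", "true"]
    return cmd


# Hypothetical file names, for illustration only.
cmd = markduplicates_command("input.bam", "dedup.bam", "dup_metrics.txt", remove_duplicates=True)
print(shlex.join(cmd))
```

To actually run it, the list could be handed to `subprocess.run(cmd, check=True)`.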

gprashant17 commented (4 weeks ago):

So if I did not use --REMOVE_DUPLICATES, the duplicate reads will still be present in marked_duplicates.bam, but they will have been flagged as duplicates, right?

Pierre Lindenbaum replied (4 weeks ago):

> So if I did not use --REMOVE_DUPLICATES, the duplicate reads will still be present in the marked_duplicates.bam but they would have been flagged as duplicates right?

Yes.
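
Because the duplicates are only flagged, they can still be dropped later without rerunning MarkDuplicates. The following is a minimal Python sketch of that filtering step on SAM text. It is meant to mirror what `samtools view -F 1024` does, namely excluding records with the duplicate bit set. The function name and the sample records are illustrative assumptions.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate" (decimal 1024)


def drop_duplicates(sam_lines):
    """Yield header lines and every record whose duplicate bit is not set."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        if not flag & DUPLICATE_FLAG:
            yield line


# Made-up records: readB has FLAG 1040 (1024 duplicate + 16 reverse strand).
marked = [
    "@HD\tVN:1.6\tSO:coordinate",
    "readA\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "readB\t1040\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "readC\t16\tchr1\t250\t60\t4M\t*\t0\t0\tACGT\tFFFF",
]
kept = list(drop_duplicates(marked))
print(len(marked), len(kept))  # prints 4 3
```

Only the flagged record is removed; the reverse-strand read without the duplicate bit is kept.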