Question

Input and output of picard markduplicate for deepvariant

2.3 years ago

amy__ ▴ 160

Hi, I'm building a WES pipeline to identify variants in human sequences, currently using the following tools (in order of use):

Read QC and trimming: fastq
Alignment: bwa index, bwa mem, samtools view, samtools sort, and samtools index.
Remove PCR duplicates: picard markduplicates?

When it comes to removing PCR duplicates, I have seen that Picard's MarkDuplicates identifies duplicates:

java -jar MarkDuplicates.jar I=PE_samtoolssorted.bam \
    O=markedduplicates.bam M=markedduplicatesmetrics.txt

However, when it comes to actually removing the PCR duplicates that are found, I have read online that just adding REMOVE_DUPLICATES=true removes them?

java -jar MarkDuplicates.jar I=PE_samtoolssorted.bam \
    O=removedduplicates.bam M=markedduplicatesmetrics.txt \
    REMOVE_DUPLICATES=true

Will the output of this be a sorted BAM file with the PCR duplicates removed?

Would the input for a variant caller like DeepVariant, which requires a sorted BAM file, be this removedduplicates.bam file?

And if so, would it be this removedduplicatessorted.bam file that needs indexing for input into DeepVariant, rather than the original PE_samtoolssorted.bam?

Thanks! Sorry if confusing. Amy

markduplicates picard bam sam deepvariant • 754 views

ADD COMMENT • link updated 2.3 years ago by jv ★ 1.8k • written 2.3 years ago by amy__ ▴ 160

According to the documentation, yes: REMOVE_DUPLICATES=true should output an alignment file in which the duplicate reads have been removed. You will then likely need to sort the output (if it is not already coordinate-sorted) and index the sorted file. That sorted, duplicate-free file would then be used for downstream analyses, assuming it is the appropriate input for the steps you want to perform.
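
As a rough sketch of those steps (not a definitive recipe), the shell function below removes duplicates and then indexes the result. It keeps the per-tool MarkDuplicates.jar invocation from the question; newer Picard releases ship a single picard.jar and take the tool name as the first argument, so adjust accordingly. The function name dedup_and_index and the PICARD_JAR variable are made up for illustration, and samtools is assumed to be on the PATH.

```shell
# Illustrative only: PICARD_JAR and dedup_and_index are hypothetical names.
PICARD_JAR="${PICARD_JAR:-MarkDuplicates.jar}"

# Remove duplicates from a coordinate-sorted BAM, then index the result.
# Usage: dedup_and_index <input.bam> <output.bam> <metrics.txt>
dedup_and_index() {
    in_bam="$1"
    out_bam="$2"
    metrics="$3"

    # REMOVE_DUPLICATES=true drops duplicate reads instead of only flagging them.
    java -jar "$PICARD_JAR" \
        I="$in_bam" \
        O="$out_bam" \
        M="$metrics" \
        REMOVE_DUPLICATES=true || return 1

    # Print the @HD header line so the recorded sort order (SO:...) can be checked.
    samtools view -H "$out_bam" | grep '^@HD'

    # Index the deduplicated BAM; this is the file to hand to the variant caller.
    samtools index "$out_bam"
}

# Example call with the file names from the question:
# dedup_and_index PE_samtoolssorted.bam removedduplicates.bam markedduplicatesmetrics.txt
```

If the @HD line printed by the header check shows SO:coordinate, the file is already coordinate-sorted and only needs indexing; otherwise run samtools sort on it first and index the sorted output.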

ADD REPLY • link 2.3 years ago by jv ★ 1.8k