Input and output of Picard MarkDuplicates for DeepVariant
4 months ago
amy__ ▴ 20

Hi, I'm currently building a WES pipeline to identify variants in human sequences, using the following steps (in order of use):

1. Read QC and trimming: fastq

2. Alignment: bwa index, bwa mem, samtools view, samtools sort, and samtools index.

3. Remove PCR duplicates: Picard MarkDuplicates?
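As a rough sketch of how steps 2 and 3 could be chained, followed by indexing and a DeepVariant call, the script below assumes hypothetical file names (ref.fa, sample_R1.fastq.gz, sample_R2.fastq.gz, output prefix sample), a picard.jar in the working directory, and run_deepvariant on the PATH (as inside the DeepVariant Docker image). By default it only prints each command (DRY_RUN=1) so the order can be checked; DRY_RUN=0 would execute them. samtools view is left out because samtools sort can read SAM directly from bwa mem through a pipe.

```shell
#!/bin/sh
set -eu

# run: print the command when DRY_RUN=1 (the default), otherwise execute it with sh -c.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

# Hypothetical inputs: reference FASTA, paired-end reads, and an output prefix.
REF=ref.fa
R1=sample_R1.fastq.gz
R2=sample_R2.fastq.gz
OUT=sample

# Step 2: index the reference (once per reference), then align and coordinate-sort.
run "bwa index $REF"
run "samtools faidx $REF"   # .fai index, which DeepVariant also expects next to the FASTA
run "bwa mem $REF $R1 $R2 | samtools sort -o $OUT.sorted.bam -"

# Step 3: remove duplicates; MarkDuplicates keeps the input's coordinate sort order.
run "java -jar picard.jar MarkDuplicates I=$OUT.sorted.bam O=$OUT.dedup.bam M=$OUT.dup_metrics.txt REMOVE_DUPLICATES=true"

# Index the deduplicated BAM: this is the file handed to the variant caller.
run "samtools index $OUT.dedup.bam"
run "run_deepvariant --model_type=WES --ref=$REF --reads=$OUT.dedup.bam --output_vcf=$OUT.vcf.gz"
```

One design caveat: plain sh has no pipefail, so if bwa mem failed inside the pipe while samtools sort succeeded, set -e would not stop the script at that step.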

For removing PCR duplicates, I have seen that Picard's MarkDuplicates identifies (marks) any duplicate reads:

java -jar picard.jar MarkDuplicates I=PE_samtoolssorted.bam O=markedduplicates.bam M=markedduplicatesmetrics.txt


However, to actually remove the PCR duplicates, I have read online that simply adding REMOVE_DUPLICATES=true is enough. Is that right?

java -jar picard.jar MarkDuplicates I=PE_samtoolssorted.bam O=removedduplicates.bam M=markedduplicatesmetrics.txt REMOVE_DUPLICATES=true
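To make the difference between marking and removing concrete: MarkDuplicates marks a duplicate by setting the 0x400 bit (1024, "PCR or optical duplicate") in the read's SAM FLAG, and REMOVE_DUPLICATES=true leaves those records out of the output instead. The awk filter below is only a sketch of that removal step on plain SAM text, not Picard's own code; the function name drop_duplicates and the two-read example are made up for illustration.

```shell
#!/bin/sh
# drop_duplicates: read SAM text on stdin, keep header lines, and drop any
# alignment whose FLAG has the 0x400 (1024, duplicate) bit set.
drop_duplicates() {
    awk -F '\t' '
        /^@/ { print; next }               # header lines pass through unchanged
        int($2 / 1024) % 2 == 1 { next }   # 0x400 bit set: marked duplicate, drop it
        { print }                          # all other alignments are kept
    '
}

# Hypothetical two-read SAM: read2 has FLAG 1123 = 99 + 1024, i.e. a marked duplicate.
SAMPLE_SAM='@HD\tVN:1.6\tSO:coordinate\nread1\t99\tchr1\t100\t60\t10M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII\nread2\t1123\tchr1\t100\t60\t10M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII\n'

# Prints the @HD line and read1 only.
printf '%b' "$SAMPLE_SAM" | drop_duplicates
```

On a BAM that has only been marked, samtools view -b -F 0x400 -o removedduplicates.bam markedduplicates.bam should have the same effect, since -F excludes reads that have any of the given flag bits set.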


Will the output of this be a sorted BAM file with the PCR duplicates removed?

Would the input for a variant caller like DeepVariant, which requires a sorted BAM file, then be this removedduplicates.bam file?

And if so, would it be this removedduplicatessorted.bam file that needs indexing for input into DeepVariant, rather than the original PE_samtoolssorted.bam?

Thanks, and sorry if this is confusing! Amy


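According to the documentation, yes: with REMOVE_DUPLICATES=true, MarkDuplicates writes an alignment file from which the duplicate reads have been removed rather than just flagged. MarkDuplicates keeps the sort order of its input, so a coordinate-sorted input such as PE_samtoolssorted.bam should give a coordinate-sorted output, and re-sorting is likely unnecessary. You will still need to index that output (for example with samtools index). The deduplicated, indexed file, not the original PE_samtoolssorted.bam, is then the one to use for downstream analyses (assuming it is the appropriate input for the steps you want to perform).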
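Whether a BAM still needs sorting can be checked from its header: a coordinate-sorted file normally declares SO:coordinate on its @HD line, and samtools view -H prints that header. The small function below (sort_order, a hypothetical helper) extracts the SO value from SAM header text. Note that it only reports what the header declares; it does not verify the order of the records themselves.

```shell
#!/bin/sh
# sort_order: read SAM header text on stdin and print the SO value from the
# @HD line (e.g. "coordinate"), or "unknown" if there is no SO tag.
sort_order() {
    awk -F '\t' '
        /^@HD/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
        }
        END { if (!found) print "unknown" }
    '
}

# Example on a hypothetical two-line header; prints "coordinate".
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' | sort_order

# On a real file (requires samtools), index only if the header says coordinate-sorted:
# if [ "$(samtools view -H removedduplicates.bam | sort_order)" = coordinate ]; then
#     samtools index removedduplicates.bam
# fi
```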