Hi, I'm building a WES pipeline to identify variants in human sequences, currently using the following tools (in order of use):
Read QC and trimming: fastq
Alignment: bwa index, bwa mem, samtools view, samtools sort, and samtools index.
Remove PCR duplicates: picard markduplicates?
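For context, my alignment step looks roughly like the sketch below. The function name align_sample, the reference and FASTQ file names, and the exact flags are placeholders I picked for illustration, and it assumes bwa and samtools are on the PATH.

```shell
# Rough sketch of the alignment step (names and flags are illustrative, not my exact run).
align_sample() {
    ref="$1"   # reference FASTA (placeholder name)
    r1="$2"    # forward reads
    r2="$3"    # reverse reads
    out="$4"   # coordinate-sorted output BAM

    # Build the BWA index for the reference (only needed once per reference)
    bwa index "$ref"

    # Align paired-end reads, convert SAM to BAM, then sort by coordinate
    bwa mem "$ref" "$r1" "$r2" | samtools view -b - | samtools sort -o "$out" -

    # Index the sorted BAM
    samtools index "$out"
}
```

I would call it as: align_sample ref.fa sample_R1.fastq.gz sample_R2.fastq.gz PE_samtoolssorted.bam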
When it comes to removing PCR duplicates, I have seen that Picard's MarkDuplicates identifies and flags any duplicates:
java -jar MarkDuplicates.jar I=PE_samtoolssorted.bam O=markedduplicates.bam M=markedduplicatesmetrics.txt
However, when it comes to actually removing the PCR duplicates that are found, I have read online that just adding REMOVE_DUPLICATES=true removes them. Is that right?
java -jar MarkDuplicates.jar I=PE_samtoolssorted.bam O=removedduplicates.bam M=markedduplicatesmetrics.txt REMOVE_DUPLICATES=true
Will the output of this be a sorted BAM file with the PCR duplicates removed?
Would the input for a variant caller like DeepVariant, which requires a sorted BAM file, be this removedduplicates.bam file?
And if so, would it be this removedduplicates.bam file that needs indexing for input into DeepVariant, rather than the original PE_samtoolssorted.bam?
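If so, this is roughly what I would plan to run next. It is only a sketch: prepare_for_caller is a name I made up, it assumes samtools is on the PATH, and I am assuming (not sure) that MarkDuplicates keeps the coordinate sort order of its input, which is what the header check is meant to confirm.

```shell
# Sketch: check and index the deduplicated BAM before variant calling.
prepare_for_caller() {
    bam="$1"   # e.g. removedduplicates.bam from the MarkDuplicates command above

    # Print the @HD header line; I would expect SO:coordinate if the sort order was kept
    samtools view -H "$bam" | grep '^@HD'

    # Create the index for the deduplicated BAM
    samtools index "$bam"

    # Summary counts, including how many reads are still flagged as duplicates
    samtools flagstat "$bam"
}
```

I would call it as: prepare_for_caller removedduplicates.bam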
Thanks! Sorry if confusing. Amy