Picard MarkDuplicates only identifies and marks optical/PCR duplicates in an alignment file (SAM/BAM format). It does not accept 'raw' unaligned reads.
To identify and remove duplicates in an aligned BAM file using Picard and SAMtools, use:
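# Coordinate-sort the alignments
samtools sort MySample_Aligned.bam -o MySample_Aligned_Sorted.bam
# Mark duplicates (newer Picard releases ship a single picard.jar, invoked as: java -jar picard.jar MarkDuplicates ...)
java -jar MarkDuplicates.jar INPUT=MySample_Aligned_Sorted.bam OUTPUT=MySample_Aligned_Sorted_PCRDupes.bam ASSUME_SORTED=true METRICS_FILE=MySample_Aligned_Sorted_PCRDupes.txt
# Index the duplicate-marked BAM
samtools index MySample_Aligned_Sorted_PCRDupes.bam
# Keep only reads that do not have the duplicate flag (0x400) set
samtools view -b -F 0x400 MySample_Aligned_Sorted_PCRDupes.bam > MySample_Aligned_Sorted_PCRDupesRemoved.bam
# Index the de-duplicated BAM
samtools index MySample_Aligned_Sorted_PCRDupesRemoved.bam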
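As a rough illustration of what the -F 0x400 filter in the samtools view step is doing: MarkDuplicates sets bit 0x400 (decimal 1024) in the SAM FLAG field of each duplicate read, and -F drops every read that has that bit set. The sketch below only mimics that bit test on a few example FLAG values; is_marked_duplicate is a hypothetical helper name for this illustration, not part of samtools or Picard.

```shell
#!/bin/sh
# Sketch only: mimics the bit test behind `samtools view -F 0x400`.
# is_marked_duplicate is a hypothetical helper, not a samtools/Picard command.
is_marked_duplicate() {
    # 1024 is 0x400, the SAM FLAG bit for "PCR or optical duplicate".
    [ $(( $1 & 1024 )) -ne 0 ]
}

# Example FLAG values: 99 and 147 are a typical properly paired read pair;
# 1123 and 1171 are the same two values with the duplicate bit added.
for flag in 99 147 1123 1171; do
    if is_marked_duplicate "$flag"; then
        echo "$flag: marked as duplicate, dropped by -F 0x400"
    else
        echo "$flag: not marked, kept"
    fi
done
```

To look at the real FLAG values in your own file, the second column of samtools view MySample_Aligned_Sorted_PCRDupes.bam output is the FLAG field.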
To remove duplicates directly from FASTQ files instead, see: Removing PCR duplicates from .fastq without .bam alignment