I'm hoping someone can help me with this one, as I haven't been able to find a solution anywhere online.
I generated SAM files using 'bwa mem' as follows:
bwa mem -M -t 28 mm10bwaidx 1.fastq.gz 2.fastq.gz > output.sam
The data are paired-end 75 bp reads, and as I had only one pair of FASTQ files per sample I chose not to include a read group (RG).
I expected the QNAME in the SAM file to be the Illumina FASTQ sequence header/ID, for example:
Instead, the QNAMEs I get look like this:
This seems to be causing problems when I try to detect and mark optical duplicates with Picard.
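For anyone wanting to check the same thing on their own files, here is a rough sketch of how the read names could be inspected. It rests on an assumption: as far as I understand the Picard documentation, the default read-name parsing expects colon-separated names with 5 or 7 fields, the last three being tile, x and y. The helper names count_qname_fields and show_qname_fields are made up for illustration and are not part of bwa or Picard.

```shell
# Hypothetical helper (not part of bwa or Picard): print the number of
# colon-separated fields in a single read name.
count_qname_fields() {
    printf '%s\n' "$1" | awk -F':' '{ print NF }'
}

# Hypothetical helper: print the first five QNAMEs of a SAM file, each
# followed by a tab and its colon-separated field count.
# Header lines start with '@' and are skipped; QNAME is column 1.
show_qname_fields() {
    grep -v '^@' "$1" | head -n 5 | cut -f1 | while IFS= read -r qname; do
        printf '%s\t%s\n' "$qname" "$(count_qname_fields "$qname")"
    done
}
```

Running `show_qname_fields output.sam` and comparing the result with the first header line of the FASTQ (`zcat 1.fastq.gz | head -n 1`) should show whether the names were changed during alignment, and whether the field count is something other than the 5 or 7 assumed above.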
Does anyone know why this is happening and how to redress the issue?