Question: QNAME in 'bwa mem' SAM output differs from the Illumina FASTQ header and lacks the sequence header information
22 months ago, JoLY wrote:


I'm hoping someone can help me with this one, as I have not yet found a solution anywhere online.

I generated SAM files using 'bwa mem' as follows:

bwa mem -M -t 28 mm10bwaidx 1.fastq.gz 2.fastq.gz > output.sam

The data were paired-end 75 bp reads, and as I had only one pair of FASTQ files per sample, I chose not to include any read group (RG).

I expected the QNAME in the SAM file to be the Illumina FASTQ sequence header/ID, for example:


Rather, what I have is QNAMEs that look like this:


This seems to be causing problems when I try to detect and mark optical duplicates with Picard.

Does anyone know why this is happening and how to redress the issue?

Best Wishes


How and where did you download these data from: SRA or EBI? Using the -F option with fastq-dump would have given you the FASTQ headers in the original Illumina format.

Note: the ENA FASTQ version has headers like this:

@ERR174324.1 HSQ1009_86:1:1101:1192:2116/1

fastq-dump with -F produces

written 22 months ago by genomax

Thank you very much for your response. The data were indeed downloaded from the EBI ENA.

written 22 months ago by JoLY
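
For anyone wondering why the QNAME lost the Illumina fields: as far as I know, bwa mem keeps only the text before the first whitespace of the FASTQ header as the read name, so with the ENA header shown above the QNAME becomes the ENA ID. The Python sketch below only illustrates that rule and is not bwa's or Picard's code. Both function names are made up for this example, and the tile/x/y check is an assumption about the kind of colon-separated fields that optical duplicate detection needs.

```python
import re


def qname_from_header(header: str) -> str:
    """Hypothetical helper: text before the first whitespace, without the leading '@'."""
    text = header[1:] if header.startswith("@") else header
    fields = text.split(None, 1)
    return fields[0] if fields else ""


def has_tile_xy(qname: str) -> bool:
    """Rough check (an assumption, not Picard's logic): the name ends in
    three ':'-separated integers, optionally followed by /1 or /2."""
    return bool(re.fullmatch(r".*:(\d+):(\d+):(\d+)(/[12])?", qname))


ena_header = "@ERR174324.1 HSQ1009_86:1:1101:1192:2116/1"
qname = qname_from_header(ena_header)
print(qname, has_tile_xy(qname))
```

With the ENA header, the derived name has no colon-separated fields at all, which would explain why a tool looking for tile and x/y coordinates in the read name finds nothing to work with.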
22 months ago, JoLY wrote:

Thank you for your help, genomax. As you pointed out in your comment, the EBI ENA FASTQ header starts with an ENA-specific ID. Being more familiar with sed than fastq-dump, I removed the ENA ID from the FASTQ headers as follows:

gzip -cd ENA_formatted.fastq.gz | sed '/^@/ s/.* /@/g' | gzip > new.fastq.gz

This enabled me to generate a valid BAM file using 'bwa mem' and Picard in which the QNAME is the Illumina FASTQ header. I could then run Picard's 'MarkDuplicates' with optical duplicate detection, this time without any warnings or errors.
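
The sed command above edits every line that starts with '@' and contains a space. A slightly more defensive variant touches only the first line of each four-line FASTQ record, so a quality line that happens to begin with '@' can never be altered. The Python sketch below assumes standard four-line records and a header of the form "@ENA_ID original_name"; the function names are hypothetical. Unlike the sed expression, it keeps everything after the first space rather than after the last one.

```python
import gzip
import re

# Leading "@<ENA id>" followed by spaces/tabs and at least one more field.
ENA_PREFIX = re.compile(r"^@\S+[ \t]+(?=\S)")


def strip_ena_id(header: str) -> str:
    """Drop the leading ENA ID; headers without a second field are left unchanged."""
    return ENA_PREFIX.sub("@", header, count=1)


def rewrite_records(lines):
    """Yield FASTQ lines, rewriting only the header (first line of each 4-line record)."""
    for i, line in enumerate(lines):
        yield strip_ena_id(line) if i % 4 == 0 else line


def rewrite_fastq(src: str, dst: str) -> None:
    """Stream a gzipped FASTQ file from src to dst with rewritten headers."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(rewrite_records(fin))


# Example call, using the file names from the sed command above:
# rewrite_fastq("ENA_formatted.fastq.gz", "new.fastq.gz")
```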
