Question: QNAME format in 'bwa mem' output SAM file is different and doesn't contain the Illumina sequence header information
7 months ago by
JoLY wrote:


I'm hoping someone can help me with this one, as I have not yet found a solution anywhere online.

I generated SAM files using 'bwa mem' as follows:

bwa mem -M -t 28 mm10bwaidx 1.fastq.gz 2.fastq.gz > output.sam

The data were paired-end (PE) 75 bp reads, and as I had only one pair of FASTQ files per sample I chose not to include a read group (RG).

I expected the QNAME in the SAM file to be the Illumina FASTQ sequence header/ID, for example:


Rather, what I have is QNAMEs that look like this:


This seems to cause problems when detecting and marking optical duplicates with Picard.
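
A quick way to see what the QNAMEs look like is to count their colon-separated fields. The sketch below is only illustrative: `qname_fields` is a hypothetical helper name, and it assumes (as far as I understand Picard's default read-name parsing) that optical duplicate detection needs colon-separated Illumina-style names whose last fields are tile, x and y.

```shell
# Hypothetical helper: for every SAM alignment record, print the QNAME
# and the number of colon-separated fields it contains.
# Header lines (starting with '@') are skipped.
qname_fields() {
  awk -F '\t' '!/^@/ { n = split($1, parts, ":"); print $1 "\t" n }'
}

# Example use (assumes output.sam from the bwa mem command above):
#   qname_fields < output.sam | head
```

A name such as ERR174324.1 has a single field, whereas an Illumina-style name such as HSQ1009_86:1:1101:1192:2116 has five.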

Does anyone know why this is happening and how to redress the issue?

Best Wishes

sequencing alignment next-gen • 353 views

How did you obtain this data, and where did you download it from: SRA or EBI? Using the -F option with fastq-dump would have given you the FASTQ headers in the original Illumina format.

Note: the ENA FASTQ version has headers like this:

@ERR174324.1 HSQ1009_86:1:1101:1192:2116/1

fastq-dump with -F produces

modified 7 months ago • written 7 months ago by genomax

Thank you very much for your response. The data were indeed downloaded from the EBI ENA.

written 7 months ago by JoLY
7 months ago by
JoLY wrote:

Thank you for your help, genomax. As you pointed out in your comment, the header in the EBI ENA FASTQ version starts with an ENA-specific ID. Being more familiar with sed than with fastq-dump, I removed the ENA ID from the FASTQ headers as follows:

gzip -cd ENA_formatted.fastq.gz | sed '/^@/ s/.* /@/g' | gzip > new.fastq.gz
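
The sed command above acts on every line that starts with '@', which can include quality lines; in standard FASTQ those contain no spaces, so the substitution should leave them alone. A slightly more defensive variant, sketched here with awk, touches only the first line of each four-line record. `strip_ena_id` is a hypothetical helper name, and the sketch assumes four-line FASTQ records whose header has the form "@<ENA ID> <Illumina name>". Unlike the sed version, it removes only the first space-delimited token rather than everything up to the last space.

```shell
# Hypothetical helper: remove the leading ENA accession from FASTQ header
# lines only (line 1 of each 4-line record). Sequence, '+' and quality
# lines are passed through unchanged.
# Assumption: headers look like "@ERR174324.1 HSQ1009_86:1:1101:1192:2116/1".
strip_ena_id() {
  awk 'NR % 4 == 1 { sub(/^@[^ ]+ /, "@") } { print }'
}

# Example use (file names taken from the sed command above):
#   gzip -cd ENA_formatted.fastq.gz | strip_ena_id | gzip > new.fastq.gz
```

Headers without a space are left as they are, so running it on a file that already has plain Illumina names without a comment field should be harmless.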

This enabled me to generate a valid BAM file using 'bwa mem' and Picard in which the QNAME is the Illumina FASTQ header. I could then run Picard's 'MarkDuplicates' with optical duplicate detection, this time without any warnings or errors.
