Question: 'QNAME' format in 'bwa mem' output SAM file is different and doesn't contain the Illumina sequence header information
JoLY wrote, 16 months ago:

Hello,

I'm hoping someone can help me with this one, as I have not yet found a solution anywhere online.

I generated SAM files using 'bwa mem' as follows:

bwa mem -M -t 28 mm10bwaidx 1.fastq.gz 2.fastq.gz > output.sam

The data were paired-end (PE) 75 bp reads. As I had only one pair of FASTQ files per sample, I chose not to include any read group (RG).

I expected the QNAME in the SAM file to be the Illumina FASTQ sequence header/ID, for example:

K00103:94:H73C2BBXX:7:1103:14194:9737

Instead, the QNAMEs I get look like this:

ERR174324.81165065

This seems to be causing problems when I try to detect and mark optical duplicates with Picard.

Does anyone know why this is happening and how to fix it?

Best Wishes

sequencing alignment next-gen

Where and how did you download this data: from SRA or EBI? Using the -F option with fastq-dump would have given you the FASTQ headers in the original Illumina format.

Note: the ENA FASTQ version has headers like this:

@ERR174324.1 HSQ1009_86:1:1101:1192:2116/1

whereas fastq-dump with -F produces:

@HSQ1009_86:1:1101:1192:2116
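
For reference, a download along those lines might look like the sketch below. It assumes sra-tools is installed and that the submitter's original read names are stored in the SRA record. fetch_original_names is a hypothetical wrapper name, and it is only defined here, not called, because the download needs network access and can be very large.

```shell
# Sketch only: fetch a run with fastq-dump, keeping the original read names.
# fetch_original_names is a hypothetical wrapper, defined but not called here.
fetch_original_names() {
  # -F (--origfmt): defline contains only the original sequence name
  # --split-files:  write each read of a pair to its own file
  # --gzip:         compress the output files
  fastq-dump -F --split-files --gzip "$1"
}

# Example call (accession taken from the headers above):
# fetch_original_names ERR174324
```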
modified 16 months ago • written 16 months ago by genomax

Thank you very much for your response. The data were indeed downloaded from the EBI ENA.

written 15 months ago by JoLY
JoLY wrote, 15 months ago:

Thank you for your help, genomax. As you pointed out in your comment, the header in the EBI ENA version of the FASTQ starts with an ENA-specific ID. Being more familiar with sed than with fastq-dump, I removed the ENA ID from the FASTQ headers as follows:

gzip -cd ENA_formatted.fastq.gz | sed '/^@/ s/.* /@/g' | gzip > new.fastq.gz
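
The sed expression above is applied to every line that starts with '@'. Quality lines can also start with '@', although they contain no spaces, so in practice the substitution leaves them alone. A slightly more defensive variant touches only the first line of each record. This is a minimal sketch, assuming strictly four-line FASTQ records; strip_ena_id is a hypothetical helper name, and the sequence and quality strings in the sample record are placeholders.

```shell
# Hypothetical helper: strip the leading ENA accession from FASTQ header
# lines only (line 1 of every 4-line record). Reads stdin, writes stdout.
strip_ena_id() {
  awk 'NR % 4 == 1 { sub(/^@[^ ]+ /, "@") } { print }'
}

# Try it on one made-up record whose quality line happens to start with '@'.
printf '%s\n' \
  '@ERR174324.1 HSQ1009_86:1:1101:1192:2116/1' \
  'ACGT' \
  '+' \
  '@III' | strip_ena_id
```

On real data it would slot into the same pipeline in place of sed, for example: gzip -cd ENA_formatted.fastq.gz | strip_ena_id | gzip > new.fastq.gz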

With the ENA ID removed, I was able to generate a valid BAM file using 'bwa mem' and Picard in which the QNAME is the Illumina FASTQ header. Picard's 'MarkDuplicates' then ran with optical duplicate detection and without any warnings or errors.
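
As far as I understand from the Picard documentation, the default read-name parsing splits the QNAME on colons and takes the tile and x/y coordinates from the last three fields of a 5- or 7-field name, which would explain why a bare ENA ID gives it nothing to work with. A quick way to check a SAM file before running MarkDuplicates is to count the colon-separated fields in each QNAME. This is only a sketch: qname_field_counts is a hypothetical helper name, and the two-line input is a made-up, truncated stand-in for real SAM records.

```shell
# Hypothetical helper: for SAM text on stdin, print one line per distinct
# number of colon-separated QNAME fields, as "<fields><TAB><reads>".
# SAM header lines (starting with '@') are skipped.
qname_field_counts() {
  awk -F '\t' '!/^@/ { n = split($1, parts, ":"); count[n]++ }
       END { for (n in count) print n "\t" count[n] }' | sort -n
}

# Made-up input using the two QNAME styles from the question.
printf '%s\t0\t*\n' \
  'ERR174324.81165065' \
  'K00103:94:H73C2BBXX:7:1103:14194:9737' | qname_field_counts
```

On a real file this would be run as: qname_field_counts < output.sam. Reads reported with 1 field are the ones carrying only the ENA ID.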
