Bwa Mem: How To Specify The Fastq Phred Format?
1
3
Entering edit mode
8.0 years ago

Hello,

I have a dataset of short reads in which some fastq files are in the Illumina 1.5 format and others are in the Illumina 1.8 format. My plan is to align these reads using bwa mem, and later do SNP calling on them.

The main difference between these two formats is how the phred scores are encoded (e.g. see http://en.wikipedia.org/wiki/FASTQ_format ). When I used bwa aln on the Illumina 1.5 files, I therefore had to pass the -I option to indicate the different encoding. I used to run something like:

bwa aln -I reference seq_illumina15.fastq.gz
bwa aln    reference seq_illumina18.fastq.gz


However, the bwa mem documentation (http://bio-bwa.sourceforge.net/bwa.shtml ) does not mention a -I option for this purpose, or any other way to specify which version of the fastq format is used. What is the correct way to tell bwa mem how the phred scores are encoded?

8.0 years ago

See this thread on the mailing list. The short answer is that bwa mem has no option for this; it assumes phred+33.

Edit: For some more information, including a reply from Heng Li, have a look through this thread as well (Heng Li's reply is the 5th one). Basically, Heng doesn't expect bwa mem to support phred+64, since that format is no longer in use. He added a converter to seqtk, so that's one option (there are others out there).
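If installing a separate tool is not convenient, the conversion itself is just a shift of every quality character by 31 (64 - 33). Here is a minimal Python sketch of that idea; it is not how seqtk does it internally. The function names `phred64_to_phred33` and `convert_fastq` are made up for this example, and the sketch assumes strict 4-line fastq records (no wrapped sequence or quality lines) and true phred+64 input, not Solexa+64.

```python
import gzip


def phred64_to_phred33(quality: str) -> str:
    """Re-encode one quality string from phred+64 to phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # decode with the old offset
        if score < 0:
            # A character below '@' cannot occur in phred+64 data.
            raise ValueError(f"{ch!r} is not a valid phred+64 quality character")
        converted.append(chr(score + 33))  # re-encode with the new offset
    return "".join(converted)


def _open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode (chosen by extension)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def convert_fastq(src, dst):
    """Copy src to dst, converting every 4th line (the quality line)."""
    with _open_text(src, "r") as fin, _open_text(dst, "w") as fout:
        for index, line in enumerate(fin):
            if index % 4 == 3:
                line = phred64_to_phred33(line.rstrip("\r\n")) + "\n"
            fout.write(line)
```

Usage would be something like `convert_fastq("seq_illumina15.fastq.gz", "seq_illumina15.phred33.fastq.gz")`, after which the output can be passed to bwa mem like any Illumina 1.8 file. With seqtk, the equivalent should be `seqtk seq -Q64 -V in.fq > out.fq`, but check the usage message of your seqtk version to be sure.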


Thank you very much for the answer. So, I will have to convert the Illumina 1.5 files to 1.8 (or, in other words, phred+33 to phred+64) before running bwa mem.


I guess you mean phred+64 to phred+33? Because phred+33 is the new one (Illumina 1.8 and Sanger) - just to prevent confusion.
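When it is unclear which files use which offset, the quality characters themselves usually give it away. Below is a rough heuristic sketch in Python. The function name `guess_phred_offset` is hypothetical, and the thresholds are assumptions based on the usual ranges: characters below `;` (ASCII 59) can only occur with the +33 offset, and characters above `J` (ASCII 74, i.e. Q41 in phred+33) are taken as a sign of the +64 offset. Files whose qualities fall entirely between those bounds are reported as undetermined.

```python
def guess_phred_offset(quality_strings):
    """Guess the quality offset (33 or 64) from some quality strings.

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(ch) for quality in quality_strings for ch in quality]
    if not codes:
        return None
    if min(codes) < 59:
        # Too low for any +64 encoding (Solexa+64 bottoms out at ';').
        return 33
    if max(codes) > 74:
        # Assumed to be too high for phred+33 Illumina data (Q41 is 'J').
        return 64
    return None
```

Running this on the quality lines of the first few thousand reads of each file should be enough in practice, since low-quality bases near read ends tend to appear quickly.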
