Question: What Is The Default Quality Encoding Expected By Bwa?
Panos (Geneva, Switzerland) wrote, 7.6 years ago:

What quality encoding does BWA expect in the input reads by default? Is it Sanger, Solexa, Illumina 1.3+, Illumina 1.5+ or Illumina 1.8+ (as per the section "Encoding" found in this Wikipedia article)? Also, is it true that BWA doesn't really use the quality values for finding matches? What, then, is the usefulness of the "-I" parameter in bwa aln? How are the quality values used by BWA?

What if I have reads generated by the new Illumina 1.8 pipeline? Should I somehow convert the qualities before feeding them to BWA? I'm asking because I saw that the quality range in 1.8 differs significantly from that of both 1.3 and 1.5.

Istvan Albert (University Park, USA) wrote, 7.6 years ago:

Most tools have standardized on the Sanger encoding (Phred+33).

That being said, the quality scores are very rough estimates that do not really reflect the error probabilities they supposedly stand for. In that light, whether or not they are off a bit does not really matter. As you note, most tools do not make use of the quality scores during alignment, thankfully so, since that might lead to a lot of confusion and would interfere with interpreting the alignments.
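For reference, a Phred score Q nominally stands for an error probability p = 10^(-Q/10). The following is only a minimal sketch of that textbook relationship; the function name and the default offset of 33 are illustrative choices, not something taken from BWA.

```python
def error_probability(quality_char, offset=33):
    """Nominal error probability encoded by one FASTQ quality character.

    offset is 33 for Sanger / Illumina 1.8+ and 64 for Illumina 1.3+ data.
    """
    phred = ord(quality_char) - offset
    if phred < 0:
        raise ValueError("character is below the chosen offset")
    # Phred definition: Q = -10 * log10(p), hence p = 10 ** (-Q / 10)
    return 10 ** (-phred / 10)


# 'I' is Q40 under Phred+33, 'h' is Q40 under Phred+64
p_sanger = error_probability("I")
p_illumina = error_probability("h", offset=64)
```

Both calls describe the same nominal quality, which is why reading a file with the wrong offset shifts every score by 31.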

The only potential problem that you might run into is that some tools cannot deal with the variable ranges.
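To make the range issue concrete, here is a rough heuristic for guessing the offset from the characters actually present in a file. It is a sketch under a simple assumption: any character below ASCII 59 (';') can only come from Phred+33 data, since Solexa and Illumina 1.3+ encodings start at 59 or higher. The function name is hypothetical, and the guess can be wrong for a small sample of uniformly high-quality Phred+33 reads.

```python
def guess_offset(quality_strings):
    """Guess the quality offset (33 or 64) from a sample of quality strings."""
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        raise ValueError("no quality characters given")
    # Characters below ';' (ASCII 59) do not occur in Solexa or Phred+64 data.
    if min(codes) < 59:
        return 33
    # Heuristic only: a few very high-quality Phred+33 reads could also land here.
    return 64


sanger_like = guess_offset(["IIIIHHG#", "IIIIIII5"])
illumina13_like = guess_offset(["hhhhggfB", "hhhhhhhT"])
```

In practice one would feed it the quality lines of the first few thousand reads rather than two strings.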

Giovanni M Dall'Olio (London, UK) wrote, 6.0 years ago:

I am also having the same problem with all the messy Illumina formats. In summary, I think that:

  • bwa by default expects the Sanger format (Phred+33)
  • the -I option is needed to read the Illumina 1.3 to 1.6 formats (Phred+64).
  • the Illumina 1.8 format uses the same Phred+33 offset as Sanger, so you don't need the -I option for that.

I've updated the FASTQ Wikipedia page with some sed scripts to convert Illumina 1.8 to 1.3 and vice versa, but in principle you don't need to use them.
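For anyone who does want to convert, the idea is just a fixed shift of 31 between the two offsets. This is a minimal Python sketch of that idea, not the sed scripts mentioned above; the function names are made up for illustration, and it handles a single quality string rather than a whole FASTQ file.

```python
def phred64_to_phred33(qual):
    """Re-encode an Illumina 1.3+ (Phred+64) quality string as Phred+33."""
    converted = []
    for ch in qual:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError("character %r is not valid Phred+64" % ch)
        converted.append(chr(score + 33))
    return "".join(converted)


def phred33_to_phred64(qual):
    """Re-encode a Sanger / Illumina 1.8+ (Phred+33) quality string as Phred+64."""
    converted = []
    for ch in qual:
        score = ord(ch) - 33
        if score < 0:
            raise ValueError("character %r is not valid Phred+33" % ch)
        converted.append(chr(score + 64))
    return "".join(converted)


print(phred64_to_phred33("hB@"))  # prints I#!
```

Only the fourth line of each FASTQ record should be passed through such a function; the sequence and header lines must be left alone.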

What happens if you run bwa aln on an Illumina 1.8 dataset with the -I option? Unfortunately I don't know yet, but I think you will need to run bwa aln again without it.


For what happens if you incorrectly set -I, see the question "Seeing unexpected characters (^D,^Q) in the QUAL field of a SAM file".

(reply written 6.0 years ago by Devon Ryan)
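The control characters in that title are what you would expect if -I simply shifts every quality character down by 31 (64 minus 33). That shift is an assumption about bwa's behaviour made for this illustration, not something confirmed in this thread, and both helper names are hypothetical.

```python
def apply_illumina_shift(qual):
    """Shift each quality character down by 31 (assumed effect of `bwa aln -I`)."""
    return "".join(chr(ord(ch) - 31) for ch in qual)


def caret_notation(text):
    """Show control characters in caret form (^D, ^Q, ...), as many editors do."""
    return "".join(
        "^" + chr(ord(ch) + 64) if ord(ch) < 32 else ch
        for ch in text
    )


# '#' (Q2) and '0' (Q15) are ordinary Phred+33 characters
shifted = apply_illumina_shift("#0I")
print(caret_notation(shifted))  # prints ^D^Q*
```

Under that assumption, low-quality Phred+33 characters fall below the printable ASCII range, which matches the ^D and ^Q seen in the QUAL field.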

hehe, nice pointer, we have an answer for everything

(reply written 6.0 years ago by Istvan Albert)