Question: Bowtie Results Not Matching The Expected Error Qualities
6.9 years ago, Xinwei Han wrote:

I stumbled upon this problem when playing with different flags of bowtie. I am not sure whether this is a bug or not. I extracted a read from my dataset (shown below) and put that in a file (test.fastq). It is in Sanger Fastq format.

@SRR218096.75 HWUSI-EAS465:3:1:3:839 length=36
ACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+SRR218096.75 HWUSI-EAS465:3:1:3:839 length=36
..1>@<.;>2>@@>2;@>;@@@@@@;@@@@;@BBBA

I then mapped this read to the Arabidopsis TAIR10 genome. I built the index with:

bowtie-build tair10.fasta TAIR10

tair10.fasta is just a concatenation of the *.fas files from ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/. I then mapped the read with:

bowtie -a -m 25 -n 3 -e 60 --best --strata --sam TAIR10 test.fastq test

The alignment output is:

SRR218096.75 16 Chr3 2094491 255 36M * 0 0 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGT ABBB@;@@@@;@@@@@@;>@;2>@@>2>;.<@>1.. XA:i:2 MD:Z:7C21T5G0 NM:i:3
SRR218096.75 16 Chr3 2094494 255 36M * 0 0 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGT ABBB@;@@@@;@@@@@@;>@;2>@@>2>;.<@>1.. XA:i:2 MD:Z:4C21T8G0 NM:i:3
SRR218096.75 16 Chr3 2094504 255 36M * 0 0 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGT ABBB@;@@@@;@@@@@@;>@;2>@@>2>;.<@>1.. XA:i:2 MD:Z:16T11A7 NM:i:2

However, when I manually summed the sequencing qualities at the mismatched positions, I found that the second alignment (MD tag "4C21T8G0") has a sum exceeding 60, even though I specified -e 60. The ASCII characters representing the qualities at the three mismatched positions are "@", "2" and ".", which correspond to quality scores of 31, 17 and 13. So the sum is 61 despite -e 60. Please let me know if I made a mistake somewhere.
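The manual check above can be reproduced with a short script. This is only a sketch: the helper name `mismatch_offsets` is made up for illustration, the parser assumes an MD tag without deletions (bowtie 1 alignments are ungapped), and it relies on the QUAL column being stored in the same orientation as the aligned sequence, as in the SAM records shown.

```python
import re

def mismatch_offsets(md):
    """Return 0-based offsets of mismatches described by a SAM MD tag.

    Hypothetical helper: handles only match runs and single-base
    mismatches, not the '^' deletion syntax.
    """
    offsets = []
    pos = 0
    for run, base in re.findall(r"(\d+)([A-Z])?", md):
        pos += int(run)          # skip the run of matching bases
        if base:                 # a reference base marks a mismatch
            offsets.append(pos)
            pos += 1
    return offsets

# QUAL column and MD tag of the second alignment in the output above
qual = "ABBB@;@@@@;@@@@@@;>@;2>@@>2>;.<@>1.."
md = "4C21T8G0"

offsets = mismatch_offsets(md)
# Sanger FASTQ: Phred score = ASCII code - 33
scores = [ord(qual[i]) - 33 for i in offsets]
print(offsets, scores, sum(scores))
```

For this record the script reports the qualities 31, 17 and 13, which sum to 61.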

I am using bowtie 0.12.8.


Just off the cuff, do you know specifically which quality scale was used when your data were generated? (There are several in use, e.g. http://en.wikipedia.org/wiki/FASTQ_format )

written 6.9 years ago by seidel

It is Sanger FASTQ, so the quality characters "encode a Phred quality score from 0 to 93 using ASCII 33 to 126".

written 6.9 years ago by Xinwei Han

6.9 years ago, Istvan Albert (University Park, USA) wrote:

The manual says that it rounds to the nearest 10:

-e/--maqerr <int> Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.

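so the raw qualities sum to

31 + 17 + 13 = 61

but after rounding to the nearest 10, bowtie compares this total against -e:

30 + 20 + 10 = 60

which does not exceed the limit of 60, so the alignment is reported.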
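As a rough illustration of the rounding described in the manual, the sketch below rounds each quality to the nearest 10 and caps it at 30. The function name `maq_round` is made up here, and rounding halves upward is an assumption; the manual only says "nearest 10".

```python
def maq_round(q):
    """Round a Phred quality to the nearest 10 and saturate at 30.

    Assumption: values ending in 5 round up; the manual does not say.
    """
    return min((q + 5) // 10 * 10, 30)

raw = [31, 17, 13]                       # qualities at the mismatched positions
rounded = [maq_round(q) for q in raw]

print(sum(raw))      # total without rounding (what --nomaqround would use)
print(rounded, sum(rounded))
```

Under that assumption the rounded values are 30, 20 and 10, so the total bowtie checks is 60 rather than 61.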

Thank you so much, Istvan. I was at Penn State and attended your seminar several times. Thanks again.

written 6.9 years ago by Xinwei Han