Question: Bowtie Inaccuracy Limit
Asked by Arpssss, 6.8 years ago:

From the Bowtie paper, I understand that Bowtie can find both exact and inexact matches. Following the Bowtie manual, I built an index for a genome database with the command:

bowtie-build hg19.fa hg19

Now I want to align a read file, "a493081_1.fastq", containing 150 bp reads, and find exact and inexact matches (allowing 1, 2, or 3 substitutions, as described in the Bowtie paper).
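To make "allowing k substitutions" concrete: in Bowtie 1's `-v` mode a hit is an end-to-end, ungapped placement of the whole read with at most k mismatches, on either strand. The brute-force sketch below checks exactly that criterion at every reference offset. It only illustrates the definition; Bowtie itself searches a Burrows-Wheeler (FM) index with backtracking rather than scanning. The helper names are hypothetical, and treating `N` as a mismatch is an assumption of this sketch.

```python
# Hypothetical helpers illustrating the "-v k" criterion: an ungapped,
# end-to-end match of the whole read with at most k substitutions.
# Brute force, O(len(ref) * len(read)); for illustration only.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str, limit: int) -> int:
    """Count substitutions between equal-length strings, stopping once past limit.
    An 'N' on either side counts as a mismatch (an assumption for this sketch)."""
    count = 0
    for x, y in zip(a, b):
        if x != y or x == "N":
            count += 1
            if count > limit:
                break
    return count


def find_hits(ref: str, read: str, k: int) -> list[tuple[int, str, int]]:
    """Return (offset, strand, mismatches) for every placement with <= k substitutions."""
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for offset in range(len(ref) - len(query) + 1):
            mm = mismatches(ref[offset:offset + len(query)], query, k)
            if mm <= k:
                hits.append((offset, strand, mm))
    return hits


ref = "TTACGTAGGCATT"
print(find_hits(ref, "ACGAAG", 0))  # [] : one substitution, so no exact hit
print(find_hits(ref, "ACGAAG", 1))  # [(2, '+', 1)]
```

The same read that has no hit at k = 0 is found at k = 1, which is the difference between `-v 0` and `-v 1`.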

So I ran the command

./bowtie --all -v 0 hg19 a493081_1.fastq a.txt

to report all alignments with 0 mismatches. Bowtie printed:

# reads processed: 200000
# reads with at least one reported alignment: 145692 (72.85%)
# reads that failed to align: 54308 (27.15%)
Reported 173932 alignments to 1 output stream(s)
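The percentages in this summary are fractions of the 200,000 reads processed, and the alignment count is larger than the number of aligned reads because `--all` reports every valid alignment of a read rather than just one. A short sketch that recomputes these figures from the printed text; the regular expressions are assumptions based only on the four summary lines shown in the question.

```python
import re

# Summary text as printed by the run in the question (copied verbatim).
SUMMARY = """\
# reads processed: 200000
# reads with at least one reported alignment: 145692 (72.85%)
# reads that failed to align: 54308 (27.15%)
Reported 173932 alignments to 1 output stream(s)
"""


def parse_summary(text: str) -> dict[str, int]:
    """Extract the integer counts from a Bowtie 1 style summary (format assumed)."""
    patterns = {
        "processed": r"# reads processed: (\d+)",
        "aligned": r"# reads with at least one reported alignment: (\d+)",
        "failed": r"# reads that failed to align: (\d+)",
        "alignments": r"Reported (\d+) alignments",
    }
    return {key: int(re.search(pat, text).group(1)) for key, pat in patterns.items()}


counts = parse_summary(SUMMARY)
aligned_pct = 100 * counts["aligned"] / counts["processed"]
failed_pct = 100 * counts["failed"] / counts["processed"]
# With --all, one aligned read can contribute several alignments (multi-mapping).
per_read = counts["alignments"] / counts["aligned"]
print(f"aligned {aligned_pct:.2f}%, failed {failed_pct:.2f}%, "
      f"{per_read:.2f} alignments per aligned read")
# prints: aligned 72.85%, failed 27.15%, 1.19 alignments per aligned read
```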

However, all the reads were taken from hg19, so I expected Bowtie to report no reads that failed to align. I know Bowtie's matching is not perfect, but a failure rate of nearly 30% is far higher than what I have seen in various comparisons. Can anybody explain why this can happen, or suggest a procedure to make the alignment more accurate?
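One reasoning step worth spelling out: if the reads are real sequencing reads (or were simulated with errors), they are not identical to hg19 wherever they carry a sequencing error or a variant, and `-v 0` rejects any read with even a single difference. Under a simple hypothetical model where every base is miscalled independently with probability p, a 150 bp read is error-free with probability (1 - p)^150, and has at most k substitutions with the binomial sum below. The error rates in the loop are illustrative assumptions, not measurements of a493081_1.fastq.

```python
from math import comb


def prob_at_most_k_errors(read_len: int, error_rate: float, k: int) -> float:
    """P(a read carries <= k substitutions), assuming each base is wrong
    independently with the same probability (a simplifying, hypothetical model)."""
    return sum(
        comb(read_len, i) * error_rate**i * (1 - error_rate) ** (read_len - i)
        for i in range(k + 1)
    )


# Hypothetical per-base error rates; columns are k = 0, 1, 2, 3 (the -v values).
for rate in (0.001, 0.002, 0.005):
    row = [prob_at_most_k_errors(150, rate, k) for k in range(4)]
    print(rate, " ".join(f"{p:.3f}" for p in row))
```

Even a modest per-base error rate leaves a sizeable fraction of 150 bp reads with at least one mismatch, so comparing `-v 0` against `-v 1` to `-v 3` on the same file is a cheap way to test this explanation.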

Update: I should mention that my FASTQ file contains 150 bp single-end reads.

Tags: genome, bowtie2, bowtie

Can anybody tell me whether an error rate of 30% is plausible for Bowtie 1, or am I making a mistake?

written 6.8 years ago by Arpssss

Show us a read that did not align.

written 6.8 years ago by Jeremy Leipzig
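A first step toward that request is to look at what the reads actually contain. This sketch assumes unwrapped four-line FASTQ records and counts reads with ambiguous `N` bases, which cannot match any A/C/G/T reference base exactly and so are a common reason for failures under `-v 0`. The function names are hypothetical.

```python
import io
from typing import Iterator, TextIO


def fastq_records(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (name, sequence, quality) from unwrapped 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a trailing blank line)
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield header[1:].strip(), seq, qual


def reads_with_n(handle: TextIO) -> tuple[int, list[str]]:
    """Return (total reads, names of reads containing at least one 'N')."""
    total, flagged = 0, []
    for name, seq, _qual in fastq_records(handle):
        total += 1
        if "N" in seq.upper():
            flagged.append(name)
    return total, flagged


demo = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGNACGT\n+\nIII#IIII\n"
)
print(reads_with_n(demo))  # (2, ['r2'])
# On the real file: with open("a493081_1.fastq") as fh: print(reads_with_n(fh)[0])
```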

Try using Bowtie 2. Its documentation says that Bowtie 1 was developed with short reads in mind, and that Bowtie 2 should perform much better with longer reads. See:

written 6.8 years ago by Fidel
Answer by Istvan Albert (University Park, USA), 6.8 years ago:

Your data is most likely flawed in some manner. Given the heuristic nature of high-throughput aligners, we cannot expect to map back all reads, even if they were simulated from the target reference genome. On the other hand, an error rate of 30% would be excessive, and frankly it would make the tool unusable for most purposes. That alone indicates that you are misusing either the data or the aligner.

At the same time, note that there is more to accuracy than simply accepting a reported match: one should also verify that the match is indeed a true positive. For a more thorough comparison of the accuracy of several mappers, see Heng Li's ROC curves at:

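Checking that a reported match is a true positive requires knowing where each read really came from, which is easiest with simulated reads. The sketch below assumes a hypothetical setup: a `truth` table of each read's origin (chromosome, start) and the aligner's output already reduced to (read, chromosome, position, mapping quality), whatever the original format. The 5 bp tolerance and the record layout are assumptions. Sweeping the mapping-quality threshold yields (correct, wrong) counts, the kind of points plotted in ROC-style mapper comparisons.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One reported alignment (hypothetical, simplified record)."""
    read: str
    chrom: str
    pos: int
    mapq: int


def score_hits(truth: dict[str, tuple[str, int]], hits: list[Hit],
               tolerance: int = 5) -> list[tuple[int, int, int]]:
    """For each MAPQ threshold (high to low), return (threshold, correct, wrong).

    A hit is correct if it lies on the true chromosome within `tolerance`
    bases of the true start. Only the first reported hit per read is scored;
    unaligned reads are simply absent.
    """
    first: dict[str, Hit] = {}
    for h in hits:
        first.setdefault(h.read, h)
    results = []
    for threshold in sorted({h.mapq for h in first.values()}, reverse=True):
        correct = wrong = 0
        for h in first.values():
            if h.mapq < threshold:
                continue
            true_chrom, true_pos = truth[h.read]
            if h.chrom == true_chrom and abs(h.pos - true_pos) <= tolerance:
                correct += 1
            else:
                wrong += 1
        results.append((threshold, correct, wrong))
    return results


truth = {"r1": ("chr1", 1000), "r2": ("chr2", 500), "r3": ("chr1", 20000)}
hits = [Hit("r1", "chr1", 1000, 60), Hit("r2", "chr5", 777, 3), Hit("r3", "chr1", 20002, 30)]
print(score_hits(truth, hits))  # [(60, 1, 0), (30, 2, 0), (3, 2, 1)]
```

Dividing `correct` by `len(truth)` gives sensitivity, so reads that never aligned still count against the aligner.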