Bowtie Inaccuracy Limit
11.8 years ago
Arpssss ▴ 40

From the Bowtie paper I learned that Bowtie can find both exact and inexact matches. From the Bowtie manual I learned how to build an index for a genome, so I built one with the command

bowtie-build hg19.fa hg19

Now I want to align a read file named "a493081_1.fastq" to find exact and inexact matches (allowing 1, 2, and 3 substitutions, as described in the Bowtie paper) for reads of length 150 bp.

So, I issue the command

./bowtie --all -v 0 hg19 a493081_1.fastq a.txt

to find all alignments with 0 mismatches. Bowtie outputs:

# reads processed: 200000
# reads with at least one reported alignment: 145692 (72.85%)
# reads that failed to align: 54308 (27.15%)
Reported 173932 alignments to 1 output stream(s)

However, all reads are taken from hg19, so Bowtie should report no reads that failed to align. I understand that Bowtie's matching is not perfectly accurate, but a failure rate of nearly 30% is far higher than anything I have seen in published comparisons. Can anybody explain what could cause this, or suggest a way to make the alignment more accurate?

Additional: my FASTQ file contains 150 bp single-end reads.
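To compare the exact-match run with the 1, 2, and 3 mismatch runs mentioned above, the same command can be repeated for each `-v` value. This is only a sketch: it assumes `bowtie` is on the PATH, and the function name `sweep_mismatches` and the output names `hits_v0.txt` to `hits_v3.txt` are made up for illustration.

```shell
# Run Bowtie 1 once per allowed mismatch count (0 to 3) in -v mode.
# Assumption: the bowtie executable is on PATH; output names are hypothetical.
sweep_mismatches() {
    # $1 = index basename (e.g. hg19), $2 = FASTQ file
    for v in 0 1 2 3; do
        echo "== -v $v =="
        # Same options as the original command, only -v changes.
        bowtie --all -v "$v" "$1" "$2" "hits_v${v}.txt"
    done
}
```

Called as `sweep_mismatches hg19 a493081_1.fastq`, this should print one Bowtie summary per mismatch level, which shows how many of the unaligned reads are recovered once substitutions are allowed.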

bowtie bowtie2 genome • 3.5k views

Can anybody tell me whether an error rate of 30% is plausible for Bowtie 1, or am I making a mistake somewhere?


Show us a read that did not align.
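One way to pull out such reads is to compare the read names in the alignment file with those in the FASTQ. The sketch below assumes Bowtie's default tab-separated output with the read name in column 1, a 4-line-per-record FASTQ, and that names can be compared on their first whitespace-delimited token; `unaligned_reads` is a made-up helper name. Bowtie 1 also documents a `--un <file>` option that writes unaligned reads directly, which may be simpler if your version supports it.

```shell
# Print FASTQ records whose read names are absent from column 1 of the alignment file.
# Assumptions: default Bowtie 1 output format, unwrapped 4-line FASTQ records.
unaligned_reads() {
    # $1 = alignment file (e.g. a.txt), $2 = FASTQ file
    awk -F '\t' '
        # First file: remember the name of every aligned read.
        FILENAME == ARGV[1] { split($1, a, " "); aligned[a[1]] = 1; next }
        # Header line of each FASTQ record: drop the leading "@" and look the name up.
        FNR % 4 == 1 { split(substr($0, 2), b, " "); keep = !(b[1] in aligned) }
        # Print all four lines of records that were not aligned.
        keep { print }
    ' "$1" "$2"
}
```

For example, `unaligned_reads a.txt a493081_1.fastq | head -n 4` would show the first read that failed to align.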


Try using Bowtie 2. According to the documentation, Bowtie 1 was developed with short reads in mind, and Bowtie 2 should perform much better with longer reads. See: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#how-is-bowtie-2-different-from-bowtie-1
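A rough equivalent of the original workflow with Bowtie 2 might look like the following. It assumes `bowtie2-build` and `bowtie2` are on the PATH; the wrapper name `run_bowtie2` and the output file name are hypothetical. Note that Bowtie 2 needs its own index, so the Bowtie 1 index built earlier cannot be reused.

```shell
# Build a Bowtie 2 index and align single-end reads, writing SAM output.
# Assumption: bowtie2-build and bowtie2 are installed and on PATH.
run_bowtie2() {
    # $1 = reference FASTA, $2 = index basename, $3 = FASTQ file, $4 = output SAM file
    bowtie2-build "$1" "$2" &&
        bowtie2 -x "$2" -U "$3" -S "$4"
}
```

With the files from the question this would be called as `run_bowtie2 hg19.fa hg19 a493081_1.fastq a.sam`.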

ADD REPLY
11.8 years ago

Your data is most likely flawed in some manner. Given the heuristic nature of high-throughput aligners, we cannot expect to map back every read, even if the reads were simulated from the target reference genome. On the other hand, an error rate of 30% would be excessive and, frankly, would make the tool unusable for most purposes. That alone indicates that you are misusing either the data or the aligner.
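A quick sanity check of the input file can help rule out some data problems. The sketch below counts the reads, the reads containing N (which cannot match a reference base exactly), and the range of read lengths. It assumes the usual 4-line-per-record FASTQ layout with unwrapped sequences; `fastq_summary` is a made-up helper name.

```shell
# Summarize a FASTQ file: total reads, reads containing N, shortest and longest read.
# Assumption: each record is exactly 4 lines (sequence on the 2nd line of each record).
fastq_summary() {
    # $1 = FASTQ file
    awk 'NR % 4 == 2 {
             n++
             len = length($0)
             if (n == 1 || len < min) min = len
             if (len > max) max = len
             if ($0 ~ /[Nn]/) with_n++
         }
         END {
             printf "reads=%d with_N=%d min_len=%d max_len=%d\n", n, with_n, min, max
         }' "$1"
}
```

Running `fastq_summary a493081_1.fastq` should report 200000 reads of length 150 if the file is what the question describes; anything else would be worth investigating before blaming the aligner.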

At the same time, note that there is more to accuracy than simply accepting a reported match. One should also verify that the match is indeed a true positive. For a more thorough comparison of the accuracy of several mappers, see Heng Li's ROC curves at:

http://lh3lh3.users.sourceforge.net/alnROC.shtml
