From the Bowtie paper I learned that it can find both exact and inexact matches. From the Bowtie manual I learned how to build an index for a genomic database, so I built one with the command
bowtie-build hg19.fa hg19
Now I want to align a read file named "a493081_1.fastq" to find exact and inexact matches (allowing 1, 2 and 3 substitutions, as described in the Bowtie paper) for reads of length 150 bp.
So I ran the command
./bowtie --all -v 0 hg19 a493081_1.fastq a.txt
to find all alignments with 0 mismatches. Bowtie outputs:
# reads processed: 200000
# reads with at least one reported alignment: 145692 (72.85%)
# reads that failed to align: 54308 (27.15%)
Reported 173932 alignments to 1 output stream(s)
However, all the reads are taken from hg19, so I expected Bowtie to report that no reads failed to align. I know Bowtie is not guaranteed to find every match, but a failure rate of nearly 30% is far higher than anything I have seen in published comparisons. Can anybody explain what could cause this, or suggest a way to make the alignment more accurate?
Additional: I should mention that my FASTQ file contains 150 bp single-end reads.
Is it possible for Bowtie 1 to miss about 30% of reads, or am I making a mistake somewhere?
Show us a read that did not align.
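One way to get at those reads: Bowtie 1 has a --un option that writes the reads that failed to align to a separate file. With -v 0, a single sequencing error anywhere in a 150 bp read rules out an alignment, and as far as I know an N in a read also counts as a mismatch, so both are worth checking for. Below is a rough sketch of a helper that summarises such a file. The function name fastq_summary and the file name unaligned.fastq are made up for illustration, and the helper assumes plain 4-line FASTQ records.

```shell
# Hypothetical helper, not part of Bowtie: summarise a FASTQ file.
# Assumes 4-line FASTQ records (sequence lines are not wrapped).
# The unaligned reads can be written out by Bowtie 1 itself with --un, e.g.
#   ./bowtie --all -v 0 --un unaligned.fastq hg19 a493081_1.fastq a.txt
fastq_summary() {
    # Line 2 of every 4-line record is the sequence.
    awk 'NR % 4 == 2 {
             total++
             if ($0 ~ /[Nn]/) with_n++          # read contains an ambiguous base
             len = length($0)
             if (total == 1 || len < min) min = len
             if (len > max) max = len
         }
         END {
             printf "reads=%d with_N=%d min_len=%d max_len=%d\n", total, with_n, min, max
         }' "$1"
}
```

Running fastq_summary unaligned.fastq and then looking at a few records with head -n 8 unaligned.fastq should show whether the failed reads contain Ns or are otherwise unusual.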
Try using Bowtie 2. According to its documentation, Bowtie 1 was developed with short reads in mind, and Bowtie 2 should perform much better on longer reads. See: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#how-is-bowtie-2-different-from-bowtie-1
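A minimal sketch of what the Bowtie 2 run could look like for the files in the question. Note that Bowtie 2 needs its own index: the one made by bowtie-build cannot be reused. The wrapper name align_with_bowtie2 and the output name a.sam are only illustrative, and the default end-to-end settings are assumed.

```shell
# Build a Bowtie 2 index first (a Bowtie 1 index is not compatible):
#   bowtie2-build hg19.fa hg19
#
# Hypothetical wrapper: align unpaired reads with Bowtie 2 default settings.
#   $1 = index basename, $2 = FASTQ file with reads, $3 = SAM output file
align_with_bowtie2() {
    bowtie2 -x "$1" -U "$2" -S "$3"
}
```

For example, align_with_bowtie2 hg19 a493081_1.fastq a.sam. Bowtie 2 reports an alignment summary on standard error, which can be compared with the Bowtie 1 numbers above.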