My understanding of alignment tools is quite limited, so any suggestions are welcome. There are already many articles and questions on this topic, but I could not find an answer that fits my context.
I am trying to understand what difference it makes to align reads with bowtie alone versus with tophat and cufflinks.
My data: Illumina short reads, single-end. I am not interested in isoforms.
Currently I am using bowtie and allowing 2 or 3 mismatches. (I trim away the barcodes and adapter sequences, and I also trim the far end of the reads, so the final reads are 35 to 40 bases long.) On average, about 78-80% of the reads align to the genome.
Do you see any limitations in using Bowtie directly rather than the other tools?
Bowtie is used to align genomic reads against the reference genome. It does not perform spliced alignment.
Tophat is used to align transcriptomic reads against the reference genome, and it uses bowtie in the first phase of alignment. In addition to what bowtie does, tophat can also align reads that span exon-exon junctions, so you will get more reads aligned against the reference genome if you use tophat for RNA-seq data.
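One way to check this on your own data is to compare the fraction of aligned reads in the two outputs. The following is only a minimal sketch: it assumes each aligner's output is available as SAM text that still contains the unaligned reads, and it relies on the SAM FLAG bits (0x4 for unmapped, 0x100 for secondary, 0x800 for supplementary). The function name `mapped_fraction` and the toy records are made up for illustration.

```python
def mapped_fraction(sam_lines):
    """Return the fraction of primary read records that are aligned.

    sam_lines: any iterable of SAM text lines (an open file works too).
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Skip secondary (0x100) and supplementary (0x800) records so that
        # each read is counted only once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # Bit 0x4 set means the read is unmapped.
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Toy records (hypothetical, not real data): 3 of the 4 reads are aligned.
EXAMPLE_SAM = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t500\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t256\tchr2\t900\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr2\t300\t255\t4M\t*\t0\t0\tACGT\tIIII",
]

if __name__ == "__main__":
    print(mapped_fraction(EXAMPLE_SAM))
```

Running the same function on the bowtie output and on the tophat output for the same reads gives two numbers you can compare directly.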
Cufflinks is not an alignment tool. It is primarily used to calculate transcript abundance and abundance of different isoforms of the same gene.
Check the pages for bowtie, tophat and cufflinks for more information.
If you use bowtie directly to align RNA-seq data to the reference genome, you won't map any reads to the splice junctions of the mRNAs. Tophat can do the gapped alignments required to map reads to the splice junctions, but it is much slower than bowtie.
As your reads are short, you will probably get similar results with both programs, because a short read has a low probability of spanning a splice junction. So it's a good "quick n' dirty" approximation to start with.
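To get a rough feeling for how read length affects this, here is a small sketch that counts how many read start positions on a spliced transcript cross at least one exon-exon junction. It is a toy model, not a statement about your data: it assumes reads start uniformly along the mature transcript, and the exon lengths and the function name `junction_spanning_fraction` are hypothetical.

```python
def junction_spanning_fraction(exon_lengths, read_length):
    """Fraction of read start positions whose read crosses a junction.

    exon_lengths: exon lengths in transcript order (hypothetical values).
    read_length:  length of each read in bases.
    """
    transcript_length = sum(exon_lengths)
    n_starts = transcript_length - read_length + 1
    if n_starts <= 0:
        # The read is longer than the transcript, so nothing can be placed.
        return 0.0

    # A junction at position j lies between transcript bases j-1 and j.
    junctions = []
    pos = 0
    for length in exon_lengths[:-1]:
        pos += length
        junctions.append(pos)

    spanning = 0
    for start in range(n_starts):
        end = start + read_length  # exclusive end of the read
        # The read crosses junction j if it covers both base j-1 and base j.
        if any(start < j < end for j in junctions):
            spanning += 1
    return spanning / n_starts


if __name__ == "__main__":
    exons = [200, 150, 300, 120]  # illustrative exon lengths only
    for read_length in (36, 50, 100):
        frac = junction_spanning_fraction(exons, read_length)
        print(read_length, round(frac, 3))
```

In this model the fraction grows with read length, so the shorter the reads, the smaller the share that only a splice-aware aligner could place. Plugging in exon lengths typical for your organism gives a better idea of how many reads bowtie alone would miss.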