I have been working on human RNA-seq data. Most of the time I am interested in read counts for genes, and I use TopHat to align my reads. What I want to know is: can Bowtie (bowtie/bowtie2) alone be used for RNA-seq data? I tried Bowtie on my RNA-seq data, and when I looked for spliced alignments I could not find a single CIGAR string with an N in it. So this means that when I calculate read counts for a multi-exonic gene, I won't have the reads that span an intron, because Bowtie does not report them. Does anyone else also see that Bowtie by itself does not give splice junction results, i.e. it is not splice-aware?
If that is so, I am puzzled, because I have read posts from people using Bowtie for RNA-seq data.
TopHat is the splice junction mapper that sits on top of bowtie/bowtie2. So no, bowtie1 and bowtie2 will not map across those exon/exon boundaries. That's exactly what TopHat is for.
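If you want to verify this on your own alignments, here is a minimal sketch that counts how many mapped records in a SAM file carry an `N` (skipped region) operation in the CIGAR string. The function names `has_splice` and `count_spliced` are just illustrative, and it assumes plain-text SAM lines (e.g. from `samtools view -h`), not binary BAM.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def has_splice(cigar):
    """Return True if the CIGAR string contains an N (skipped region) op."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))

def count_spliced(sam_lines):
    """Return (spliced, total) over mapped records in an iterable of SAM lines."""
    spliced = total = 0
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:           # not a valid alignment record
            continue
        cigar = fields[5]             # column 6 of a SAM record is the CIGAR
        if cigar == "*":              # unmapped read, no CIGAR available
            continue
        total += 1
        if has_splice(cigar):
            spliced += 1
    return spliced, total
```

Typical usage would be something like `with open("aligned.sam") as fh: print(count_spliced(fh))`, where `aligned.sam` is a placeholder file name. On plain Bowtie output you would expect the first number to be zero, while TopHat output should show a non-trivial fraction of spliced reads.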
So Daniel, this would mean I would lose the reads across junctions if I map with bowtie1/bowtie2, because they won't be reported in the SAM/BAM file: CIGAR strings with N's are never produced by bowtie1/bowtie2.
Yes, you will lose most of the reads that span exon boundaries. In cases where most of the read comes from one exon and only a little comes from the next, Bowtie2 can still report it using soft clipping (in --local mode). But for reads where half comes from one exon and half from the other, I don't think Bowtie will report them using soft clipping.
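To see how much of each read was clipped away, you can sum the `S` operations in the CIGAR string. This is only a rough sketch: `soft_clipped_bases` is a hypothetical helper name, and a soft-clipped end is merely a hint of a possible junction (adapters and low-quality tails get clipped too).

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Return the total number of soft-clipped (S) bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")

def clipped_fraction(cigar, read_length):
    """Return the fraction of the read that was soft-clipped."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return soft_clipped_bases(cigar) / read_length
```

For example, a 100 bp read with CIGAR `90M10S` has 10% of its bases clipped, which is the "mostly one exon, a little of the next" situation described above.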
Bowtie1/2 is not splice-aware, but you can still use it for RNA-seq. For example, you can add known transcripts to the reference genome so that a read can be aligned along its full length. For 100 bp reads, you can split each read into 35+30+35 segments and align the two 35 bp segments to test whether there is a splice junction in the middle 30 bp. With bowtie2, you can use --local and -k to get multiple local hits caused by splicing. These approaches sound crude (indeed they are), but because the ascertainment is simple and transparent, you will know where the pitfalls are and how to interpret your results properly. In some corner cases, it may be preferable to use bowtie1/2 for RNA-seq data instead of blindly relying on a splice-aware mapper whose many nitty-gritty but important details are opaque to you.
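The 35+30+35 idea can be sketched as below. This is a minimal illustration, not a tested pipeline: `split_read` and `implied_gap` are hypothetical helper names, and `implied_gap` assumes both flanks aligned to the same chromosome on the forward strand, with 1-based leftmost positions as in the SAM POS column.

```python
def split_read(seq, qual, flank=35):
    """Split a read into (left flank, middle, right flank) as (seq, qual) pairs."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) < 2 * flank:
        raise ValueError("read is too short for two flanks")
    end = len(seq) - flank
    left = (seq[:flank], qual[:flank])
    middle = (seq[flank:end], qual[flank:end])
    right = (seq[end:], qual[end:])
    return left, middle, right

def implied_gap(left_pos, right_pos, flank=35, middle=30):
    """Genomic distance between the flanks beyond what the middle segment explains.

    0 means the flanks are contiguous with the middle segment (no splice);
    a large positive value suggests an intron somewhere in the middle segment.
    """
    return right_pos - left_pos - flank - middle
```

You would write the two flanks out as separate FASTQ records, align them with Bowtie, and then compare the positions of the two hits. For a 100 bp read, `implied_gap(100, 165)` is 0 (unspliced), whereas `implied_gap(100, 665)` is 500, i.e. roughly a 500 bp intron falling inside the middle 30 bp.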