I have come across some very peculiar alignment results for plant RNA-Seq data. The data are 2 × 150 bp Illumina reads from a stranded library (I think strandedness matters only during expression quantification). I aligned them to the reference genome (https://www.ncbi.nlm.nih.gov/assembly/GCA_000826755.1) using bowtie2, tophat2, hisat2 and STAR, all run with default parameters; the same bowtie2 parameters were used in tophat2 and hisat2. Interestingly, bowtie2 outperforms the splice-aware aligners (except STAR, which wins marginally). Here are the alignment rates (%):
bowtie2: 53.57
tophat2: 12.5
hisat2: 39.24
STAR: 58.64
There are similar posts (1, 2), but I could not find a conclusive answer in them. From the unaligned reads, I checked 5 random sets of 100 reads each with BLAST. About 60% of the sequences had a BLAST hit, and 72.09% of those hits were to predicted mRNAs of the same reference genome.
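A minimal sketch of how such random sets can be drawn, assuming the unaligned reads are in a standard 4-line-per-record FASTQ file. The function names and the fixed seed are illustrative only, not the exact procedure used here. Each set is drawn independently, so the same read can appear in more than one set.

```python
import random


def read_fastq(lines):
    """Yield (header, sequence) pairs from FASTQ lines (assumes 4 lines per record)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip stray blank lines between or after records
        seq = next(it)
        next(it)  # '+' separator line
        next(it)  # quality line
        yield header.strip(), seq.strip()


def sample_read_sets(records, n_sets=5, set_size=100, seed=0):
    """Draw n_sets independent random samples of set_size reads each."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    records = list(records)
    return [rng.sample(records, set_size) for _ in range(n_sets)]


def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text, e.g. as BLAST input."""
    out = []
    for header, seq in records:
        name = header[1:] if header.startswith("@") else header
        out.append(f">{name}\n{seq}\n")
    return "".join(out)
```

With a hypothetical file `unaligned.fastq`, the records would come from `read_fastq(open("unaligned.fastq"))`, and each sampled set can be written out with `to_fasta` before running BLAST on it.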
My questions:
1) STAR is a local aligner, so how come these reads are missed? (Is it because of long exon-exon junction gaps?)
2) How is bowtie2 performing better than the splice-aware programs?