Hi all,
I have some RNA-seq reads from a grasshopper species. I mapped them to a transcriptome reference (from the same species) using Bowtie2.
When I ran Bowtie2 with its default settings (allowing gaps), I observed an overall mapping rate of >95%.
However, when I disabled gapped alignment (i.e., using the settings RSEM uses by default when calling Bowtie2: --bowtie2 --sensitive --dpad 0 --gbar 99999999 --mp 1,1 --np 1 --score-min L,0,-0.1 --no-mixed --no-discordant), the mapping rate dropped drastically to ~18%.
Some context:
The transcriptome reference was assembled from whole-body RNA of individuals collected at a different location.
My reads come from a specific tissue.
I expected some reduction when gaps are disallowed, but does a drop from >95% to ~18% seem too extreme? Is this expected when disallowing gaps, or could it point to other issues?
Any insights or suggestions would be greatly appreciated!
Thanks!
What is the reason for not allowing gaps? Generally one would disallow gaps when aligning microRNA or other small RNA data, but not for regular RNA-seq.
RSEM does not permit processing alignments with gaps. This is due to the way that the program models read error (using a fixed, position-specific scoring model for the mismatch probability). As OP suggests, this drastic change in mapping rate may be a sign of a deeper issue. On the other hand, I'd also probably recommend against forcing gapless alignments (e.g. salmon can process RNA-seq alignments that include gaps).
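To put rough numbers on the settings quoted in the question, here is a small Python sketch of how Bowtie2's linear --score-min function (L,a,b means a + b * read length) turns into a mismatch budget. Everything specific here is an assumption for illustration: the thread does not give the read length, so 100 bp is a guess; the "default" values are Bowtie2's documented end-to-end defaults (--score-min L,-0.6,-0.6 and --mp 6,2); and the helper names min_score and max_mismatches are made up, not part of any tool.

```python
from fractions import Fraction
import math


def min_score(read_len, const, coeff):
    """Evaluate Bowtie2's linear function 'L,const,coeff' = const + coeff * read_len.

    const and coeff are passed as strings so Fraction keeps them exact
    (avoids float rounding on values like 0.1 and 0.6).
    """
    return Fraction(const) + Fraction(coeff) * read_len


def max_mismatches(read_len, const, coeff, mismatch_penalty):
    """Largest number of mismatches (each costing mismatch_penalty) that still
    keeps an ungapped end-to-end alignment at or above the minimum score."""
    threshold = min_score(read_len, const, coeff)  # zero or negative in end-to-end mode
    return math.floor(-threshold / mismatch_penalty)


READ_LEN = 100  # assumption: read length is not stated in the thread

# RSEM-style settings from the question: --score-min L,0,-0.1 with --mp 1,1
rsem_budget = max_mismatches(READ_LEN, "0", "-0.1", 1)

# Bowtie2 end-to-end defaults: --score-min L,-0.6,-0.6 with --mp 6,2
# (penalty 6 for a mismatch at a high-quality base, 2 at the lowest quality)
default_budget_high_q = max_mismatches(READ_LEN, "-0.6", "-0.6", 6)
default_budget_low_q = max_mismatches(READ_LEN, "-0.6", "-0.6", 2)

print("RSEM-style mismatch budget:", rsem_budget)
print("Default budget (high-quality mismatches):", default_budget_high_q)
print("Default budget (low-quality mismatches):", default_budget_low_q)
```

If those assumptions hold, the two settings tolerate a similar number of high-quality mismatches per read, so the mismatch budget by itself may not account for a drop to ~18%. That would leave the disabled gaps (--gbar 99999999) and, for paired-end data, --no-mixed --no-discordant as the more likely contributors. Rerunning with only some of these flags changed could help separate the effects.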