We are having an unusual problem and have not yet found a solution. We are working with paired-end RNA-Seq reads and the Mus musculus reference genome sequence and annotation from ENSEMBL.

When we align the RNA-Seq reads to the reference genome sequence using hisat2 version 2.1.0, we get ~94% mapping efficiency (ME); i.e., we can map ~94% of the input RNA-Seq reads to the reference sequence. When we use the --tmo option (which only maps reads to exons) with hisat2 version 2.0.4, the ME falls to ~62%, which is OK for our purposes, although still lower than with comparable settings using TopHat2. When we use the --tmo option with hisat2 version 2.1.0, the ME falls to ~14%, which is too low for our purposes.

We have tried both the pre-built genome index files from the HISAT2 web site and index files we built ourselves with hisat2-build, using the ss and exon files generated by the scripts included with HISAT2 from a GTF file derived from the ENSEMBL GFF file. We have also tried both the pre-built HISAT2 binaries from the HISAT2 web site and binaries we compiled ourselves. In all cases the results are similar: using the --tmo option with hisat2 version 2.1.0 causes the ME to fall to unacceptably low levels, whereas using the --tmo option with hisat2 version 2.0.4 is fine.
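In case it helps anyone compare numbers across versions, here is a minimal sketch of how the ME figure could be pulled from a saved hisat2 alignment summary. It assumes ME is read from the "overall alignment rate" line that hisat2 prints at the end of a run. The function name and the sample summary text are hypothetical and made up for illustration; they are not output from our runs.

```python
import re


def overall_alignment_rate(summary_text):
    """Return the overall alignment rate (in percent) from a hisat2
    alignment summary, or None if the line is not present."""
    match = re.search(r"([\d.]+)% overall alignment rate", summary_text)
    return float(match.group(1)) if match else None


# Made-up summary text for illustration only (not real output from our runs).
example_summary = """1000 reads; of these:
  1000 (100.00%) were paired; of these:
14.20% overall alignment rate
"""

print(overall_alignment_rate(example_summary))
```

Running the same function over the summaries saved from the 2.0.4 and 2.1.0 runs would give the figures side by side.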
Has anybody else observed this behavior? We are stumped.