Hi,
We are having an unusual problem and haven't found a solution yet. We are working with paired-end RNA-Seq reads and the Mus musculus reference genome sequence and annotation from Ensembl. When we align the RNA-Seq reads to the reference genome sequence using hisat2 version 2.1.0, we get ~94% mapping efficiency (ME), i.e., we can map ~94% of the input RNA-Seq reads to the reference sequence. When we use the --tmo option (which only maps reads to exons) with hisat2 version 2.0.4, the ME falls to about 62%, which is OK for our purposes, although still lower than with comparable settings in TopHat2. When we use the --tmo option with hisat2 version 2.1.0, the ME falls to ~14%, which is too low for our purposes.
We have tried both the pre-built genome index files from the HISAT2 web site and index files we built ourselves with hisat2-build, using the splice-site (ss) and exon files generated from a GTF file (derived from the Ensembl GFF file) with the scripts included with HISAT2. We have also tried both the pre-built HISAT2 binaries from the HISAT2 web site and binaries we compiled ourselves. In all cases the results are similar: using the --tmo option with hisat2 version 2.1.0 causes the ME to fall to unacceptably low levels, whereas using the --tmo option with hisat2 version 2.0.4 is fine.
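For anyone who wants to reproduce the comparison, here is a minimal Python sketch of how the ME figures could be pulled out of the alignment summaries. It assumes the ME corresponds to the "overall alignment rate" line that hisat2 prints at the end of its default summary; the function name and the summary strings are made up for illustration and are not real output from our runs.

```python
import re


def overall_alignment_rate(summary_text: str) -> float:
    """Return the percentage from the 'overall alignment rate' line."""
    match = re.search(r"([\d.]+)% overall alignment rate", summary_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


# Illustrative summary tails only (hypothetical strings, not real logs).
runs = {
    "2.0.4 --tmo": "62.00% overall alignment rate",
    "2.1.0 --tmo": "14.00% overall alignment rate",
}

# Map each run label to its parsed rate so the versions can be compared.
rates = {label: overall_alignment_rate(text) for label, text in runs.items()}
```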
Has anybody else observed this behavior? We are stumped.
Thanks.
I see you already opened an issue on the HISAT2 GitHub page, which would have been my first suggestion.
Honestly, this looks like a bug in hisat2 2.1.0. Do you really need the --tmo flag? If you quantify transcripts/genes (with featureCounts, for example) from the hisat2 2.1.0 alignment made without --tmo, what is the assignment rate? Are you using other parameters that could influence the mapping rate?
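To make "assignment rate" concrete, here is a rough Python sketch that computes it from the text of a featureCounts `.summary` file. It assumes the usual tab-separated layout (a `Status` header row followed by one row per category) and takes the rate as Assigned over the sum of all categories; the function name and the numbers in the example are hypothetical.

```python
def assignment_rate(summary_text: str) -> float:
    """Percentage of 'Assigned' among all categories in a featureCounts summary."""
    counts = {}
    # Skip the header row ("Status<TAB>file name"), then read status/count pairs.
    for line in summary_text.strip().splitlines()[1:]:
        status, value = line.split("\t")[:2]
        counts[status] = int(value)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("summary contains no counts")
    return 100.0 * counts.get("Assigned", 0) / total


# Hypothetical numbers, for illustration only.
example = (
    "Status\tsample.bam\n"
    "Assigned\t800\n"
    "Unassigned_Unmapped\t60\n"
    "Unassigned_NoFeatures\t100\n"
    "Unassigned_Ambiguity\t40\n"
)
rate = assignment_rate(example)
```

If that rate for the alignment made without --tmo is close to what you got from 2.0.4 with --tmo, counting against the annotation after alignment may be a workable substitute.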
Yeah, I've received no responses on GitHub. Mapping to the known exons is part of our standard procedure for these types of projects. However, you're right, we could try something like featureCounts to get a similar effect.
Is there a reason why you are forcing mapping only to exons? Perhaps it's time to try a different aligner and see what you get. STAR and BBMap can both be good alternatives.
Although there are different ways to accomplish the same goal, we typically map to known transcripts in annotated reference genomes to simplify the downstream interpretations.