Question

Low read mapping efficiency with HISAT2

6.1 years ago

givans • 0

Hi,

We are having an unusual problem and haven't found a solution yet. We are working with paired-end RNA-Seq reads and the Mus musculus reference genome sequence and annotation from ENSEMBL. When we align the RNA-Seq reads to the reference genome sequence using hisat2 version 2.1.0, we get ~94% mapping efficiency (ME), i.e., we can map ~94% of the input RNA-Seq reads to the reference sequence. When we use the --tmo option (which reports only alignments within known transcripts) with hisat2 version 2.0.4, the ME falls to about 62%, which is OK for our purposes, although still lower than with comparable settings using TopHat2. When we use the --tmo option with hisat2 version 2.1.0, the ME falls to ~14%, which is too low for our purposes.

We have tried both the pre-built genome index files from the HISAT2 web site and index files we built ourselves with hisat2-build, using the ss and exon files generated from a GTF file (derived from the ENSEMBL GFF file) with the scripts included with HISAT2. We have also tried both the pre-built HISAT2 binaries from the HISAT2 web site and binaries we compiled ourselves. In all cases the results are similar: using the --tmo option with hisat2 version 2.1.0 causes the ME to fall to unacceptably low levels, whereas using the --tmo option with hisat2 version 2.0.4 is fine.
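For anyone who wants to compare ME across runs in the same way, here is a minimal Python sketch. It assumes the alignment summary that hisat2 prints to stderr was saved to a text file for each run. The function name and the example text are hypothetical, and the numbers in the example are made up for illustration; they are not from the runs described above.

```python
import re

# Illustrative text in the layout of a hisat2 alignment summary.
# The numbers are invented for this example, not taken from a real run.
EXAMPLE_SUMMARY = """\
1000 reads; of these:
  1000 (100.00%) were paired; of these:
    858 (85.80%) aligned concordantly 0 times
    130 (13.00%) aligned concordantly exactly 1 time
    12 (1.20%) aligned concordantly >1 times
14.20% overall alignment rate
"""


def overall_alignment_rate(summary_text):
    """Return the overall alignment rate in percent, or None if it is absent."""
    match = re.search(r"([\d.]+)% overall alignment rate", summary_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(overall_alignment_rate(EXAMPLE_SUMMARY))
```

Running this on the summaries from the 2.0.4 and 2.1.0 runs, with and without --tmo, would put the four rates side by side.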

Has anybody else observed this behavior? We are stumped.

Thanks.

RNA-Seq hisat2 • 2.1k views

ADD COMMENT • link updated 6.1 years ago by h.mon 35k • written 6.1 years ago by givans • 0

I see you have already opened an issue on the HISAT2 GitHub page, which would have been my first suggestion.

Honestly, it looks like a bug in hisat2 2.1.0. Do you really need the --tmo flag?

If you quantify transcripts / genes (with featureCounts, for example) from the hisat2 2.1.0 alignment made without --tmo, what is the assignment rate?
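As a rough sketch of how that assignment rate could be computed: featureCounts writes a tab-delimited `.summary` file next to its count table, with a header row and one row per status (`Assigned`, `Unassigned_NoFeatures`, and so on). The Python below assumes that layout with a single BAM column. The function name and the example numbers are hypothetical.

```python
# Illustrative featureCounts summary for one BAM file (made-up numbers).
EXAMPLE_FC_SUMMARY = (
    "Status\tsample.bam\n"
    "Assigned\t800\n"
    "Unassigned_NoFeatures\t150\n"
    "Unassigned_Ambiguity\t50\n"
)


def assignment_rate(summary_text):
    """Return the percentage of counted reads/fragments with status 'Assigned'."""
    counts = {}
    # Skip the header row, then read "status<TAB>count" pairs.
    for line in summary_text.strip().splitlines()[1:]:
        status, value = line.split("\t")[:2]
        counts[status] = int(value)
    total = sum(counts.values())
    return 100.0 * counts.get("Assigned", 0) / total if total else 0.0


if __name__ == "__main__":
    print(assignment_rate(EXAMPLE_FC_SUMMARY))
```

Note that the denominator here is every row in the summary, so unmapped reads count against the rate if they are present in the BAM file.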

Are you using other parameters that could influence mapping rate?

ADD REPLY • link 6.1 years ago by h.mon 35k

Yeah, I've received no responses on GitHub. Mapping to the known exons is part of our standard procedure for these types of projects. However, you're right, we could try something like featureCounts to get a similar effect.

ADD REPLY • link 6.1 years ago by givans • 0

Is there a reason why you are forcing mapping only to exons? Perhaps it is time to try a different aligner and see what you get. STAR and BBMap can both be good alternatives.

ADD REPLY • link 6.1 years ago by GenoMax 141k

Although there are different ways to accomplish the same goal, we typically map to known transcripts in annotated reference genomes to simplify the downstream interpretations.

ADD REPLY • link 6.1 years ago by givans • 0