Question: Low read mapping efficiency with HISAT2
givans wrote (22 months ago):

Hi,

We are having an unusual problem that we haven't been able to solve yet. We are working with paired-end RNA-Seq reads and the Mus musculus reference genome sequence and annotation from ENSEMBL. When we align the RNA-Seq reads to the reference genome sequence using hisat2 version 2.1.0, we get ~94% mapping efficiency (ME), i.e., ~94% of the input RNA-Seq reads map to the reference sequence.

When we use the --tmo option (which maps reads only to exons) with hisat2 version 2.0.4, the ME falls to about 62%. That is OK for our purposes, although still lower than what we get with comparable settings in TopHat2. When we use the --tmo option with hisat2 version 2.1.0, however, the ME falls to ~14%, which is too low for our purposes.

We have used both the pre-built genome index files from the HISAT2 web site and index files we built ourselves with hisat2-build, using splice-site (ss) and exon files generated with the scripts included with HISAT2 from a GTF file based on the ENSEMBL GFF file. We have also used both the pre-built HISAT2 binaries from the HISAT2 web site and binaries we compiled ourselves. In all cases the results are similar: using --tmo with hisat2 2.1.0 drops the ME to unacceptably low levels, whereas using --tmo with hisat2 2.0.4 is fine.
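To compare runs like these side by side, the ME can be read from the alignment summary hisat2 prints to stderr. The sketch below is a minimal example assuming the default (Bowtie2-style) summary, which ends with a line such as `94.12% overall alignment rate`; it does not handle the `--new-summary` layout. The function names and run labels are hypothetical.

```python
import re

# Matches e.g. "94.12% overall alignment rate" in a default HISAT2 summary.
_RATE_RE = re.compile(r"([\d.]+)%\s+overall alignment rate")


def overall_alignment_rate(summary_text: str) -> float:
    """Return the overall alignment rate (in percent) from a HISAT2 summary.

    Raises ValueError if no 'overall alignment rate' line is present.
    """
    match = _RATE_RE.search(summary_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line in summary")
    return float(match.group(1))


def compare_runs(summaries: dict[str, str]) -> dict[str, float]:
    """Map each run label (e.g. 'v2.1.0 --tmo') to its overall alignment rate."""
    return {label: overall_alignment_rate(text) for label, text in summaries.items()}


if __name__ == "__main__":
    # Hypothetical summary text; real text comes from hisat2's stderr.
    example = "...\n    94.12% overall alignment rate\n"
    print(overall_alignment_rate(example))  # 94.12
```

Keeping one summary per (version, --tmo on/off) combination and passing them to `compare_runs` makes the 94% / 62% / 14% comparison reproducible from the logs rather than from memory.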

Has anybody else observed this behavior? We are stumped.

Thanks.

Tags: hisat2, rna-seq
(Question written 22 months ago by givans; modified 22 months ago by h.mon.)

I see you have already opened an issue on the HISAT2 GitHub page, which would have been my first suggestion.

Honestly, this looks like a bug in hisat2 2.1.0. Do you really need the --tmo flag?

If you quantify genes/transcripts (with featureCounts, for example) from the hisat2 2.1.0 alignment made without --tmo, what is the assignment rate?
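The assignment rate can be computed from the `.summary` file that featureCounts writes next to its counts table. A minimal sketch, assuming the usual layout: a `Status` header row naming each BAM, then one row per status (`Assigned`, `Unassigned_*`) with tab-separated counts. The rate is taken as Assigned divided by the sum of all status rows; the function name is hypothetical.

```python
def assignment_rates(summary_text: str) -> dict[str, float]:
    """Return {sample: percent assigned} from a featureCounts .summary file.

    Assumes a 'Status' header row followed by one row per status
    (Assigned, Unassigned_*), with one integer count column per sample.
    """
    lines = [line.split("\t") for line in summary_text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    if header[0] != "Status":
        raise ValueError("expected a 'Status' header row")
    samples = header[1:]
    totals = [0] * len(samples)
    assigned = [0] * len(samples)
    for status, *counts in rows:
        for i, count in enumerate(counts):
            totals[i] += int(count)  # every status row counts toward the total
            if status == "Assigned":
                assigned[i] += int(count)
    return {
        sample: (100.0 * a / t if t else 0.0)
        for sample, a, t in zip(samples, assigned, totals)
    }
```

For example, a summary with `Assigned 80` and `Unassigned_NoFeatures 20` for one BAM gives an assignment rate of 80.0%. Comparing that number with the --tmo mapping rate shows how many reads actually land on annotated features.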

Are you using other parameters that could influence mapping rate?

(Reply by h.mon, written 22 months ago.)

Yeah, I've received no responses on GitHub. Mapping to known exons is part of our standard procedure for these types of projects. However, you're right: we could try something like featureCounts to get a similar effect.
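Another way to get a similar effect after a normal (no --tmo) alignment is to keep only alignments whose aligned blocks all fall inside annotated exons. The sketch below is a hypothetical post hoc filter, not what hisat2's --tmo does internally: it checks exon containment only and ignores whether splice junctions match a known transcript. It assumes 0-based, half-open intervals (GTF coordinates converted as start-1, end); per-read blocks could come, for example, from pysam's `AlignedSegment.get_blocks()`.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


class ExonIndex:
    """Per-chromosome sorted, merged exon intervals (0-based, half-open)."""

    def __init__(self, exons_by_chrom):
        self._exons = {c: merge_intervals(iv) for c, iv in exons_by_chrom.items()}
        self._starts = {c: [s for s, _ in iv] for c, iv in self._exons.items()}

    def contains(self, chrom, start, end):
        """True if [start, end) lies entirely inside a single merged exon."""
        starts = self._starts.get(chrom)
        if not starts:
            return False
        i = bisect_right(starts, start) - 1  # last exon starting at or before start
        if i < 0:
            return False
        ex_start, ex_end = self._exons[chrom][i]
        return ex_start <= start and end <= ex_end


def within_known_exons(index, chrom, blocks):
    """True if every aligned block of a read falls inside an annotated exon."""
    return bool(blocks) and all(index.contains(chrom, s, e) for s, e in blocks)
```

Merging overlapping exons from different transcripts first means each block needs only one binary search; the trade-off is that a spliced read joining exons of two different transcripts would still pass.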

(Reply by givans, written 22 months ago.)

Is there a reason why you are restricting mapping to exons only? Perhaps it is time to try a different aligner and see what you get. STAR and BBMap can both be good alternatives.

(Reply by genomax, written 22 months ago.)

Although there are different ways to accomplish the same goal, we typically map to known transcripts in annotated reference genomes to simplify downstream interpretation.

(Reply by givans, written 22 months ago.)