Question

Does Hisat2 automatically align strandedness

1


2.6 years ago

OhHiImNewHere ▴ 10

I've been using Hisat2 to align some RNA-seq reads. The library was prepped using the NEB Directional library kit.

When I originally aligned, I did not use the --rna-strandness option.

I then realised I do want stranded alignment, as part of my reference (the human genome plus an integrated virus) produces proteins from the minus strand only. I was going to realign with the --rna-strandness option set to RF (which would be correct for that, and most other, Illumina library prep kits) so that reads would then get an XS tag.

Looking at the BAM file, however, I noticed that the reads already have tags, and the tags seem to match the strand: e.g. the section that should only produce reads off the minus strand has only XS:A:- tags and no XS:A:+ tags. (The other sections, which should produce RNA off both strands, have a mixture of XS:A:+ and XS:A:-.)

I realised the default for single-end reads is unstranded, but when doing paired ends, does Hisat2 automatically detect strandedness and default to RF? I can't see anything about this in the manual but wondered if there had been an update I don't know about?

hisat2 alignment strandedness • 1.3k views

updated 17 months ago by Istvan Albert 100k • written 2.6 years ago by OhHiImNewHere ▴ 10

0


I have noticed the same. Have you got the answer for this?

17 months ago by anna • 0

score 1 · Answer 1 · 2022-11-17

This is an interesting question because once we think about it in more detail, we find various contradictions at play.

First of all, the presence of these tags is somewhat superfluous - perhaps even misleading. From the start, we already know which way the transcript faces from the alignment alone.

For the standard Illumina prep kit (RF), a read from file 1 aligning on the forward strand will indicate that the original transcript was on the reverse strand. So why would this XS tag even be needed?
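To make that rule concrete, here is a minimal Python sketch that infers the transcript strand from the SAM FLAG field alone. The function name is made up for illustration; the bit values (0x4 unmapped, 0x10 read on reverse strand, 0x80 second in pair) are the ones defined in the SAM specification. It assumes a cleanly stranded RF library and treats single-end reads like read 1.

```python
from typing import Optional

# SAM FLAG bits (from the SAM specification)
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_READ2 = 0x80


def transcript_strand_rf(flag: int) -> Optional[str]:
    """Infer the transcript strand for an RF (dUTP-style) library.

    Hypothetical helper, not part of any library. Assumes read 1 is
    antisense to the transcript and read 2 is sense to it.
    Returns '+', '-', or None for unmapped reads.
    """
    if flag & FLAG_UNMAPPED:
        return None
    read_on_forward = not (flag & FLAG_REVERSE)
    if flag & FLAG_READ2:
        # Read 2 has the same orientation as the transcript.
        return "+" if read_on_forward else "-"
    # Read 1 (or a single-end read) is the reverse complement of the transcript.
    return "-" if read_on_forward else "+"
```

For a typical proper pair with flags 99 (read 1, forward) and 147 (read 2, reverse), both mates resolve to the same transcript strand, which is the point: the flag already carries the information.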

Notably, even the HiSat2 manual seems to get confused because in different sections it states:

With this option being used, every read alignment will have an XS attribute tag:

'+' means a read belongs to a transcript on '+' strand of genome.

'-' means a read belongs to a transcript on '-' strand of genome.

but then later states:

XS:A:<A> : Values of + and - indicate the read is mapped to transcripts on sense and anti-sense strands, respectively.

Oy! A major pet peeve of mine! Forward and reverse are not equivalent to sense and antisense.

Antisense is not a strand at all. It is a direction relative to a strand.

There is actually such a thing as antisense transcription, in which case the transcript would come from the opposite strand. In that case, the XS tag would be incorrect since it would indicate the wrong strand.

Long story short, I would say that this XS is superfluous and probably does not even do what it is supposed to. So I would not worry about it.

The best way to figure out all this is by using both your BAM file and an annotation file that lists the strand of the expected transcripts. The bedtools intersect tool has a stranded -s and an opposite strand mode -S. With that, you can separate the alignments into those that:

match the transcripts in the correct orientation (on each strand separately)
match the transcripts in antisense (again on each strand separately)
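The same-strand versus opposite-strand split can be sketched in plain Python to show what the two modes are doing. This is only an illustration of the logic, not how bedtools is implemented; the `Interval` type and `split_by_strand` function are hypothetical names. It assumes 0-based half-open coordinates (as in BED) and that each alignment's strand has already been set to whatever you want compared against the annotation. Keep in mind that for an RF library the read 1 alignment strand is the opposite of the transcript strand.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # '+' or '-'


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def split_by_strand(
    alignments: List[Interval], features: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Split alignments into same-strand and opposite-strand overlaps.

    An alignment overlapping features on both strands lands in both lists,
    and one overlapping no feature lands in neither.
    """
    same, opposite = [], []
    for aln in alignments:
        hits = [f for f in features if overlaps(aln, f)]
        if any(f.strand == aln.strand for f in hits):
            same.append(aln)
        if any(f.strand != aln.strand for f in hits):
            opposite.append(aln)
    return same, opposite


# Toy example: one minus-strand feature and three alignments.
features = [Interval("chr1", 100, 200, "-")]
alignments = [
    Interval("chr1", 150, 180, "-"),  # overlaps, same strand
    Interval("chr1", 190, 250, "+"),  # overlaps, opposite strand
    Interval("chr1", 300, 350, "-"),  # no overlap
]
same, opposite = split_by_strand(alignments, features)
```

This brute-force loop is fine for a toy case; on real data bedtools does the same job far more efficiently.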