How does HISAT calculate the alignment score? (and other TopHat to HISAT questions)
6.5 years ago
aih5 • 0

I've recently switched from using TopHat to using HISAT. Trying to figure out which parameters do what I want has been a bit of a challenge in spite of the manual. I realize some things may not be explainable as they are proprietary. But I think a few of my questions can be addressed.

  1. Is there a "Mean Inner Distance between Mate Pairs" (TopHat) equivalent in HISAT?
  2. Is there a way to only display/align reads that have no mismatches? (I think I figured this one out, but see the next question)
  3. How does HISAT calculate the Alignment Score (AS)? From what I can tell with my data, if the read is a perfect match the AS = 0, but if there is a mismatch/insertion/deletion/soft-clipping/etc. it is around AS=250.
  4. How does the program decide if it is using HGM or HGFM?

Thanks for any help that can be provided!

RNA-Seq alignment • 3.6k views

Hello! I am trying to figure out how to allow only a specific number of mismatches (e.g. 2) using HISAT2, which relates to your question 2. Could you please let me know how you achieved that? Thank you!


Old post, but I'll include this for anyone else looking for a similar answer.

HISAT2 keeps an alignment only if its score is at or above a minimum set by a scoring function (--score-min). If you make each mismatch cost a fixed penalty, you can set that minimum to a multiple of the penalty and so limit how many mismatches are tolerated.

So if you want two mismatches you can change the minimum scoring function to something like

--score-min L,-21,0

This makes the minimum score independent of read length, because the second integer term (the coefficient applied to read length) is 0.

Then set the max and min mismatch penalties to the same value, so that a constant amount is subtracted for each mismatch. With a penalty of 10 and a minimum score of -21, two mismatches (-20) pass and three (-30) do not.

--mp 10,10 --rdg 500,500 --np 500 --rfg 500,5000

The other options heavily penalize gaps and ambiguous characters so that such alignments are not reported. You're probably better off using something like bowtie2 if you want to specify an exact number of mismatches.
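The arithmetic behind these settings can be checked with a short sketch. It assumes that the L,<constant>,<coefficient> form of --score-min means threshold = constant + coefficient × read length, and that each mismatch subtracts exactly the --mp value when max and min are equal. The function names are hypothetical and this is not HISAT2's own code.

```python
def min_score(constant, coefficient, read_length):
    """Threshold from a linear --score-min function, L,<constant>,<coefficient>."""
    return constant + coefficient * read_length


def max_mismatches(threshold, mismatch_penalty):
    """Largest mismatch count whose total penalty still meets the threshold.

    Assumes mismatches are the only penalized events and that each one
    costs the same fixed amount (as with --mp 10,10).
    """
    count = 0
    while -(count + 1) * mismatch_penalty >= threshold:
        count += 1
    return count


# --score-min L,-21,0: the coefficient is 0, so read length has no effect.
threshold = min_score(-21, 0, read_length=100)
allowed = max_mismatches(threshold, mismatch_penalty=10)
print(threshold, allowed)
```

Under the same assumptions, a threshold of -11 would allow one mismatch and -1 would allow none.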

6.5 years ago
  1. No, thankfully.
  2. I guess you could set --mp to something quite high, though why you would want to forbid mismatches is beyond me.
  3. An alignment starts with a score of 0 and is penalized according to how it aligns and the settings for --mp, --sp and so on, so a perfect match has AS = 0 and anything else scores below 0.
  4. I have no idea what you mean by "HGM". Hisat2 uses an HGFM index which may or may not include things like SNPs or splice sites. Whether it does or not depends on how you made the indices. See the help for hisat2-build.
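As a rough illustration of point 3, the sketch below accumulates penalties from a starting score of 0. The penalty values are illustrative assumptions passed in as parameters (check hisat2 --help for the defaults in your version), the function names are hypothetical, and soft-clipping and splice-site penalties are left out for brevity. A gap of length N is assumed to cost open + N × extend.

```python
def gap_penalty(length, open_penalty, extend_penalty):
    """Penalty for one gap of the given length: open + length * extend."""
    return open_penalty + length * extend_penalty


def alignment_score(mismatch_penalties=(), read_gaps=(), ref_gaps=(),
                    n_count=0, rdg=(5, 3), rfg=(5, 3), np_penalty=1):
    """Sum penalties into a score that starts at 0 and only goes down.

    mismatch_penalties: one penalty per mismatch (it can vary between the
    --mp min and max); read_gaps / ref_gaps: lengths of each gap;
    n_count: number of ambiguous (N) positions.
    """
    score = 0  # a perfect match keeps this value
    score -= sum(mismatch_penalties)
    score -= sum(gap_penalty(n, *rdg) for n in read_gaps)
    score -= sum(gap_penalty(n, *rfg) for n in ref_gaps)
    score -= n_count * np_penalty
    return score


print(alignment_score())                                       # perfect match
print(alignment_score(mismatch_penalties=[6, 6]))              # two mismatches
print(alignment_score(mismatch_penalties=[6], read_gaps=[2]))  # mismatch plus a 2-base read gap
```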

BTW, since you mentioned "proprietary", please be aware that it's rare for anything in bioinformatics to involve proprietary code. For example, hisat2's entire source code is available here.


In response to question 1, what do the parameters -I and -X (min and max fragment length) have to do with the paired-end parameters? I don't completely understand how these relate to the other options, such as disabling the search for discordant mates.


Pairs whose fragment length falls outside that range are treated as discordant.
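A minimal sketch of that length check, assuming the -I/-X bounds are inclusive and looking only at fragment length (mate orientation and other concordance criteria are ignored). The function name and the 0 to 500 range are hypothetical, chosen for illustration.

```python
def within_fragment_range(fragment_length, min_length, max_length):
    """True if the fragment length lies inside the inclusive -I/-X range."""
    return min_length <= fragment_length <= max_length


# Illustrative range standing in for -I 0 -X 500.
for length in (150, 480, 900):
    if within_fragment_range(length, 0, 500):
        label = "within range"
    else:
        label = "outside range (discordant)"
    print(length, label)
```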


I guess you could set --mp to something quite high, though why you would want to forbid mismatches is beyond me.

Please explain a bit more. I also need to allow a specific number of mismatches: alignments with one mismatch allowed, and alignments with two mismatches allowed.

