I've recently switched from using TopHat to using HISAT. Trying to figure out which parameters do what I want has been a bit of a challenge in spite of the manual. I realize some things may not be explainable as they are proprietary. But I think a few of my questions can be addressed.
- Is there a "Mean Inner Distance between Mate Pairs" (TopHat) equivalent in HISAT?
- Is there a way to only display/align reads that have no mismatches? (I think I figured this one out, but see the next question)
- How does HISAT calculate the Alignment Score (AS)? From what I can tell with my data, if the read is a perfect match the AS = 0, but if there is a mismatch/insertion/deletion/soft-clipping/etc. it is around AS=250.
- How does the program decide if it is using HGM or HGFM?
Thanks for any help that can be provided!
Hello! I am trying to figure out how to allow only a specific number of mismatches (e.g., 2) using HISAT2, which relates to your second question. Could you please let me know how you achieved that? Thank you!
Old post, but I'll include this for anyone else looking for a similar answer.
HISAT2 scores every alignment and only reports those whose score meets a minimum-score function (--score-min). You can make the mismatch penalty (--mp) a constant, so that the minimum score you set works out to a simple multiple of that penalty.
So if you want two mismatches, you can change the minimum-score function so that it no longer depends on the read length, i.e. set the second numeric term (the read-length coefficient) to zero and put the allowed penalty in the constant term.
Then, change the mismatch penalty so that its min and max are the same, so a constant term is subtracted whenever a mismatch is detected.
The other options are there to keep gaps and ambiguous characters from being reported in abundance. You're probably better off using something like bowtie2 if you want to specify an exact number of mismatches.
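For anyone who wants to check the arithmetic, here is a minimal Python sketch of the idea above. It assumes the function format documented for HISAT2/Bowtie2 (type, constant term, coefficient, so L,0,-0.2 means 0 + -0.2 * read length) and the documented --score-min, --mp and --no-softclip options. The helper names and the specific values (L,-N,0 and 1,1) are my own illustration, not the exact command from the answer above, and gap and N penalties are left at their defaults.

```python
import math


def min_score(spec, read_len):
    """Evaluate a HISAT2/Bowtie2-style function spec such as 'L,0,-0.2'.

    Assumed format: TYPE,CONSTANT,COEFFICIENT where TYPE is
    C (constant), L (linear), S (square root) or G (natural log).
    """
    parts = spec.split(",")
    kind = parts[0]
    const = float(parts[1])
    coef = float(parts[2]) if len(parts) > 2 else 0.0
    if kind == "C":
        return const
    if kind == "L":
        return const + coef * read_len
    if kind == "S":
        return const + coef * math.sqrt(read_len)
    if kind == "G":
        return const + coef * math.log(read_len)
    raise ValueError("unknown function type: " + kind)


def max_mismatches(threshold, mismatch_penalty):
    """Largest k such that -k * mismatch_penalty >= threshold."""
    return int(-threshold // mismatch_penalty)


def mismatch_cap_args(n):
    """Hypothetical helper: flags that cap an alignment at n mismatches.

    The values are an illustration only: a constant threshold of -n
    (coefficient 0, so read length is ignored) and a mismatch penalty
    with max == min == 1, so every mismatch costs exactly 1.
    """
    return ["--score-min", "L,{},0".format(-n), "--mp", "1,1", "--no-softclip"]


if __name__ == "__main__":
    # Threshold for a 100 bp read under an assumed default of L,0,-0.2,
    # and how many maximum-penalty (6) mismatches would still pass.
    default_threshold = min_score("L,0,-0.2", 100)
    print(default_threshold, max_mismatches(default_threshold, 6))

    # Flags for "at most two mismatches" under the assumptions above.
    print(" ".join(mismatch_cap_args(2)))
```

With a constant threshold of -N and every mismatch costing exactly 1, any alignment with more than N mismatches falls below the threshold regardless of read length. As far as I know, ambiguous bases (N) also cost 1 each by default, so they draw from the same budget unless that penalty is changed too.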