Does HISAT2 apply criteria that cause it to filter splice sites out of its final output?
I'm using HISAT2 (version 2.0.5) to find novel splice sites. Going through the SAM file that HISAT2 produces, I see a number of alignments supporting splice junctions that the program does not report (i.e. they aren't in the file written by the --novel-splicesite-outfile option).
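Below is a minimal Python sketch of that comparison. It assumes a tab-separated SAM file. It also assumes, based on my reading of the HISAT2 manual, that the --novel-splicesite-outfile file uses HISAT2's splice-site format: chromosome, 0-based position of the flanking exon base on each side of the intron, and strand. The function names are hypothetical and are not part of HISAT2. The sketch pulls every N operation out of the CIGAR strings, converts both sources to 1-based inclusive intron coordinates, and lists junctions that appear in the alignments but are missing from the file:

```python
import re
from collections import Counter

# CIGAR operations; M, D, N, = and X consume reference bases.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")

def sam_junctions(sam_lines):
    """Count introns as (chrom, start, end, strand), 1-based inclusive,
    from the N operations of spliced alignments in SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 4 or "N" not in cigar:
            continue  # unmapped or unspliced
        # The XS:A tag carries the inferred transcription strand, if present.
        strand = next((f[5:] for f in fields[11:] if f.startswith("XS:A:")), ".")
        ref = pos  # SAM POS is the 1-based position of the first aligned base
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op == "N":
                counts[(chrom, ref, ref + length - 1, strand)] += 1
            if op in REF_CONSUMING:
                ref += length
    return counts

def load_splice_sites(ss_lines):
    """Read a splice-site file, ASSUMING columns: chrom, 0-based position of
    the last exon base before the intron, 0-based position of the first exon
    base after it, strand. Returns 1-based inclusive (chrom, start, end)."""
    introns = set()
    for line in ss_lines:
        chrom, left, right, _strand = line.split()
        # 0-based last exon base + 2 -> 1-based first intron base;
        # 0-based first exon base after the intron == 1-based last intron base.
        introns.add((chrom, int(left) + 2, int(right)))
    return introns

def unreported(sam_lines, ss_lines):
    """Junctions seen in the SAM lines but absent from the splice-site lines."""
    reported = load_splice_sites(ss_lines)
    return {j: n for j, n in sam_junctions(sam_lines).items() if j[:3] not in reported}
```

For example, `unreported(open("hisat2_output.sam"), open("novel_splicesites.txt"))` (hypothetical file names) returns each unreported junction with the number of alignments supporting it. Secondary alignments and both mates of a pair are each counted, so the numbers are alignment counts, not fragment counts.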
For example, I've got a read aligned like this:
SRR360120.14138165 83 V 12814114 60 9M1297N91M = 12813953 -261 ATCCCATGTCTTAATTAAACTTGTGGTAACTTTTAATGAATTAAACTTCTGATTTTGCCGATAAGCATATCATATGAAAAATACTAAAAATGTCGAAATG CC?CCCCDECEEEEC?D@FFDEECHHGEEGDEFHAGGIGJIIIHHGIJIGHCGDIJJGIIGHHCG@E?A9EAGIEGF@HHBFAJIGJHFF<FADBFF@@@ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YS:i:251 YT:Z:CP XS:A:- NH:i:1
The way it is split (CIGAR 9M1297N91M starting at V:12,814,114) implies a splice junction at V:12,814,123-12,815,419, but there is no such entry in the splice sites file.
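Here is the arithmetic behind those coordinates. SAM POS is the 1-based position of the first aligned base, and both M and N operations consume reference bases:

```latex
\begin{aligned}
\text{first exon block (9M)}   &: 12{,}814{,}114 \;\ldots\; 12{,}814{,}114 + 9 - 1 = 12{,}814{,}122 \\
\text{intron (1297N)}          &: 12{,}814{,}123 \;\ldots\; 12{,}814{,}123 + 1297 - 1 = 12{,}815{,}419 \\
\text{second exon block (91M)} &: 12{,}815{,}420 \;\ldots\; 12{,}815{,}420 + 91 - 1 = 12{,}815{,}510
\end{aligned}
```

If the splice-site file records the 0-based positions of the flanking exon bases (my understanding of the HISAT2 format, worth checking against the manual), this junction would appear there as `V 12814121 12815419 -` rather than with the 1-based intron coordinates above.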
For comparison, I aligned these reads using TopHat (version 2.1.0) and it produced the exact same alignment for this read. However, TopHat DOES report the expected splice junction at V:12,814,123-12,815,419.
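To compare the two tools on equal terms, TopHat's junctions.bed can be converted to the same 1-based intron coordinates. This is a minimal sketch. It assumes the file is standard BED12, with each junction's two blocks being the read anchors on either side of the intron, which is how the TopHat manual describes it. The function name is hypothetical:

```python
def bed12_junction_to_intron(bed_line):
    """Convert one BED12 junction line (e.g. from TopHat's junctions.bed)
    to a 1-based inclusive (chrom, intron_start, intron_end, strand) tuple.
    ASSUMES exactly two blocks: left anchor, then right anchor."""
    f = bed_line.rstrip("\n").split("\t")
    chrom, chrom_start, strand = f[0], int(f[1]), f[5]
    block_sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    block_starts = [int(x) for x in f[11].rstrip(",").split(",")]
    # BED is 0-based half-open: the intron runs from the end of block 1
    # up to (but not including) the start of block 2.
    intron_start0 = chrom_start + block_sizes[0]
    intron_end0 = chrom_start + block_starts[1]
    # [start0, end0) in 0-based half-open == [start0 + 1, end0] in 1-based inclusive.
    return chrom, intron_start0 + 1, intron_end0, strand
```

Because BED coordinates are 0-based and half-open, the conversion adds 1 only to the start. The result can then be matched directly against junctions extracted from the HISAT2 alignments.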