I ran HISAT2 (with an index built from a transcriptome multi-FASTA), expecting that it would not perform gapped alignment. I use the following script to run HISAT:
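# Align single-end FASTQ reads ($1) against the transcriptome index;
# the SAM output is named after the sample ($2).
INDEX=./indices/hisat/transcriptome
FASTQ=$1
OUTPUT=./transcriptome_aligned/$2.sam
./software/hisat-0.1.6-beta/hisat \
-q \
-p 2 \
--no-spliced-alignment \
--end-to-end \
-x "$INDEX" \
-U "$FASTQ" \
-S "$OUTPUT"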
Should I still expect gapped alignment in my SAM file? I have records like this in the SAM output.
SRR2144041.255 0 YCL025C 274 255 16M1I33M * 0 0 CAGGCTCAAGAACTAGAAAAAAAATGAAAGTTCGGACAACATAGGCGCTA CCCFFFFFHHHHHJJJJJJJJJJJJJJJIJGIIIIJJJJJIJJIIJJJHH AS:i:-8 XN:i:0 XM:i:0 XO:i:1 XG:i:1 NM:i:1 MD:Z:49 YT:Z:UU NH:i:1
The 1I in the CIGAR string (16M1I33M) shows that HISAT2 is still performing gapped alignment, even with the --end-to-end and --no-spliced-alignment parameters.
I'm trying to use the output SAM with rsem-calculate-expression, but it returns the following error because of the gapped alignments:
rsem-parse-alignments ./indices/rsem/rsem ./rsem_output/sample.temp/sample ./rsem_output/sample.stat/sample ./transcriptome_aligned/sample.bam 1 -tag XM
Read SRR2144041.836747: RSEM currently does not support gapped alignments, sorry!
"rsem-parse-alignments ./indices/rsem/rsem ./rsem_output/sample.temp/sample ./rsem_output/sample.stat/sample ./transcriptome_aligned/sample.bam 1 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!
How do I make sure that HISAT2 doesn't perform gapped alignment? Or should I filter the output instead, for example by keeping only the records tagged XO:i:0 (no gap opens)?
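To make the filtering idea concrete, here is a minimal sketch of what I have in mind, assuming single-end reads (as with -U above). It tests the CIGAR column rather than the XO tag, since a plain grep on the tag would also have to take care of the header lines. It keeps headers and drops any record whose CIGAR contains I, D or N. The function name drop_gapped_sam and the file names in the usage comment are hypothetical. Note that this only discards the gapped reads; it does not realign them without gaps, so it is not equivalent to disabling gaps in the aligner.

```shell
# Sketch only: drop_gapped_sam is a hypothetical helper, not part of HISAT or RSEM.
# Reads SAM on stdin and writes SAM on stdout.
# - header lines (starting with @) are passed through unchanged
# - alignment records are kept only if the CIGAR (column 6) has no I, D or N
# - unaligned records (CIGAR "*") are kept as well
drop_gapped_sam() {
    awk -F'\t' '/^@/ { print; next } $6 !~ /[IDN]/ { print }'
}

# Assumed usage (file names are placeholders):
#   drop_gapped_sam < ./transcriptome_aligned/sample.sam > ./transcriptome_aligned/sample.ungapped.sam
```

With paired-end data this would need extra care, because removing only one mate of a pair would leave the other mate orphaned.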
EDIT:
I checked the RSEM manual and found that, in order to avoid gapped alignments with Bowtie2, RSEM uses the following Bowtie2 parameters:
--sensitive --dpad 0 --gbar 99999999 --mp 1,1 --np 1 --score-min L,0,-0.1
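For comparison, this is roughly how I would expect those parameters to be applied to the same single-end input if I switched to Bowtie2. It is only a sketch: the wrapper name run_bowtie2_ungapped is hypothetical, and it assumes a Bowtie2 index has been built from the same transcriptome FASTA. Only the options listed above plus the basic input/output flags are used; RSEM may pass additional options that are not shown here.

```shell
# Sketch only: run_bowtie2_ungapped is a hypothetical wrapper.
# $1 = Bowtie2 index prefix (assumed to be built from the transcriptome FASTA)
# $2 = single-end FASTQ file
# $3 = output SAM file
run_bowtie2_ungapped() {
    bowtie2 -q -p 2 \
        --sensitive --dpad 0 --gbar 99999999 --mp 1,1 --np 1 --score-min L,0,-0.1 \
        -x "$1" -U "$2" -S "$3"
}

# Assumed usage (the index path is a placeholder):
#   run_bowtie2_ungapped ./indices/bowtie2/transcriptome "$FASTQ" "$OUTPUT"
```

Here --gbar disallows gaps within the given number of positions from either end of the read, so a value far larger than any read length rules out gaps entirely.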
What is the equivalent of --gbar in HISAT2?
Thanks