I've created a small reference FASTA file that consists of the combined sequences of two exons from two different genes, where I know a fusion has occurred. I uploaded that reference file into Galaxy along with a FASTQ file and ran HISAT2 with all default settings. The alignment worked perfectly.
I'm trying to reproduce the same alignment on the command line, so I've downloaded and installed HISAT2 version 2.0.5.
Here are the steps that I followed:
Indexed the reference with hisat2-build:
hisat2-build /data/HISAT2/BAG4_ref.fasta BAG4_ref_indexed
Performed the alignment and converted the SAM output to BAM:
hisat2 -x /data/HISAT2/index/BAG4_ref_indexed -U /data/HISAT2/IonXpress_011.fq -S /data/HISAT2/Bag4.sam
samtools view -bS Bag4.sam > Bag4.bam
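In the steps above, hisat2-build writes the index to the current working directory under the prefix BAG4_ref_indexed, while hisat2 -x reads it from /data/HISAT2/index/BAG4_ref_indexed. One quick sanity check is to confirm that the -x prefix really resolves to a complete index. This is a minimal sketch, assuming a small (non-large) index whose eight files are named <prefix>.1.ht2 through <prefix>.8.ht2; the function name check_ht2_index is hypothetical:

```shell
# Hypothetical helper: succeed only if every file of a small HISAT2 index
# (<prefix>.1.ht2 ... <prefix>.8.ht2, an assumption) exists and is non-empty.
check_ht2_index() {
    prefix="$1"
    missing=0
    for i in 1 2 3 4 5 6 7 8; do
        if [ ! -s "${prefix}.${i}.ht2" ]; then
            echo "missing or empty: ${prefix}.${i}.ht2" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status 0 only when nothing was missing.
    [ "$missing" -eq 0 ]
}

# Usage with the prefixes from the commands above:
#   check_ht2_index BAG4_ref_indexed
#   check_ht2_index /data/HISAT2/index/BAG4_ref_indexed
```

Comparing the modification times of the two sets of files (for example with ls -l) would also show whether the alignment used the index that was just built.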
Here are the stats:
345063 reads; of these:
  345063 (100.00%) were unpaired; of these:
    344481 (99.83%) aligned 0 times
    578 (0.17%) aligned exactly 1 time
    4 (0.00%) aligned >1 times
0.17% overall alignment rate
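The summary says that only 578 + 4 = 582 of 345,063 reads aligned. To compare this run against the summary Galaxy reports for its HISAT2 job, the counts can be pulled out of the text that hisat2 prints to stderr. This is a minimal sketch, assuming single-end (unpaired) output in the layout shown above; the function name hisat2_summary is hypothetical:

```shell
# Hypothetical helper: read a single-end HISAT2 alignment summary on stdin
# and print total reads, aligned reads (unique + multi) and the alignment rate.
# Assumes the unpaired layout; paired-end summaries use different wording.
hisat2_summary() {
    awk '
        / reads; of these:$/      { total = $1 }   # first line: total reads
        /aligned exactly 1 time$/ { uniq  = $1 }   # uniquely aligned reads
        /aligned >1 times$/       { multi = $1 }   # multi-mapped reads
        END {
            aligned = uniq + multi
            rate = (total > 0) ? 100 * aligned / total : 0
            printf "total=%d aligned=%d rate=%.2f%%\n", total, aligned, rate
        }
    '
}

# Usage: capture the summary from stderr, then parse it:
#   hisat2 -x ... -U ... -S ... 2> hisat2_summary.txt
#   hisat2_summary < hisat2_summary.txt
```

For the numbers above this prints total=345063 aligned=582 rate=0.17%.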
This command-line method produces a much smaller BAM file: 5,091 K versus 19,349 K from Galaxy.
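BAM file size is only an indirect measure: it depends on how many records the file holds (HISAT2 also writes unaligned reads to the SAM output by default), on read and tag lengths, and on compression. Counting primary mapped reads in each file gives a more direct comparison. This is a minimal sketch that works on SAM text so it needs nothing beyond awk; the function name count_primary_mapped is hypothetical. It tests the FLAG bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary) from the SAM specification:

```shell
# Hypothetical helper: count primary, mapped records in SAM text on stdin.
# POSIX awk has no bitwise operators, so FLAG bits are tested arithmetically.
count_primary_mapped() {
    awk -F '\t' '
        /^@/ { next }                               # skip header lines
        {
            flag = $2 + 0
            unmapped      = int(flag / 4)    % 2    # bit 0x4
            secondary     = int(flag / 256)  % 2    # bit 0x100
            supplementary = int(flag / 2048) % 2    # bit 0x800
            if (!unmapped && !secondary && !supplementary) n++
        }
        END { print n + 0 }
    '
}

# Usage on each BAM (samtools view prints records as SAM text):
#   samtools view Bag4.bam | count_primary_mapped
```

With samtools available, samtools view -c -F 0x904 Bag4.bam should report the same count directly, since 0x904 combines those three flags.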
What am I doing wrong? Please help me diagnose the problem.