How to achieve the same alignment with HISAT2 on the command line vs. HISAT2 in Galaxy
5.1 years ago
genya35

Hello,

I've created a small reference FASTA file consisting of the combined sequences of two exons from two different genes where I know a fusion has occurred. I uploaded that reference file and a FASTQ file into Galaxy and ran the HISAT2 tool with all default settings. The alignment worked perfectly.

I'm trying to reproduce the same alignment on the command line. I've downloaded and installed HISAT2 version 2.0.5.

Here are the steps that I followed:

1. Indexed the reference with HISAT2

hisat2-build /data/HISAT2/BAG4_ref.fasta BAG4_ref_indexed

2. Performed the alignment

hisat2 -x /data/HISAT2/index/BAG4_ref_indexed -U /data/HISAT2/IonXpress_011.fq -S /data/HISAT2/Bag4.sam
samtools view -bS Bag4.sam > Bag4.bam
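
The two steps above mix relative and absolute paths (the index is built in the working directory but read from /data/HISAT2/index/, and the SAM is written to /data/HISAT2/ but read back as Bag4.sam), which only works if the commands are run from the right directories. A minimal Python sketch that passes every path explicitly so each step reads exactly what the previous one wrote, assuming hisat2-build, hisat2 and samtools are on PATH. build_commands and run are hypothetical helpers, and only default HISAT2 options are used.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def build_commands(ref_fasta: str, reads_fq: str, index_prefix: str,
                   sam_out: str, bam_out: str) -> list[list[str]]:
    """Return the index, align and SAM-to-BAM commands as argument lists.

    Hypothetical helper: the same index_prefix is written by hisat2-build
    and read by hisat2, and the same SAM path is written and then converted.
    """
    return [
        # Step 1: build the HISAT2 index files under index_prefix
        ["hisat2-build", ref_fasta, index_prefix],
        # Step 2: align single-end reads with default settings, SAM output
        ["hisat2", "-x", index_prefix, "-U", reads_fq, "-S", sam_out],
        # Step 3: convert that SAM to BAM (samtools 1.x detects SAM input, so -S is not needed)
        ["samtools", "view", "-b", "-o", bam_out, sam_out],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    base = Path("/data/HISAT2")
    cmds = build_commands(
        ref_fasta=str(base / "BAG4_ref.fasta"),
        reads_fq=str(base / "IonXpress_011.fq"),
        index_prefix=str(base / "index" / "BAG4_ref_indexed"),
        sam_out=str(base / "Bag4.sam"),
        bam_out=str(base / "Bag4.bam"),
    )
    run(cmds)  # pass dry_run=False to actually execute the pipeline
```

Running it as-is only prints the three commands, which is a quick way to confirm the paths line up before executing anything.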

Here are the stats:

345063 reads; of these:
345063 (100.00%) were unpaired; of these:
344481 (99.83%) aligned 0 times
578 (0.17%) aligned exactly 1 time
4 (0.00%) aligned >1 times
0.17% overall alignment rate
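
To compare runs by read counts rather than file size, the summary HISAT2 prints can be parsed. A minimal sketch, assuming the single-end summary format shown above; parse_hisat2_summary is a hypothetical helper. It makes one point explicit: the number of reads aligned at least once is 578 + 4 = 582, which is the figure to compare against a mapped-read count from another run.

```python
import re


def parse_hisat2_summary(text: str) -> dict:
    """Extract read counts from a HISAT2 alignment summary (unpaired reads)."""
    patterns = {
        "total": r"^(\d+) reads; of these:",
        "aligned_0": r"^(\d+) \([\d.]+%\) aligned 0 times",
        "aligned_1": r"^(\d+) \([\d.]+%\) aligned exactly 1 time",
        "aligned_multi": r"^(\d+) \([\d.]+%\) aligned >1 times",
    }
    counts = {}
    for line in text.splitlines():
        line = line.strip()  # HISAT2 indents nested lines; ignore that
        for key, pattern in patterns.items():
            match = re.match(pattern, line)
            if match:
                counts[key] = int(match.group(1))
    # Reads with at least one alignment = uniquely aligned + multi-mapped
    counts["aligned_any"] = counts["aligned_1"] + counts["aligned_multi"]
    counts["overall_rate_pct"] = round(100 * counts["aligned_any"] / counts["total"], 2)
    return counts


# Summary from the command-line run, as posted above
summary = """\
345063 reads; of these:
345063 (100.00%) were unpaired; of these:
344481 (99.83%) aligned 0 times
578 (0.17%) aligned exactly 1 time
4 (0.00%) aligned >1 times
0.17% overall alignment rate
"""

stats = parse_hisat2_summary(summary)
print(stats["aligned_any"], stats["overall_rate_pct"])  # prints: 582 0.17
```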


The command-line run produces a much smaller BAM file than Galaxy: 5,091 K vs. 19,349 K.

Thanks

Click on the information icon (an "(i)") on the history item and see if the exact command that was used is included. I try to make that available on the instances that I administer, perhaps others do as well.

@Devon: When I click on the "i", Dataset information, Job information and Tool parameters are displayed, but I don't see any command. Here is what it says for the tool version:

Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.0.5.2
Galaxy Tool Version:    2.0.5.2
Tool Version:   /galaxy/main/deps/_conda/envs/mulled-v1-2bb67013a57cac1e35f407d06d1f347baae35159f498496f1e36f84784069212/bin/hisat2-align-s version 2.0.5 64-bit Built on login-node03 Fri Nov 4 10:42:22 EDT 2016 Compiler: gcc version 4.8.2 (GCC) Options: -O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

Don't go by file size alone; it is never a good statistic for comparing files. Do you have a summary for Galaxy's alignment similar to the one posted above?

I ran samtools flagstat glaxy.bam, and the stats look comparable:

345063 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
586 + 0 mapped (0.17%:-nan%)
0 + 0 paired in sequencing
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)


I don't understand why two BAM files produced by exactly the same alignment differ so much in size. Thanks
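
The flagstat output above can be read the same way. A minimal sketch, assuming the flagstat text format shown above; parse_flagstat is a hypothetical helper. Because every record is either mapped or unmapped, total minus mapped gives the number of unmapped records stored in the BAM: 345063 - 586 = 344477, so glaxy.bam keeps the unaligned reads. With 0 secondary and 0 supplementary records, the 586 mapped records are 586 aligned reads, versus 582 (578 + 4) in the command-line summary.

```python
import re


def parse_flagstat(text: str) -> dict:
    """Extract QC-passed counts for a few samtools flagstat categories."""
    patterns = {
        "total": r"^(\d+) \+ \d+ in total",
        "secondary": r"^(\d+) \+ \d+ secondary$",
        "supplementary": r"^(\d+) \+ \d+ supplementary$",
        # Anchored on " mapped (" so "with mate mapped ..." lines are not matched
        "mapped": r"^(\d+) \+ \d+ mapped \(",
    }
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        for key, pattern in patterns.items():
            match = re.match(pattern, line)
            if match:
                counts[key] = int(match.group(1))
    return counts


# flagstat output for glaxy.bam, as posted above
galaxy_flagstat = """\
345063 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
586 + 0 mapped (0.17%:-nan%)
0 + 0 paired in sequencing
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
"""

fs = parse_flagstat(galaxy_flagstat)
# Every record is either mapped or unmapped, so the difference is the
# number of unmapped records stored in the BAM.
unmapped_records = fs["total"] - fs["mapped"]
# Secondary/supplementary records are mapped too; remove them to count reads.
mapped_reads = fs["mapped"] - fs["secondary"] - fs["supplementary"]
print(mapped_reads, unmapped_records)  # prints: 586 344477
```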

It is possible that unmapped reads are being written to the output file on Galaxy but not in your alignment.

The alignment numbers don't look that different:

578 (0.17%) aligned exactly 1 time # your alignment
586 + 0 mapped (0.17%:-nan%) # galaxy alignment


That is a poor alignment rate in both cases, BTW.
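
Whether the local file contains the unmapped reads can be checked directly instead of guessed: samtools view -c -f 4 Bag4.bam counts records with the unmapped flag set, and samtools flagstat Bag4.bam gives a summary directly comparable to Galaxy's. Note that HISAT2 writes SAM records for unaligned reads unless --no-unal is given. The same check as a minimal Python sketch: count_by_mapping is a hypothetical helper that reads SAM text lines (for example the Bag4.sam written by hisat2) and classifies records by the FLAG bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary) defined in the SAM specification.

```python
from typing import Iterable

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_by_mapping(sam_lines: Iterable[str]) -> dict:
    """Count primary mapped/unmapped records and non-primary records in SAM text."""
    counts = {"mapped": 0, "unmapped": 0, "non_primary": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the 2nd mandatory SAM column
        if flag & (SECONDARY | SUPPLEMENTARY):
            counts["non_primary"] += 1
        elif flag & UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
    return counts
```

For example, with open("/data/HISAT2/Bag4.sam") as fh: print(count_by_mapping(fh)). If "unmapped" comes out near zero for the local file while Galaxy's BAM holds 344477 unmapped records, that alone would account for much of the size difference.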