Question: How to achieve the same alignment with HISAT2 on the command line vs. HISAT2 in Galaxy
7 months ago by
yelekley70
yelekley70 wrote:

Hello,

I've created a small reference FASTA file consisting of the combined sequences of two exons from two different genes where I know a fusion has occurred. I uploaded that reference file and a FASTQ file into Galaxy and ran the HISAT2 tool with all default settings. The alignment worked perfectly.

I'm now trying to reproduce the same alignment on the command line. I've downloaded and installed HISAT2 version 2.0.5.

Here are the steps that I followed:

  1. Indexed the reference with HISAT2

    hisat2-build /data/HISAT2/BAG4_ref.fasta BAG4_ref_indexed

  2. Performed the alignment

    hisat2 -x /data/HISAT2/index/BAG4_ref_indexed -U /data/HISAT2/IonXpress_011.fq -S /data/HISAT2/Bag4.sam
    samtools view -bS Bag4.sam > Bag4.bam

Here are the stats:

345063 reads; of these:
  345063 (100.00%) were unpaired; of these:
    344481 (99.83%) aligned 0 times
    578 (0.17%) aligned exactly 1 time
    4 (0.00%) aligned >1 times
0.17% overall alignment rate

This command-line method produces a much smaller BAM file: 5,091 K vs. 19,349 K (Galaxy).

What am I doing wrong? Please help me diagnose the problem.

Thanks

rna-seq • 565 views
ADD COMMENTlink modified 7 months ago by Devon Ryan80k • written 7 months ago by yelekley70

Click on the information icon (an "(i)") on the history item and see if the exact command that was used is included. I try to make that available on the instances that I administer, perhaps others do as well.

ADD REPLYlink written 7 months ago by Devon Ryan80k

@Devon. When I click on the "i" icon, the Dataset information, Job information and Tool parameters are displayed. I don't see any commands. Here is what it says for the tool version:

Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.0.5.2
Galaxy Tool Version:    2.0.5.2
Tool Version:   /galaxy/main/deps/_conda/envs/mulled-v1-2bb67013a57cac1e35f407d06d1f347baae35159f498496f1e36f84784069212/bin/hisat2-align-s version 2.0.5 64-bit Built on login-node03 Fri Nov 4 10:42:22 EDT 2016 Compiler: gcc version 4.8.2 (GCC) Options: -O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
ADD REPLYlink modified 7 months ago by genomax49k • written 7 months ago by yelekley70

Don't go by the size of the file alone. That is never a good statistic for file comparison. Do you have summary stats for Galaxy's alignment similar to the ones posted above?

ADD REPLYlink modified 7 months ago • written 7 months ago by genomax49k

I ran: samtools flagstat glaxy.bam and it looks like the stats are comparable:

345063 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
586 + 0 mapped (0.17%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

I don't understand why there is such a difference in size for two bam files that are results of exactly the same alignment. Thanks

ADD REPLYlink modified 7 months ago by genomax49k • written 7 months ago by yelekley70

It is possible that the unmapped reads are being written to the output file on Galaxy but not in your alignment.
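One way to test that idea is to count the records flagged as unmapped (FLAG bit 0x4) in each output. The helper below is a minimal sketch that works on SAM text with plain awk; the name `count_unmapped` is hypothetical and not part of HISAT2 or samtools. For a BAM file, `samtools view -c -f 4 file.bam` should report the same count.

```shell
# Count SAM records whose FLAG field has bit 0x4 (read unmapped) set.
# count_unmapped is a hypothetical helper name used only for illustration.
count_unmapped() {
    awk -F '\t' '
        /^@/ { next }                  # skip SAM header lines
        int($2 / 4) % 2 == 1 { n++ }   # FLAG is column 2; test bit 0x4
        END { print n + 0 }            # print 0 when nothing matched
    ' "$1"
}

# Example call, using the SAM path from the question:
# count_unmapped /data/HISAT2/Bag4.sam
```

If both outputs report roughly the same number of unmapped records, the size difference has some other cause and the missing-reads explanation can be ruled out.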

The alignment numbers don't look that different:

578 (0.17%) aligned exactly 1 time # your alignment
586 + 0 mapped (0.17%:-nan%) # galaxy alignment

That is poor alignment in both cases BTW.
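To put the two figures on the same footing, the HISAT2 summary lines for "aligned exactly 1 time" and "aligned >1 times" can be added together before comparing with the flagstat "mapped" count. The sketch below does that with awk; `aligned_from_summary` is a hypothetical helper name, and it assumes the unpaired-read summary layout shown in the question.

```shell
# Sum the reads that a HISAT2 unpaired-read summary reports as aligned
# (exactly once, or more than once). Reads the summary from standard input.
# aligned_from_summary is a hypothetical helper name used for illustration.
aligned_from_summary() {
    awk '/aligned exactly 1 time/ || /aligned >1 times/ { sum += $1 }
         END { print sum + 0 }'
}

# Summary text copied from the question.
aligned_from_summary <<'EOF'
345063 reads; of these:
  345063 (100.00%) were unpaired; of these:
    344481 (99.83%) aligned 0 times
    578 (0.17%) aligned exactly 1 time
    4 (0.00%) aligned >1 times
0.17% overall alignment rate
EOF
# prints 582
```

That gives 582 aligned reads on the command line against 586 mapped in Galaxy, a difference of 4 reads out of 345063.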

ADD REPLYlink modified 7 months ago • written 7 months ago by genomax49k