Question

HISAT2 reference genomes and starting code

5.1 years ago

Morris_Chair ▴ 350

Dear All,

I’m building a HISAT2 command line following the instructions in the manual. What is not clear to me is how to use the genome index files. I downloaded the reference genome index from the HISAT2 website (genome.1.ht2 through genome.8.ht2).

My question is: how do I use these files? The manual says “any of the index files up to but not including the final .1.ht2”. Can I just pick one, put it into the command line, and remove the others from the folder?

My command would be like

hisat2 -q -x genome -1 mymate1.fq -2 mymate2.fq -S -o resultpath

I know it's not correct, but could you kindly help me fix it?

Thank you

RNA-Seq • 4.1k views

ADD COMMENT • link 5.1 years ago by Morris_Chair ▴ 350

lieven.sterck · Accepted Answer · 2019-03-16

5.1 years ago

lieven.sterck 15k

What the manual is trying to explain with

“any of the index files up to but not including the final .1.ht2”

is that you need to provide the file name you used for the index but omit the '.1.ht2' part from it. So simply using 'genome', as you did in your example, is correct (given the index names you provided).

And no, you cannot remove the others from your directory; the software needs all of them to work.
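
To make the prefix rule concrete, here is a minimal shell sketch. It assumes the eight-file layout described in the question (genome.1.ht2 through genome.8.ht2); the function names check_hisat2_index and index_prefix are made up for this example and are not part of HISAT2.

```shell
# Hypothetical helpers, not part of HISAT2.
# Assumes an index made of PREFIX.1.ht2 ... PREFIX.8.ht2, as in the question.

check_hisat2_index() {
    # $1 is the index prefix, i.e. the value that would be passed to -x.
    status=0
    for n in 1 2 3 4 5 6 7 8; do
        if [ ! -f "$1.$n.ht2" ]; then
            echo "missing index file: $1.$n.ht2" >&2
            status=1
        fi
    done
    return "$status"
}

index_prefix() {
    # Strip the trailing ".N.ht2" from the name of any one index file.
    echo "${1%.[1-8].ht2}"
}

# Example usage (paths are placeholders):
#   check_hisat2_index ./grch37/genome && echo "index looks complete"
#   index_prefix ./grch37/genome.1.ht2
```

If check_hisat2_index reports a missing file, the index folder is incomplete and should be downloaded again rather than trimmed down.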

EDIT: to resolve all issues with the posted command line, you also need to give the -S parameter a proper value (i.e. a filename to write the SAM output to).

ADD COMMENT • link 5.1 years ago by lieven.sterck 15k

Hi lieven.sterck, there is something wrong in what I'm doing. You say "you need to provide the file name you used for the index", but I didn't build any index. I just downloaded the file from the HISAT2 website and got this folder with the files genome.1.ht2 and so on. I then use them in this command line because I understood that they are already indexed. Is that right?

Thank you

hisat2 -q -x genome -1 Treated_1_m1.fastq -2 Treated_1_m2.fastq -S


Error: 0 mate files/sequences were specified with -1, but 1
mate files/sequences were specified with -2.  The same number of mate files/
sequences must be specified with -1 and -2.
Error: Encountered internal HISAT2 exception (#1)
Command: /home/p.panelli/miniconda2/bin/hisat2-align-s --wrapper basic-0 -q -x /home/p.panelli/RNAseq/annotation/transcriptome/hisat2refgenes/grch37/genome -S -1 -2 /home/p.panelli/RNAseq/my_fastq/Treated_1_m2.fastq /home/p.panelli/RNAseq/my_fastq/Treated_1_m1.fastq
(ERR): hisat2-align exited with value 1

ADD REPLY • link 5.1 years ago by Morris_Chair ▴ 350

OK, what you have downloaded is the indexed genome, so you don't need to index it yourself anymore.

Looking at the error you posted, however, it does not look like you have a problem with the index but rather with the input files of your reads. Are both files (Treated_1_m1.fastq and Treated_1_m2.fastq) present in the folder/location where you try to execute hisat2?

ADD REPLY • link 5.1 years ago by lieven.sterck 15k

Yes, they are in the same folder. This is the entire line; I usually use autocompletion to make sure that everything is OK.

[@ws7910 RNAseq]$ hisat2 -q -x ./annotation/transcriptome/hisat2refgenes/grch37/genome -1 ./my_fastq/Treated_1_m1.fastq -2 ./my_fastq/Treated_1_m2.fastq -S

I hope it is not a problem caused by Miniconda.

Thank you

ADD REPLY • link 5.1 years ago by Morris_Chair ▴ 350

Have you tried to execute this command line? As long as the relative paths are correct it should either run fine or will generate some useful error messages that we can debug.

ADD REPLY • link 5.1 years ago by GenoMax 141k

Hello GenoMax, yes, when I run this command line I get two errors; it doesn't like my mate files:

Error: 0 mate files/sequences were specified with -1, but 1
mate files/sequences were specified with -2.  The same number of mate files/
sequences must be specified with -1 and -2.
Error: Encountered internal HISAT2 exception (#1)
Command: /home/p.panelli/miniconda2/bin/hisat2-align-s --wrapper basic-0 -q -x ./annotation/transcriptome/hisat2refgenes/grch37/genome -S -1 -2 ./my_fastq/Treated_1_m2.fastq ./my_fastq/Treated_1_m1.fastq
(ERR): hisat2-align exited with value 1

Thank you

ADD REPLY • link 5.1 years ago by Morris_Chair ▴ 350

Assuming that the path part is resolved, something must be off with your input files.

Can you post a head of both those files? Did you do any manipulations to those input files before using them as input here?

ADD REPLY • link 5.1 years ago by lieven.sterck 15k

Yes, I did manipulate them: these two FASTQ files were derived from a BAM file. Maybe that is the reason? They worked fine with salmon, though. Here are the heads:

[@ws7910 my_fastq]$ cat Treated_1_m1.fastq | head
@HWI-ST571:93:C00B8ACXX:1:1301:8908:31103/1
CTGGACTCCACACTCTCCTGGGTTTCACCTTTGTAGCAGGATCCCTGCAGACCAGGCCCATGACAAACACCGTCTCCAGCGGGCAGAGCAAAGGAAGGGCA
+
@<@DDFFFGHGBBGGDHGAEGGGIFHIID<EHFBFGI>??B@D99?*B;FHABBCHBGC@77=3=E;CHHFB;AA;>(-;>=?B#################

[@ws7910 my_fastq]$ cat Treated_1_m2.fastq | head
@HWI-ST571:93:C00B8ACXX:1:1301:8908:31103/2
CAGAAGTAGGCCTCTTCCTGACAGGCAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGGAGACGGGGTTTGTCATGGGGCTGGTC
+
B=@DDDDDBF<ABDFBFE>D<FFGC7CA;?E93ECGGG9:B<BD@D(.'7B=7@:C=D@E<E:)=?A:@D/9',8((;??B####################

Thank you

ADD REPLY • link updated 5.1 years ago by lieven.sterck 15k • written 5.1 years ago by Morris_Chair ▴ 350

Hmm, looks fine at first sight.

However, this can't be the complete output of that cat | head command line, can it (unless the file contains only one read)? If you shortened it, may we ask you to always post the complete and exact output of the commands you describe? Thanks.

ADD REPLY • link 5.1 years ago by lieven.sterck 15k

Sure, I was trying to make things simpler. Here is the command plus its output. Thank you to all of you.

[@ws7910 my_fastq]$ cat Treated_1_m1.fastq | head
@HWI-ST571:93:C00B8ACXX:1:1301:8908:31103/1
CTGGACTCCACACTCTCCTGGGTTTCACCTTTGTAGCAGGATCCCTGCAGACCAGGCCCATGACAAACACCGTCTCCAGCGGGCAGAGCAAAGGAAGGGCA
+
@<@DDFFFGHGBBGGDHGAEGGGIFHIID<EHFBFGI>??B@D99?*B;FHABBCHBGC@77=3=E;CHHFB;AA;>(-;>=?B#################
@HWI-ST571:93:C00B8ACXX:1:2301:13621:16305/1
CCTGAGTCCACCGGGTGCTTTCTGCCCACCCCCTGCTCTTGCCAACTGGCCCCTGCTTCCCCTAGGGCACATGCTGGAAGCCCTGGGCCGCCACCAGAGGT
+
CCCFFFDFHGHHHJJCFHHIJJJJIJJIJJJJJBHJGIIJIGGCHIJHIIIJIAACADFFFFC=ACDD@DDCCCCCDADDDDDCADBDBBBBBDDDABDB4
@HWI-ST571:93:C00B8ACXX:1:1101:17698:3530/1
TGGGGGCATCGGCAAGGCCAAGCTGCGCAGCATGAAGGAGCGAAAGCTGGAGAAGCAGCAGCAGAAGGAGCAGGAGCAAGGTGAGCGGGCCCTGGAGCTTG

[@ws7910 my_fastq]$ cat Treated_1_m2.fastq | head
@HWI-ST571:93:C00B8ACXX:1:1301:8908:31103/2
CAGAAGTAGGCCTCTTCCTGACAGGCAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGGAGACGGGGTTTGTCATGGGGCTGGTC
+
B=@DDDDDBF<ABDFBFE>D<FFGC7CA;?E93ECGGG9:B<BD@D(.'7B=7@:C=D@E<E:)=?A:@D/9',8((;??B####################
@HWI-ST571:93:C00B8ACXX:1:2301:13621:16305/2
CTGAGGACCTCTGGTGGCGGCCCAGGGCTTCCAGCATGTGCCCTAGGGGAAGCAGGGGCCAGTTGGCAAGAGCAGGGGGTGGGCAGAAAGCACCCGGTGGA
+
CCCFFFFFHHGHHIEHHIJJIJJIJJJJJIJJIGJJIJJJJJJJJJJHGFDEFDCDDDDDBBCDDDDDDBCBDDDDDDD9>BB<ADBCDCCDCBD@@<>B8
@HWI-ST571:93:C00B8ACXX:1:1101:17698:3530/2
CCGACTGCAAGCTCCAGGGCCCGCTCACCTTGCTCCTGCTCCTTCTGCTGCTGCTTCTCCAGCTTTCGCTCCTTCATGCTGCGCAGCTTGGCCTTGCCGAT

ADD REPLY • link 5.1 years ago by Morris_Chair ▴ 350

Thanks! And it still looks fine ;)

For future reference: if you simply mention that you shortened the output, that's fine as well (or use | head -4, of course).

So both files seem to be OK. Can you, just for testing, switch them around in your command line (-1 Treated_1_m2.fastq and -2 Treated_1_m1.fastq)?

Can you also run the command line with a single input file only?
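
Beyond eyeballing the heads, a quick sanity check is to compare the number of reads in the two mate files. The sketch below assumes plain, uncompressed FASTQ with exactly four lines per read; fastq_records and same_read_count are hypothetical helper names used only for illustration.

```shell
# Hypothetical helpers for a quick paired-FASTQ sanity check.
# Assumes uncompressed FASTQ with exactly 4 lines per read.

fastq_records() {
    # Number of reads = number of lines divided by 4.
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

same_read_count() {
    # Succeeds only if both mate files hold the same number of reads.
    [ "$(fastq_records "$1")" -eq "$(fastq_records "$2")" ]
}

# Example usage (file names as in this thread):
#   head -n 4 Treated_1_m1.fastq     # show exactly one read
#   same_read_count Treated_1_m1.fastq Treated_1_m2.fastq || echo "mate files differ in length"
```

This only catches truncated or unequal files; it does not verify that the read names are in the same order in both files.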

ADD REPLY • link 5.1 years ago by lieven.sterck 15k

Here is the cmdline with two files swapped and the result didn't change :/

[@ws7910 RNAseq]$ hisat2 -q -x ./annotation/transcriptome/hisat2refgenes/grch37/genome -1 ./my_fastq/Treated_1_m2.fastq -2 ./my_fastq/Treated_1_m1.fastq -S

Error: 0 mate files/sequences were specified with -1, but 1
mate files/sequences were specified with -2.  The same number of mate files/
sequences must be specified with -1 and -2.

Error: Encountered internal HISAT2 exception (#1)
Command: /home/p.panelli/miniconda2/bin/hisat2-align-s --wrapper basic-0 -q -x ./annotation/transcriptome/hisat2refgenes/grch37/genome -S -1 -2 ./my_fastq/Treated_1_m1.fastq ./my_fastq/Treated_1_m2.fastq
(ERR): hisat2-align exited with value 1

If I run a single file with this command line

[@ws7910 RNAseq]$ hisat2 -q -x ./annotation/transcriptome/hisat2refgenes/grch37/genome -1 ./my_fastq/Treated_1_m1.fastq -S

I get the list of hisat2 options and functions printed, so this command is probably wrong?

Thanks

ADD REPLY • link 5.1 years ago by Morris_Chair ▴ 350

I think that the -S option needs a value, i.e. a filename to write the SAM output to. If you omit it, the order of the options is apparently altered and the value for -S now becomes -1, which could explain the error/behaviour.

As an additional check for this: there will likely be a file called -1 created in your directory.
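
To illustrate the suspected mechanism, here is a toy option parser written with the shell's getopts builtin. It is only a sketch and not HISAT2's real parser; the assumption is simply that -S, -1 and -2 each take one value, and the arguments are given in the same order as on the "Command:" line of the error message (file names shortened).

```shell
# Toy parser, NOT HISAT2's real one: each of -S, -1 and -2 takes one value.
parse_toy() {
    sam_out="" mate1="" mate2=""
    OPTIND=1
    while getopts "S:1:2:" opt "$@"; do
        case "$opt" in
            S) sam_out=$OPTARG ;;   # whatever follows -S is taken as its value
            1) mate1=$OPTARG ;;
            2) mate2=$OPTARG ;;
        esac
    done
    shift $((OPTIND - 1))
    leftover="$*"                   # arguments that no option claimed
}

# Same argument order as the "Command:" line in the error message.
parse_toy -S -1 -2 m2.fastq m1.fastq
echo "sam_out='$sam_out' mate1='$mate1' mate2='$mate2' leftover='$leftover'"
```

In this toy version, -S swallows "-1" as its output name, -2 receives m2.fastq, and m1.fastq is left over without an option. That would match "0 mate files specified with -1, but 1 with -2".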

ADD REPLY • link 5.1 years ago by lieven.sterck 15k

Lieven!!! I have probably solved the issue!!! Geez, I can't believe it! Please have a look:

[@ws7910 RNAseq]$ hisat2 -q -x ./annotation/transcriptome/hisat2refgenes/grch37/genome -1 ./my_fastq/Treated_1_m1.fastq -2 ./my_fastq/Treated_1_m2.fastq -S sample.sam
680357 reads; of these:
  680357 (100.00%) were paired; of these:
    10129 (1.49%) aligned concordantly 0 times
    649751 (95.50%) aligned concordantly exactly 1 time
    20477 (3.01%) aligned concordantly >1 times
    ----
    10129 pairs aligned concordantly 0 times; of these:
      6632 (65.48%) aligned discordantly 1 time
    ----
    3497 pairs aligned 0 times concordantly or discordantly; of these:
      6994 mates make up the pairs; of these:
        3493 (49.94%) aligned 0 times
        2951 (42.19%) aligned exactly 1 time
        550 (7.86%) aligned >1 times
99.74% overall alignment rate

Now I have a sample.sam which I can rename and use for further analysis (I guess)

Thanks

ADD REPLY • link 5.1 years ago by Morris_Chair ▴ 350

Yes, that looks the way it is supposed to!

As I pointed out, -S needed a proper value (sample.sam in this case).
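
To avoid this kind of slip in scripts, one option is a small guard that rejects an empty or option-like output name before the aligner is called. This is only a sketch: valid_sam_name is a hypothetical helper, the file names are the ones from the original question, and the command is printed (dry run) rather than executed.

```shell
# Hypothetical guard: refuse a missing or option-like value for -S.
valid_sam_name() {
    case "$1" in
        ""|-*) return 1 ;;   # empty, or looks like another option (e.g. "-1")
        *)     return 0 ;;
    esac
}

sam_out="sample.sam"
if valid_sam_name "$sam_out"; then
    # Dry run: print the command instead of executing it.
    echo hisat2 -q -x genome -1 mymate1.fq -2 mymate2.fq -S "$sam_out"
else
    echo "error: -S needs an output filename" >&2
fi
```

Once the printed line looks right, drop the leading echo to actually run it.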

ADD REPLY • link 5.1 years ago by lieven.sterck 15k