Question: HISAT2: Question regarding providing file path to indexed genome folder
5 weeks ago by
venura60
University of Peradeniya
venura60 wrote:

Hi,

I have a quick question regarding directing the path to the indexed genome folder. Following is the code I used;

hisat2 -p $threads --dta --rna-strandness RF -x /scratch/datasets/genome_indexes/other_genomes/potato/hisat2 -1 ${SAMPLE}.fq.gz -2 ${SAMPLE}.fq.gz -S ${SAMPLE}.sam

after loading the module

module load HISAT2/2.2.0-foss-2018b

I was running the script on our ADA cluster and got the following error

sh: /sw/eb/software/HISAT2/2.2.0-foss-2018b/bin/hisat2_read_statistics.py: No such file or directory
(ERR): "/scratch/datasets/genome_indexes/other_genomes/potato/hisat2" does not exist
Exiting now ...

Can someone help me to resolve this issue? Thanks in advance.

hisat2 rna-seq • 181 views
ADD COMMENTlink modified 5 weeks ago by ATpoint42k • written 5 weeks ago by venura60

Output of ls /scratch/datasets/genome_indexes/other_genomes/potato/?

ADD REPLYlink written 5 weeks ago by ATpoint42k

DM_1-3_516_R44_potato_genome_assembly.v6.1.1.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.2.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.3.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.4.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.5.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.6.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.7.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.8.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.fa

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by venura60

Based on the listing above it looks like there is no hisat2 directory. So you will need to try

-x /scratch/datasets/genome_indexes/other_genomes/potato/DM_1-3_516_R44_potato_genome_assembly.v6.1
ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by genomax92k

My apologies. I was using

ls /scratch/datasets/genome_indexes/other_genomes/potato/hisat2/

Here is the correct output for ls /scratch/datasets/genome_indexes/other_genomes/potato/

blast bowtie bowtie2 bwa hisat2 picard samtools

ADD REPLYlink written 5 weeks ago by venura60

Then simply insert hisat2 in the right spot in the path above.

ADD REPLYlink written 5 weeks ago by genomax92k

Sorry, I think I confused you;

The output of ls /scratch/datasets/genome_indexes/other_genomes/potato/ is (in answer to ATpoint's question)

blast bowtie bowtie2 bwa hisat2 picard samtools

When I ran the job I used the following code

hisat2 -p $threads --dta --rna-strandness RF -x /scratch/datasets/genome_indexes/other_genomes/potato/hisat2 -1 ${SAMPLE}.fq.gz -2 ${SAMPLE}.fq.gz -S ${SAMPLE}.sam

Directing to the hisat2 folder and got the error mentioned in the original post.

ADD REPLYlink written 5 weeks ago by venura60

These are not genome indices, are they? A HISAT2 index consists of several files, e.g. genome.1.ht2, genome.2.ht2, etc.

This is how it should look e.g. for a genome called mm10.fa:

mm10.1.ht2  mm10.2.ht2  mm10.3.ht2  mm10.4.ht2  mm10.5.ht2  mm10.6.ht2  mm10.7.ht2  mm10.8.ht2

Here it would be -x mm10, as you have to provide the prefix (basename) of the index files, not the folder that contains them. HISAT2 then picks up these ht2 files as needed.
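To make the basename rule concrete, here is a minimal sketch of a helper that derives the value for -x from a folder of index files. The function name index_basename is made up for illustration, and it assumes the folder holds one index whose first file ends in .1.ht2, as in the listings in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HISAT2 index basename found in a directory,
# i.e. the path of the first index file without its ".1.ht2" ending.
# Returns non-zero if the directory contains no *.1.ht2 file.
index_basename() {
  local dir=$1 first
  for first in "$dir"/*.1.ht2; do
    # If nothing matches, the glob stays literal and -e fails.
    if [ -e "$first" ]; then
      printf '%s\n' "${first%.1.ht2}"
      return 0
    fi
  done
  echo "No *.1.ht2 file found in $dir" >&2
  return 1
}
```

It could then be used as Idx=$(index_basename /scratch/datasets/genome_indexes/other_genomes/potato/hisat2) followed by hisat2 -x "$Idx" with the remaining options.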

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by ATpoint42k

Inside the hisat2 folder ( ls /scratch/datasets/genome_indexes/other_genomes/potato/hisat2/), there are eight files (I guess that is the default number it makes)

DM_1-3_516_R44_potato_genome_assembly.v6.1.1.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.2.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.3.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.4.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.5.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.6.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.7.ht2 DM_1-3_516_R44_potato_genome_assembly.v6.1.8.ht2

Ah, I see; that means I need to use the basename DM_1-3_516_R44_potato_genome_assembly.v6.1 (without the .1.ht2 ending), as follows: /scratch/datasets/genome_indexes/other_genomes/potato/hisat2/DM_1-3_516_R44_potato_genome_assembly.v6.1

Thank you! I will do that

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by venura60

Even after changing the path, I am getting the following error (I killed the job after this error to save my service units)

sh: /sw/eb/software/HISAT2/2.2.0-foss-2018b/bin/hisat2_read_statistics.py: No such file or directory (ERR):

Probably due to a problem on the cluster? (I emailed them too, but no reply yet.)

ADD REPLYlink written 5 weeks ago by venura60

Are the fastq files in the right spot? Are those variables correctly pointing to those files?

ADD REPLYlink written 5 weeks ago by genomax92k

They are in the same directory that the job is running from. I also checked the file extensions. Nothing makes sense :(

ADD REPLYlink written 5 weeks ago by venura60
-1 ${SAMPLE}.fq.gz -2 ${SAMPLE}.fq.gz

This, by the way, is the same file for both mates. Try simplifying your script.
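A mistake like this can be caught before any cluster time is spent. The following is only a sketch: check_pair is a hypothetical helper name, and it does nothing more than compare the two paths and confirm that both files exist and are non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a paired-end sample.
# Fails if both mates are the same path or if either file is missing/empty.
check_pair() {
  local r1=$1 r2=$2 f
  if [ "$r1" = "$r2" ]; then
    echo "ERROR: -1 and -2 point to the same file: $r1" >&2
    return 1
  fi
  for f in "$r1" "$r2"; do
    if [ ! -s "$f" ]; then
      echo "ERROR: missing or empty file: $f" >&2
      return 1
    fi
  done
  return 0
}
```

Called as check_pair "${SAMPLE}.fq.gz" "${SAMPLE}.fq.gz", it would have flagged the command from the original post.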

ADD REPLYlink written 5 weeks ago by ATpoint42k

Oh, Shoot! You are correct. Still learning A, B, Cs..

ADD REPLYlink written 5 weeks ago by venura60
5 weeks ago by
ATpoint42k
Germany
ATpoint42k wrote:

I personally always try to make it as simple as possible. Copy all the fastq files into one folder and give them clear names, e.g.

Sample1_1.fastq.gz Sample1_2.fastq.gz Sample2_1.fastq.gz Sample2_2.fastq.gz

Then use the simplest possible script (or learn how to use workflow managers):

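Idx=path/to/idxfiles   # index basename (path without .1.ht2), not the folder

for i in *_1.fastq.gz
  do
  # Sample name = file name without the mate suffix
  SAMPLE=${i%_1.fastq.gz}
  # Options taken from the question; add others (e.g. -p) as needed.
  # samtools reads the SAM stream from stdin (-) and writes BAM (-b).
  hisat2 --dta --rna-strandness RF -x "${Idx}" -1 "${SAMPLE}_1.fastq.gz" -2 "${SAMPLE}_2.fastq.gz" \
  | samtools view -b -o "${SAMPLE}.bam" -
  done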

That's it. Eliminate unnecessary elements from your script, including echo statements that only report some kind of status. Trim it down to the strictly necessary parts and get that running. You can add further things once it works.
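Before launching the real loop, it can help to do a dry run that only lists the samples the loop would pick up. This is a sketch under the naming scheme suggested above (_1.fastq.gz and _2.fastq.gz); list_samples is a hypothetical helper, not part of HISAT2 or samtools.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: print one tab-separated line per sample,
# "<sample> OK" if both mates exist, "<sample> MISSING_MATE" otherwise.
list_samples() {
  local dir=${1:-.} r1 sample
  for r1 in "$dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue          # no matches: the glob stays literal
    sample=${r1##*/}                  # drop the directory part
    sample=${sample%_1.fastq.gz}      # drop the mate suffix
    if [ -e "$dir/${sample}_2.fastq.gz" ]; then
      printf '%s\tOK\n' "$sample"
    else
      printf '%s\tMISSING_MATE\n' "$sample"
    fi
  done
}
```

Running list_samples in the fastq folder shows at a glance whether every _1 file has its _2 partner.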

ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by ATpoint42k

Will do the needful and get back with the outcome! Thanks a lot, ATpoint! 🙏

ADD REPLYlink written 5 weeks ago by venura60

Everything is running fine and I got BAM files too. :) The only exception is the following message (I guess it has something to do with the installation on the ADA cluster, since I don't see such a script there)

/sw/eb/software/HISAT2/2.2.0-foss-2018b/bin/hisat2_read_statistics.py: No such file or directory (ERR)

PS: Appreciate if you can point me to a good workflow management tool and tutorial for similar analysis like this.
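Regarding the missing hisat2_read_statistics.py reported above: one way to confirm that guess is to look next to the hisat2 executable. This is a sketch that assumes the helper is expected in the same bin directory as hisat2, as the path in the error message suggests; helper_present is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical check: is hisat2_read_statistics.py present in the given
# bin directory?
helper_present() {
  local bindir=$1
  [ -f "$bindir/hisat2_read_statistics.py" ]
}

# Only run the check when hisat2 is actually on the PATH (module loaded).
if command -v hisat2 >/dev/null 2>&1; then
  bindir=$(dirname "$(command -v hisat2)")
  if helper_present "$bindir"; then
    echo "hisat2_read_statistics.py found in $bindir"
  else
    echo "hisat2_read_statistics.py is missing from $bindir"
  fi
fi
```

If the script turns out to be missing, that output is something concrete to send to the cluster administrators.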

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by venura60