Question: HISAT2: Question regarding providing file path to indexed genome folder
4 months ago by
venura60
University of Peradeniya
venura60 wrote:

Hi,

I have a quick question about specifying the path to the indexed genome folder. This is the command I used:

hisat2 -p $threads --dta --rna-strandness RF -x /scratch/datasets/genome_indexes/other_genomes/potato/hisat2 -1 ${SAMPLE}.fq.gz -2 ${SAMPLE}.fq.gz -S ${SAMPLE}.sam

after loading the module

module load HISAT2/2.2.0-foss-2018b

I was running the script on our ADA cluster and got the following error

sh: /sw/eb/software/HISAT2/2.2.0-foss-2018b/bin/hisat2_read_statistics.py: No such file or directory
(ERR): "/scratch/datasets/genome_indexes/other_genomes/potato/hisat2" does not exist
Exiting now ...

Can someone help me to resolve this issue? Thanks in advance.

hisat2 rna-seq • 271 views
ADD COMMENTlink modified 4 months ago by ATpoint46k • written 4 months ago by venura60

What is the output of ls /scratch/datasets/genome_indexes/other_genomes/potato/ ?

ADD REPLYlink written 4 months ago by ATpoint46k

DM_1-3_516_R44_potato_genome_assembly.v6.1.1.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.2.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.3.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.4.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.5.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.6.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.7.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.8.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.fa

ADD REPLYlink modified 4 months ago • written 4 months ago by venura60

Based on the listing above it looks like there is no hisat2 directory. So you will need to try

-x /scratch/datasets/genome_indexes/other_genomes/potato/DM_1-3_516_R44_potato_genome_assembly.v6.1
ADD REPLYlink modified 4 months ago • written 4 months ago by GenoMax96k

My apologies. The listing above was actually the output of

ls /scratch/datasets/genome_indexes/other_genomes/potato/hisat2/

Here is the correct output for ls /scratch/datasets/genome_indexes/other_genomes/potato/

blast bowtie bowtie2 bwa hisat2 picard samtools

ADD REPLYlink written 4 months ago by venura60

Then simply insert hisat2 in the right spot in the path above.

ADD REPLYlink written 4 months ago by GenoMax96k

Sorry, I think I confused you.

In answer to ATpoint's question, the output of ls /scratch/datasets/genome_indexes/other_genomes/potato/ is

blast bowtie bowtie2 bwa hisat2 picard samtools

When I ran the job I used the following code

hisat2 -p $threads --dta --rna-strandness RF -x /scratch/datasets/genome_indexes/other_genomes/potato/hisat2 -1 ${SAMPLE}.fq.gz -2 ${SAMPLE}.fq.gz -S ${SAMPLE}.sam

That is, I pointed -x at the hisat2 folder itself and got the error mentioned in the original post.

ADD REPLYlink written 4 months ago by venura60

These are not genome indices, are they? A hisat2 index consists of several files, e.g. genome.1.ht2, genome.2.ht2, etc.

This is how it should look e.g. for a genome called mm10.fa:

mm10.1.ht2  mm10.2.ht2  mm10.3.ht2  mm10.4.ht2  mm10.5.ht2  mm10.6.ht2  mm10.7.ht2  mm10.8.ht2

Here it would be -x mm10, because what you have to provide is the prefix (basename) of the index files, not the folder that contains them. hisat2 then picks up the individual .ht2 files as needed.
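To make the prefix rule concrete, here is a minimal sketch that derives the -x value from an index folder by stripping the .1.ht2 ending (or .1.ht2l, which large indexes use) from the first index file. The function name index_prefix is made up for illustration and is not part of HISAT2; it also assumes the folder holds a single index.

```shell
# Print the value to pass to `hisat2 -x` for an index directory.
# index_prefix is a hypothetical helper, not a HISAT2 command.
# Assumes the directory contains exactly one index.
index_prefix() (
  dir=$1
  for f in "$dir"/*.1.ht2 "$dir"/*.1.ht2l; do
    if [ -e "$f" ]; then
      case $f in
        *.1.ht2l) printf '%s\n' "${f%.1.ht2l}" ;;  # large index
        *)        printf '%s\n' "${f%.1.ht2}" ;;   # regular index
      esac
      exit 0
    fi
  done
  echo "no HISAT2 index files found in $dir" >&2
  exit 1
)

# Usage:
#   Idx=$(index_prefix /path/to/index_dir) && hisat2 -x "$Idx" ...
```

For a folder containing mm10.1.ht2 through mm10.8.ht2 this prints the folder path followed by /mm10, which is the form -x expects.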

ADD REPLYlink modified 4 months ago • written 4 months ago by ATpoint46k

Inside the hisat2 folder ( ls /scratch/datasets/genome_indexes/other_genomes/potato/hisat2/), there are eight files (I guess that is the default number it makes)

DM_1-3_516_R44_potato_genome_assembly.v6.1.1.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.2.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.3.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.4.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.5.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.6.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.7.ht2
DM_1-3_516_R44_potato_genome_assembly.v6.1.8.ht2

Ah, I see. That means I need to use the prefix DM_1-3_516_R44_potato_genome_assembly.v6.1 (the file names without the trailing .1.ht2, .2.ht2, ...), so the -x argument becomes /scratch/datasets/genome_indexes/other_genomes/potato/hisat2/DM_1-3_516_R44_potato_genome_assembly.v6.1

Thank you! I will do that

ADD REPLYlink modified 4 months ago • written 4 months ago by venura60

Even after changing the path, I am getting the following error (I killed the job after this error to save my service units)

sh: /sw/eb/software/HISAT2/2.2.0-foss-2018b/bin/hisat2_read_statistics.py: No such file or directory (ERR):

Is this probably due to a problem with the cluster installation? (I have emailed the admins too, but no reply yet.)

ADD REPLYlink written 4 months ago by venura60

Are the fastq files in the right spot? Are those variables correctly pointing to those files?

ADD REPLYlink written 4 months ago by GenoMax96k

They are in the same directory the job is running from. I also checked the file extensions. Nothing makes sense :(

ADD REPLYlink written 4 months ago by venura60
-1 ${SAMPLE}.fq.gz -2 ${SAMPLE}.fq.gz

By the way, -1 and -2 here point to the same file, so mate 1 is being passed twice. Try simplifying your script.
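A small guard can catch this kind of mistake before the aligner starts. The sketch below refuses to continue if both mate arguments are the same path or if either file is unreadable. check_mates is a hypothetical helper name, and the _1/_2 naming in the usage comment is an assumption about how the files are named.

```shell
# Fail early if the two mate files are identical or unreadable.
# check_mates is a hypothetical helper, not part of HISAT2.
check_mates() (
  r1=$1
  r2=$2
  if [ "$r1" = "$r2" ]; then
    echo "mate 1 and mate 2 are the same file: $r1" >&2
    exit 1
  fi
  for f in "$r1" "$r2"; do
    if [ ! -r "$f" ]; then
      echo "cannot read $f" >&2
      exit 1
    fi
  done
  exit 0
)

# Usage (file naming is an assumption):
#   check_mates "${SAMPLE}_1.fq.gz" "${SAMPLE}_2.fq.gz" || exit 1
```

It only compares the path strings, so two different paths to the same file (for example via a symlink) would still slip through.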

ADD REPLYlink written 4 months ago by ATpoint46k

Oh, shoot! You are correct. Still learning the ABCs...

ADD REPLYlink written 4 months ago by venura60
4 months ago by
ATpoint46k
ATpoint46k wrote:

I personally always try to make it as simple as possible. Copy all the fastq files into one folder and give them clear names, e.g.

Sample1_1.fastq.gz Sample1_2.fastq.gz Sample2_1.fastq.gz Sample2_2.fastq.gz

Then use the simplest possible script (or learn how to use workflow managers):

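Idx=path/to/idxfiles   # index prefix (basename), not the folder
OPTS="--dta --rna-strandness RF"   # example options from the original post; replace with your own

for i in *_1.fastq.gz
  do
  SAMPLE=${i%_1.fastq.gz}
  # $OPTS is deliberately unquoted so that it splits into separate arguments
  hisat2 $OPTS -x "${Idx}" -1 "${SAMPLE}_1.fastq.gz" -2 "${SAMPLE}_2.fastq.gz" \
  | samtools view -b -o "${SAMPLE}.bam" -
  done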

That's it. Eliminate unnecessary elements from your script, including echo statements that only report status. Trim it down to the strictly necessary parts and get that running. Once it works, you can add further things.
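Before launching the real loop, it can help to do a dry run that only lists which pairs would be aligned. The sketch below prints one tab-separated line per complete pair and reports any _1 file without a matching _2 file. list_pairs is a hypothetical helper, and it assumes the _1.fastq.gz / _2.fastq.gz naming suggested above.

```shell
# Dry run: print "sample<TAB>mate1<TAB>mate2" for every complete pair in a
# directory; report _1 files with no _2 partner and return non-zero.
# list_pairs is a hypothetical helper for illustration.
list_pairs() (
  dir=$1
  status=0
  for r1 in "$dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue            # glob matched nothing
    sample=${r1%_1.fastq.gz}
    r2=${sample}_2.fastq.gz
    if [ -e "$r2" ]; then
      printf '%s\t%s\t%s\n' "${sample##*/}" "$r1" "$r2"
    else
      echo "missing mate for $r1" >&2
      status=1
    fi
  done
  exit "$status"
)

# Usage:
#   list_pairs . || echo "fix the file names before aligning" >&2
```

If the listing looks right, the same pairing logic can be handed to the alignment loop unchanged.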

ADD COMMENTlink modified 4 months ago • written 4 months ago by ATpoint46k

Will do that and get back with the outcome! Thanks a lot, ATpoint! 🙏

ADD REPLYlink written 4 months ago by venura60

Everything is running fine and I got the bam files too. :) The only remaining issue is the following message (I guess it has something to do with the installation on the ADA cluster, since I don't see that script there):

/sw/eb/software/HISAT2/2.2.0-foss-2018b/bin/hisat2_read_statistics.py: No such file or directory (ERR)

PS: Appreciate if you can point me to a good workflow management tool and tutorial for similar analysis like this.
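One way to confirm the guess about the installation is to look for the helper script next to the hisat2 executable, since the error message shows it being looked up in the same bin folder. This is only a sketch: has_read_stats_script is a hypothetical helper name, and the usage comment assumes hisat2 is on PATH after module load.

```shell
# Succeeds if hisat2_read_statistics.py is present in the given bin directory.
# has_read_stats_script is a hypothetical helper, not part of HISAT2.
has_read_stats_script() {
  [ -e "$1/hisat2_read_statistics.py" ]
}

# Usage (assumes hisat2 is on PATH, e.g. after `module load`):
#   bindir=$(dirname "$(command -v hisat2)")
#   has_read_stats_script "$bindir" || echo "helper script missing in $bindir" >&2
```

If the script is missing, that is something for the cluster admins to fix in the module. As reported above, the alignments and bam files were still produced despite the message.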

ADD REPLYlink modified 4 months ago • written 4 months ago by venura60