Question: HISAT2 Indexing using annotation for Rattus_norvegicus
19 months ago by
neranjan40
US
neranjan40 wrote:

Hi,

I am trying to create a HISAT2 index with annotation for the Rattus_norvegicus (rat) genome, which I downloaded from Ensembl release 94.

I am currently using 220GB of memory with 16 cores, which I assumed would be adequate. However, the index build fails with the following error:

Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:16
  Time to read SNPs and splice sites: 00:00:04
        is not reverse-deterministic, so reverse-determinize...
  Ran out of memory; automatically trying more memory-economical parameters.
        is not reverse-deterministic, so reverse-determinize...

and it eventually fails with

Could not find approrpiate bmax/dcv settings for building this index.
Switching to a packed string representation.
Total time for call to driver() for forward index: 08:45:56

The HISAT2 website does provide a rat index, but it does not include the annotation.

I am using the command

hisat2-build -p 16 --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

to create the index.

Any ideas are greatly appreciated.

Thanks

ADD COMMENTlink modified 3 months ago by robin.mesnage0 • written 19 months ago by neranjan40

Do you run on a cluster, and if so, what is the exact command, including the header lines for the scheduler? Did you request the entire memory of the node you are on?

ADD REPLYlink modified 19 months ago • written 19 months ago by ATpoint34k

No, the node has 256GB of memory and I only asked for 220GB of RAM. I never ask for the full amount, since the node needs some memory to work with. On previous occasions I only asked for 200GB.

In earlier attempts at the same index I asked for 300GB of RAM on a node with 512GB of memory, which didn't work either.

ADD REPLYlink written 19 months ago by neranjan40

If you share the links to the necessary files, I can try to build it on a 3TB node if that helps you.

ADD REPLYlink written 19 months ago by ATpoint34k

Yes, that might help me. Thank you very much for the help.

I am using the files hosted by Ensembl and HISAT2 version 2.1.0 to build the index. Below is the SLURM script I use to build it. The memory, partition and QOS settings may need to change depending on the cluster and the scheduler being used.

#!/bin/bash
#SBATCH --job-name=hisat
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 16
#SBATCH --mem=220G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

#genome 
wget ftp://ftp.ensembl.org/pub/current_fasta/rattus_norvegicus/dna/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz
#GTF 
wget ftp://ftp.ensembl.org/pub/current_gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.94.gtf.gz


for file in *.gz; do
       gunzip -d "$file"
done
echo "=========== Unzip Done ================="

BASE_NAME="Rattus_norvegicus"
FASTA_File="Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa"
GTF="Rattus_norvegicus.Rnor_6.0.94.gtf"
SPLICE="splice_site"
EXON="exon"

module load hisat2/2.1.0
#create splice sites
hisat2_extract_splice_sites.py ${GTF} > ${SPLICE}

#create exon file
hisat2_extract_exons.py ${GTF} > ${EXON}

#build index
hisat2-build -p 16 --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

#build large index if the above does not work
#hisat2-build -p 16 --large-index --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

If the normal hisat2 build does not work, you can also try building the large index using the commented-out command.
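A lower-memory route that was not tried in this thread, sketched here as an assumption: build a plain index without `--exon`/`--ss`, which typically needs far less memory, and pass the extracted splice sites to the aligner with `--known-splicesite-infile` instead. The `run` helper, the `DRY_RUN` switch, the `_plain` base name and the read file names are hypothetical.

```shell
#!/bin/bash
# Sketch: plain HISAT2 index, splice sites supplied at alignment time.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

BASE_NAME="Rattus_norvegicus_plain"
FASTA_File="Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa"
SPLICE="splice_site"             # output of hisat2_extract_splice_sites.py
READS_1="sample_R1.fastq.gz"     # hypothetical paired-end reads
READS_2="sample_R2.fastq.gz"

# Build the index from the genome only (no --exon / --ss).
build_plain_index() {
    run hisat2-build -p 16 "$FASTA_File" "$BASE_NAME"
}

# Align and hand the known splice sites to hisat2 directly.
align_with_splice_sites() {
    run hisat2 -p 16 -x "$BASE_NAME" --known-splicesite-infile "$SPLICE" \
        -1 "$READS_1" -2 "$READS_2" -S sample.sam
}

build_plain_index
align_with_splice_sites
```

The trade-off is that the annotation is not baked into the index, so results may differ slightly from an index built with `--exon` and `--ss`.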

Thanks again for the help.

ADD REPLYlink written 19 months ago by neranjan40

I just started it and will come back once finished.

ADD REPLYlink written 19 months ago by ATpoint34k

Thanks, I appreciate it. If it completes successfully, I would like to know how much memory it used.

ADD REPLYlink written 19 months ago by neranjan40

It finished without issues on a 1.5TB node. Peak memory usage was about 500GB. I am compressing it and uploading it to cloud storage now, and will share the download link once finished:
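For anyone wanting to check peak memory of their own build under SLURM, `sacct` can report `MaxRSS` for a finished job. The sketch below is only an illustration: `rss_to_gib` is a hypothetical helper, and it assumes `MaxRSS` is printed with a `K`, `M` or `G` suffix (a value without a suffix is treated as bytes).

```shell
# Convert a MaxRSS value such as "524288000K" to GiB.
rss_to_gib() {
    echo "$1" | awk '{
        unit  = substr($0, length($0), 1)   # last character: K, M, G or a digit
        value = $0 + 0                      # awk takes the leading numeric part
        if (unit == "K")      div = 1024 * 1024
        else if (unit == "M") div = 1024
        else if (unit == "G") div = 1
        else                  div = 1024 * 1024 * 1024   # assume plain bytes
        printf "%.2f\n", value / div
    }'
}

# On the cluster (JOBID is a placeholder for the finished job's ID):
#   sacct -j "$JOBID" --format=JobID,ReqMem,MaxRSS,Elapsed,State
rss_to_gib 524288000K
```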

There it is: https://uni-muenster.sciebo.de/s/ztztgCWvQujnhjq

ADD REPLYlink modified 19 months ago • written 19 months ago by ATpoint34k

Thank you very much. I really appreciate you going out of your way to help me. I was able to download the index from the link.

1.6G Rattus_norvegicus.1.ht2
654M Rattus_norvegicus.2.ht2
1.3M Rattus_norvegicus.3.ht2
651M Rattus_norvegicus.4.ht2
1.4G Rattus_norvegicus.5.ht2
663M Rattus_norvegicus.6.ht2
7.8M Rattus_norvegicus.7.ht2
1.6M Rattus_norvegicus.8.ht2

Again thank you very much.
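A quick sanity check after downloading or building an index is to confirm that all eight `.ht2` files are present and non-empty, as in the listing above. This is a minimal sketch; `check_hisat2_index` is a hypothetical helper, and a large index would use the `.ht2l` extension instead.

```shell
# Return 0 if all eight .ht2 files for the given base name exist and are non-empty.
check_hisat2_index() {
    base="$1"
    missing=0
    for i in 1 2 3 4 5 6 7 8; do
        if [ ! -s "${base}.${i}.ht2" ]; then
            echo "missing or empty: ${base}.${i}.ht2" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_hisat2_index "Rattus_norvegicus"; then
    echo "index files present"
    # Optional, if HISAT2 is installed: hisat2-inspect -s Rattus_norvegicus
else
    echo "index incomplete"
fi
```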

ADD REPLYlink written 19 months ago by neranjan40

You're very welcome :)

ADD REPLYlink written 19 months ago by ATpoint34k

Found a solution: generate the index using more memory.

Cheers!

ADD REPLYlink modified 19 months ago • written 19 months ago by neranjan40

Hi,

I am having the same issue with HISAT2 Indexing using annotation for Rattus norvegicus. I currently don't have access to a cluster with sufficient memory and I am stuck with my transcriptome analyses. I have seen that @ATpoint made these indexes available but the link is dead.

Would @ATpoint or any of you be able to share these indexes again?

Thank you in advance,

ADD REPLYlink written 3 months ago by robin.mesnage0

I do not have them anymore. Why don't you use a tool such as salmon to quantify directly against the transcriptome? It barely requires any memory.
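A rough sketch of what that could look like with salmon, under stated assumptions: `transcripts.fa` stands for a transcript (cDNA) FASTA matching the annotation, and the read file names, output names, `run` helper and `DRY_RUN` switch are all hypothetical.

```shell
#!/bin/bash
# Sketch: transcript-level quantification with salmon instead of genome alignment.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

TRANSCRIPTS="transcripts.fa"       # hypothetical transcript FASTA
SALMON_INDEX="rat_salmon_index"
READS_1="sample_R1.fastq.gz"       # hypothetical paired-end reads
READS_2="sample_R2.fastq.gz"

# Index the transcriptome.
salmon_index() {
    run salmon index -t "$TRANSCRIPTS" -i "$SALMON_INDEX" -p 8
}

# Quantify; -l A asks salmon to infer the library type.
salmon_quant() {
    run salmon quant -i "$SALMON_INDEX" -l A \
        -1 "$READS_1" -2 "$READS_2" -p 8 -o sample_quant
}

salmon_index
salmon_quant
```

This gives transcript-level abundances rather than genome alignments, so it only fits analyses that do not need BAM files.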

ADD REPLYlink written 3 months ago by ATpoint34k
19 months ago by
neranjan40
US
neranjan40 wrote:

I think the answer is to provide more memory for the run. Thank you ATpoint.

ADD COMMENTlink written 19 months ago by neranjan40

There is no need to close this question. Just accepting this as an answer is sufficient.

ADD REPLYlink written 19 months ago by ATpoint34k