Question: HISAT2 Indexing using annotation for Rattus_norvegicus
neranjan (US) wrote, 4 months ago:

Hi,

I am trying to create a HISAT2 index with annotation for the Rattus_norvegicus (rat) genome that I downloaded from Ensembl release 94.

I am currently using 220GB of memory with 16 cores, which I assumed would be adequate. But I cannot create the HISAT2 index; the build gives the following error:

Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:16
  Time to read SNPs and splice sites: 00:00:04
        is not reverse-deterministic, so reverse-determinize...
  Ran out of memory; automatically trying more memory-economical parameters.
        is not reverse-deterministic, so reverse-determinize...

and eventually fails with

Could not find approrpiate bmax/dcv settings for building this index.
Switching to a packed string representation.
Total time for call to driver() for forward index: 08:45:56

The HISAT2 website does provide a rat index, but not one built with the annotation.

I am using the command

hisat2-build -p 16 --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

to create the index.

Any ideas are greatly appreciated.

Thanks

modified 3 months ago • written 4 months ago by neranjan

Do you run on a cluster, and if so, what is the exact command, including the header lines for the scheduler? Did you request the entire memory of the node you are on?

modified 4 months ago • written 4 months ago by ATpoint

No, the node has 256GB of memory and I only asked for 220GB of RAM. I never ask for the full amount, since the node needs some memory to work with. On previous occasions I only asked for 200GB.

In previous attempts at the same index I asked for 300GB of RAM on a node with 512GB of memory, which didn't work either.

written 4 months ago by neranjan

If you share the links to the necessary files, I can try to build it on a 3TB node if that helps you.

written 4 months ago by ATpoint

Yes, that might help me. Thank you very much for the help.

I am using the files hosted by Ensembl and HISAT2 version 2.1.0 to build the index. Below is the SLURM script I use to build it. The memory, partition and qos settings may need to change depending on the cluster and the scheduler in use.

#!/bin/bash
#SBATCH --job-name=hisat
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 16
#SBATCH --mem=220G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

#genome 
wget ftp://ftp.ensembl.org/pub/current_fasta/rattus_norvegicus/dna/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz
#GTF 
wget ftp://ftp.ensembl.org/pub/current_gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.94.gtf.gz


for file in *.gz; do
       gunzip -d "$file"
done
echo "=========== Unzip Done ================="

BASE_NAME="Rattus_norvegicus"
FASTA_File="Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa"
GTF="Rattus_norvegicus.Rnor_6.0.94.gtf"
SPLICE="splice_site"
EXON="exon"

module load hisat2/2.1.0
#create splice sites
hisat2_extract_splice_sites.py ${GTF} > ${SPLICE}

#create exon file
hisat2_extract_exons.py ${GTF} > ${EXON}

#build index
hisat2-build -p 16 --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

#build large index if the above does not work
#hisat2-build -p 16 --large-index --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}

If the normal hisat2-build run does not work, you can also try building the large index using the commented-out command.

Thanks again for the help.

written 4 months ago by neranjan

I just started it and will come back once finished.

written 3 months ago by ATpoint

Thanks, I appreciate it. If it completes successfully, I would like to know how much memory it used.

written 3 months ago by neranjan

It finished without issues on a 1.5TB node and used about 500GB at peak. I am compressing it and uploading it to a cloud service now, and will share the download link once finished:

There it is: https://uni-muenster.sciebo.de/s/ztztgCWvQujnhjq
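
For anyone who wants to measure peak memory for a similar build on their own SLURM cluster, here is a minimal sketch. It assumes job accounting is enabled so that `sacct` reports a `MaxRSS` column, which is typically printed in kibibytes with a trailing `K`. The helper name `kib_to_gib` and the sample value are made up for illustration; the function only converts such a figure to GiB.

```shell
#!/bin/bash
# Convert a peak-memory figure in kibibytes (e.g. "524288000K" or "524288000")
# to GiB with one decimal place.
kib_to_gib() {
    local kib="${1%K}"   # drop a trailing "K" if present
    awk -v kib="$kib" 'BEGIN { printf "%.1f\n", kib / 1024 / 1024 }'
}

# Hypothetical usage after a job has finished (JOBID is a placeholder):
#   sacct -j "$JOBID" --format=JobID,MaxRSS,Elapsed --noheader
# Here a made-up value stands in for the MaxRSS column:
kib_to_gib "524288000K"
```

With the made-up value above, the function prints 500.0, which is roughly the peak reported in this reply.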

modified 3 months ago • written 3 months ago by ATpoint

Thank you very much, I really appreciate you going out of your way to help me. I was able to download the index from the link.

1.6G Rattus_norvegicus.1.ht2
654M Rattus_norvegicus.2.ht2
1.3M Rattus_norvegicus.3.ht2
651M Rattus_norvegicus.4.ht2
1.4G Rattus_norvegicus.5.ht2
663M Rattus_norvegicus.6.ht2
7.8M Rattus_norvegicus.7.ht2
1.6M Rattus_norvegicus.8.ht2
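
The listing shows the eight `.ht2` files that make up this index. As a small sketch, assuming bash and the base name used in this thread, the following checks that all eight files are present and non-empty before the index is passed to `hisat2 -x`. The function name `check_ht2_index` is hypothetical and not part of HISAT2.

```shell
#!/bin/bash
# Return 0 if all eight <base>.N.ht2 files exist and are non-empty,
# otherwise report the missing ones on stderr and return 1.
check_ht2_index() {
    local base="$1" n missing=0
    for n in 1 2 3 4 5 6 7 8; do
        if [ ! -s "${base}.${n}.ht2" ]; then
            echo "missing or empty: ${base}.${n}.ht2" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the base name from this thread:
if check_ht2_index "Rattus_norvegicus"; then
    echo "index looks complete"
fi
```

A truncated download usually shows up here as a missing or zero-size file, which is cheaper to catch than a failed alignment run.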

Again thank you very much.

written 3 months ago by neranjan

You're very welcome :)

written 3 months ago by ATpoint

Found a solution: generate the index using more memory.

Cheers!

modified 3 months ago • written 3 months ago by neranjan

neranjan (US) wrote, 3 months ago:

I think the answer is to provide more memory for the run. Thank you ATpoint.
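
Since the failed build in the question ran for more than eight hours before giving up, it may help to fail fast when the allocation is too small. The following is a sketch only. It assumes SLURM exports `SLURM_MEM_PER_NODE` (in MB) when `--mem` is given, that a node with at least 600GB is available, and that the roughly 500GB peak reported in this thread is a reasonable threshold. `REQUIRED_MB` and `has_enough_memory` are hypothetical names.

```shell
#!/bin/bash
#SBATCH --job-name=hisat
#SBATCH -N 1
#SBATCH -c 16
#SBATCH --mem=600G   # assumption: headroom above the ~500GB peak reported in this thread

# Hypothetical threshold in MB, taken from the peak usage reported in this thread.
REQUIRED_MB=$((500 * 1024))

# Succeed only if the allocated memory (MB) is at least the required amount (MB).
has_enough_memory() {
    local allocated_mb="${1:-0}" required_mb="$2"
    [ "$allocated_mb" -ge "$required_mb" ]
}

if has_enough_memory "${SLURM_MEM_PER_NODE:-0}" "$REQUIRED_MB"; then
    echo "allocation OK, starting hisat2-build"
    # hisat2-build -p 16 --exon ${EXON} --ss ${SPLICE} ${FASTA_File} ${BASE_NAME}
else
    echo "allocation below ${REQUIRED_MB} MB, not starting the build" >&2
fi
```

Outside a SLURM job the variable is unset, so the check falls through to the warning branch and nothing is built.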

written 3 months ago by neranjan

There is no need to close this question. Just accepting this as an answer is sufficient.

written 3 months ago by ATpoint