Question: Hisat2 index building taking days even with 40 GB RAM
2.5 years ago, piyushjo wrote:

Hi,

I am new here. I am trying to build a HISAT2 index using the GENCODE lncRNA annotation. I did the following.

extract splice site information

hisat2_extract_splice_sites.py gencode.v28.long_noncoding_RNAs.gtf >gnc.ss

extract exon information

hisat2_extract_exons.py gencode.v28.long_noncoding_RNAs.gtf >gnc.exon

run build

hisat2-build -p 11 --ss gnc.ss --exon gnc.exon GRCh38.p12.genome.fa genlnc

The computer I am using is a Windows workstation with 12 cores (I am using 11 of them, but CPU usage hardly exceeds 10%). It reports 45 GB of installed RAM, of which hisat2-build is using almost 40 GB. I started the process on Monday, and although the computer has been running continuously, the index still hasn't been built. The HISAT2 paper suggested that building a whole-genome index with 160 GB of RAM should take 2-3 hours. So I am confused about why it hasn't finished even after 5 days, given that I have about 1/4 of the recommended RAM.

Before this, I tried using the primary assembly file to build the index, and it didn't finish in two weeks. So I thought maybe the primary assembly file was too big and switched to p12. When I run ls -lh, I see that the biggest file is a .rtf file, which I read is a temporary file. Right now it is 42 GB. I am using Cygwin to run Linux commands on Windows. Am I missing something? Please advise.

As an aside, could you tell me the difference between using the primary assembly and p12 (or a newer assembly) for building the index?

written 2.5 years ago by piyushjo (modified 2.4 years ago by Biostar)

I don't have the computational explanation you're looking for, unfortunately, but I don't think the time to completion scales down linearly in the way you're expecting. I tried building an index with 32 GB of RAM and it failed. I think the index build needed to load more data than that into memory. I eventually used a cluster and assigned ~200 GB to the operation, and it ran smoothly. If you have access to cloud or cluster resources, I recommend you go that route.

written 2.5 years ago by Russ
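
A workaround that may be worth trying on the 45 GB workstation: as far as the HISAT2 manual describes it, the very large memory requirement comes from passing --ss/--exon (or --snp) to hisat2-build, while a plain genome index needs far less RAM. The known splice sites can then be supplied at alignment time with --known-splicesite-infile. Results may differ somewhat from an index built with --ss/--exon. A minimal sketch, assuming the file names from the question; genome_plain, reads.fq and out.sam are made-up names, and the script only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch only. GENOME and SPLICE_SITES are the files named in the question;
# INDEX, READS and OUT are hypothetical names chosen for illustration.
GENOME=GRCh38.p12.genome.fa
SPLICE_SITES=gnc.ss
INDEX=genome_plain
READS=reads.fq
OUT=out.sam

# Dry run by default: print each command, execute it only if DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

# 1. Build a plain index (no --ss / --exon), which needs much less memory.
run hisat2-build -p 11 "$GENOME" "$INDEX"

# 2. Give HISAT2 the known splice sites when aligning instead.
run hisat2 -p 11 -x "$INDEX" --known-splicesite-infile "$SPLICE_SITES" \
    -U "$READS" -S "$OUT"
```

Review the printed commands first, then rerun with DRY_RUN=0 to actually execute them.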

Hi Russ. Thanks for the reply. I do have access to a Linux cloud/cluster. But I couldn't install hisat2 over there. Any tips for that?

written 2.5 years ago by piyushjo

You'll have to talk to the sys admin of your cluster if you don't have the privileges to install hisat2.

written 2.5 years ago by Russ

Have you tried installation using (bio)conda?

written 2.5 years ago by WouterDeCoster
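
For reference, a bioconda install usually looks something like the sketch below. It assumes conda (for example Miniconda) is already installed and working for your user account; hisat2-env is just an example environment name, not anything required by HISAT2.

```shell
#!/bin/sh
# Sketch only: assumes conda is already installed; "hisat2-env" is an
# arbitrary environment name chosen for this example.
ENV_NAME=hisat2-env

install_hisat2() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; install Miniconda first" >&2
        return 1
    fi
    # Create a dedicated environment and pull hisat2 from the bioconda
    # channel (conda-forge supplies its dependencies).
    conda create -y -n "$ENV_NAME" -c conda-forge -c bioconda hisat2
}
```

Call install_hisat2, then run conda activate hisat2-env; after that, hisat2-build --version should report the installed version.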

Hi Wouter. I was able to download hisat2 and add it to the PATH on the Linux server. Now I am running into the problem of libstdc++.so.6 not being up to date. I asked the server manager, and he said the system is old and updating is a pain. Could you tell me whether there are any Linux servers I can access to do this, and whether they are free?

written 2.5 years ago by piyushjo
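
To see how far behind the server's libstdc++ is, it can help to list the GLIBCXX versions the installed library provides and compare them with the version named in the error. A rough sketch, assuming the error is the usual "GLIBCXX_x.y.z not found" kind (the exact message was not posted) and that grep supports -a and -o, as GNU grep does; the library path in the usage note is a guess and differs between distributions:

```shell
#!/bin/sh
# Sketch only: prints the distinct GLIBCXX version tags embedded in a
# libstdc++ shared library. Relies on grep -a (treat binary as text)
# and -o (print only the matching part), as in GNU grep.
list_glibcxx() {
    # $1: path to a libstdc++.so.6 file
    grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | LC_ALL=C sort -u
}
```

For example, list_glibcxx /usr/lib64/libstdc++.so.6 (path assumed). If the version named in the error message is missing from the output, the downloaded binary was built against a newer libstdc++ than the system provides.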

So you have tried installation using bioconda?

written 2.5 years ago by WouterDeCoster

No, I just downloaded and unpacked the Linux binary of hisat2 and added the directory to the PATH.

written 2.5 years ago by piyushjo

Why don't you try installation using bioconda?

written 2.5 years ago by WouterDeCoster

Couldn't install Miniconda because of the same libstdc++ problem. :(

written 2.5 years ago by piyushjo

Right, well, that sucks.

written 2.5 years ago by WouterDeCoster