I am new here. I am trying to build a HISAT2 index from the GENCODE lncRNA annotation. I did the following.
extract splice site information
hisat2_extract_splice_sites.py gencode.v28.long_noncoding_RNAs.gtf >gnc.ss
extract exon information
hisat2_extract_exons.py gencode.v28.long_noncoding_RNAs.gtf >gnc.exon
hisat2-build -p11 --ss gnc.ss --exon gnc.exon geno GRCh38.p12.genome.fa genlnc
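For reference, here are the same steps written as one script. This is only a minimal sketch: the `run` helper and the `DRY_RUN` switch are hypothetical additions for illustration, not part of HISAT2. By default it only prints the commands and does not run them.

```shell
#!/usr/bin/env bash
# Sketch of the three steps as a single script (file names taken from the post).
set -euo pipefail

GTF="gencode.v28.long_noncoding_RNAs.gtf"
GENOME="GRCh38.p12.genome.fa"
THREADS=11
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print, 0 = actually run

# Hypothetical helper: print a command line, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        bash -c "$*"
    fi
}

# Step 1: splice sites from the annotation
run "hisat2_extract_splice_sites.py $GTF > gnc.ss"
# Step 2: exons from the annotation
run "hisat2_extract_exons.py $GTF > gnc.exon"
# Step 3: build the index; hisat2-build takes two positional arguments,
# the reference FASTA and the index base name
run "hisat2-build -p $THREADS --ss gnc.ss --exon gnc.exon $GENOME genlnc"
```

Running it with `DRY_RUN=0` would execute the three commands in order and stop at the first one that fails.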
The computer I am using is a Windows workstation with 12 cores (I am running with 11 threads, but CPU usage hardly reaches 10% at most). It shows 45 GB of installed RAM, of which hisat2-build is using almost 40 GB. I started the process on Monday, and even though the computer has been running continuously, it hasn't built the index. The HISAT2 paper suggested that building a whole-genome index with 160 GB of RAM should take 2-3 hours. So I am confused why it hasn't finished even after 5 days, given that I have about 1/4 of the recommended RAM.
Before this, I tried using the primary assembly file to make the index, and it didn't finish in two weeks. So I thought maybe the primary assembly file was too big and switched to p12. When I run ls -lh, I see that the biggest file is a .rtf file, which I read is a temporary file. Right now it is 42 GB. I am using Cygwin to run Linux commands on Windows. Am I missing something? Please advise.
Also, on the side, could you tell me the difference between using the primary assembly and p12 (or a newer assembly) for making the index file?