5.5 years ago
ddowlin ▴ 70
I am trying to assemble a primate transcriptome using hisat2/stringtie. First I want to index the genome using hisat2. I used both --ss and --exon to provide information on splice sites and exons.
The manual states that 8 files should be produced after indexing, 1.ht2 to 8.ht2. However, I only have six files with this suffix (5.ht2 and 6.ht2 are missing) and one file is completely empty. Additionally, I have 20 .rf files (0.rf to 19.rf).
Does anyone know why this is? Was there an error with indexing or can I use these files as is?
You can download pre-built indexes from the HISAT2 website [https://ccb.jhu.edu/software/hisat2/index.shtml]. You may also download genome_snp, genome_tran and/or genome_snp_tran.
Thanks for the suggestion. Unfortunately pre-made indices aren't available for the species I am interested in.
Can you try without the --ss and --exon options? These options need a lot of RAM. If it's a big genome, indexing may have stopped with an error.
Thanks--I tried without the --ss and --exon options and it seems to have worked OK.
Yup. So my guess is that your indexing may have stopped due to RAM issues. You can still provide the 'splice sites' file at the time of alignment.
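To make the workaround above concrete, here is a minimal sketch that only assembles the two command lines: a plain hisat2-build without --ss/--exon, then a hisat2 alignment that takes the splice sites file via --known-splicesite-infile. The file names (genome.fa, genome_index, splicesites.txt, the read files) and the thread count are placeholders, not taken from this thread, and the helper function names are hypothetical. The --dta flag is included on the assumption that the alignments go to StringTie, as in the original question.

```python
import shlex


def build_index_cmd(genome_fasta, index_base, threads=4):
    """Command for a plain index build, without --ss/--exon (lower RAM use)."""
    return ["hisat2-build", "-p", str(threads), genome_fasta, index_base]


def align_cmd(index_base, reads_1, reads_2, splice_sites, out_sam, threads=4):
    """Command for a paired-end alignment that supplies known splice sites
    at alignment time instead of baking them into the index."""
    return [
        "hisat2",
        "-p", str(threads),
        "--dta",  # report alignments tailored for transcript assemblers
        "-x", index_base,
        "--known-splicesite-infile", splice_sites,
        "-1", reads_1,
        "-2", reads_2,
        "-S", out_sam,
    ]


# Placeholder paths; replace with your own files.
print(shlex.join(build_index_cmd("genome.fa", "genome_index")))
print(shlex.join(align_cmd("genome_index", "reads_1.fq", "reads_2.fq",
                           "splicesites.txt", "aligned.sam")))
```

The splice sites file itself can be generated from a GTF annotation with the hisat2_extract_splice_sites.py script that ships with HISAT2. The sketch only prints the commands; pass the lists to subprocess.run(cmd, check=True) if you want Python to run them and stop on a non-zero exit code.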
If there was no error produced during index build then go ahead and use them. Programs often will store these indexes in formats they choose/like. If there was an error then post that here.
I know this post is old, but I thought I'd reply to this specific comment since I struggled with this issue for a while. I think HISAT2 is supposed to produce exactly 8 files, and what Satyajeet Khare mentioned (the build stopping because of RAM) is probably the correct explanation. I also got more than 8 files every time my process terminated prematurely due to RAM issues. I think the .rf files are temporary files and should NOT be used for alignment.
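For anyone who wants to check an index directory before aligning, here is a rough sketch based only on what is reported in this thread: a finished build should have files 1 to 8 with the .ht2 suffix, and leftover .rf files suggest the build was interrupted. The function name and the report layout are hypothetical. Treating a zero-byte .ht2 file as a sign of a bad build is an assumption taken from the original question, not something the HISAT2 manual states.

```python
from pathlib import Path


def check_hisat2_index(index_base, n_files=8, suffix="ht2"):
    """Report missing/empty index files and leftover .rf files.

    index_base is the basename given to hisat2-build, e.g. "idx/genome"
    for files named idx/genome.1.ht2 ... idx/genome.8.ht2.
    """
    base = Path(index_base)
    folder = base.parent
    missing, empty = [], []
    for i in range(1, n_files + 1):
        f = folder / f"{base.name}.{i}.{suffix}"
        if not f.exists():
            missing.append(f.name)
        elif f.stat().st_size == 0:
            # Assumption: a zero-byte index file indicates a failed build.
            empty.append(f.name)
    # Assumption: any *.rf file in the folder is a leftover temporary file.
    leftover_rf = sorted(p.name for p in folder.glob("*.rf"))
    return {
        "missing": missing,
        "empty": empty,
        "leftover_rf": leftover_rf,
        "complete": not missing and not empty,
    }
```

Calling check_hisat2_index("idx/genome") on the situation described in the original question would list genome.5.ht2 and genome.6.ht2 as missing and report complete as False. A passing check does not prove the index is valid; it only rules out the symptoms described here.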
Do you know if your build run log had something in it that indicated a problem when you had the truncation happen? If HISAT2 devs have not done their due diligence to flag that in the log output then that is not a good thing.
Were you able to align data with what may be truncated index files? One would expect HISAT2 to throw an error if it detects an incomplete or corrupt index set.
The hisat2-build manual says the following, which is incorrect. There should be 8 files in properly built indexes.
Hi all, I indexed the genome using hisat2 and used both --ss and --exon to provide information on splice sites and exons, and I ran into the same problem. I also only got six files with this suffix (5.ht2 and 6.ht2 are missing). Have you solved the problem? And do you know the reason?
Have you tried to use the index for an alignment?
Don't go on the number of files produced. Properly built indexes should have 8 files.
Thanks for your help. I have not mapped the reads yet. OK, I will try.
I am having a similar issue: many .rf files but fewer than 8 .ht2 files. Was this ever resolved? Are they okay to use?
I doubt it is OK to use them. I have the same problem: only 4 files are generated. I tried to use them for alignment, but it won't work.