Question: Problem in indexing toplevel genome with HISAT2
7 months ago, Batu170 wrote:

As I mentioned in my earlier post, I have been unable to index a toplevel genome (both unmasked and soft-masked) with HISAT2, and the problem persists. I'm using the command below:

hisat2-build -f Mus_musculus.GRCm38.dna.toplevel.fa.gz Cm3895_ht2/GRCm38

First, it prints these warnings many times:

Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps

After some time, it fails with the error below:

Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Reference file does not seem to be a FASTA file
  Time to join reference sequences: 00:00:00
Total time for call to driver() for forward index: 00:28:31
Error: Encountered internal HISAT2 exception (#1)
Command: hisat2-build --wrapper basic-0 -f Mus_musculus.GRCm38.dna.toplevel.fa.gz Cm3895_ht2/GRCm38 
Deleting "Cm3895_ht2/GRCm38.1.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.2.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.3.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.4.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.5.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.6.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.7.ht2" file written during aborted indexing attempt.
Deleting "Cm3895_ht2/GRCm38.8.ht2" file written during aborted indexing attempt.

Previously, I had no problems when using separate chromosome files. Is there anything I'm missing when using the toplevel genome? Thanks...

modified 7 months ago • written 7 months ago by Batu170

I guess the answer is in your error log: 'Reference file does not seem to be a FASTA file'. You are passing a gzipped file to hisat2-build. Try decompressing the reference to a plain FASTA file and running the index build again.

modified 7 months ago • written 7 months ago by ahaswer150

Yes, it worked after unpacking. Gzipped files normally work with the main hisat2 command, so this cause didn't occur to me. Thank you...

written 7 months ago by Batu170

Worth reading: https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use

written 7 months ago by WouterDeCoster41k

7 months ago, Batu170 wrote:

It worked after unpacking the genome. I hadn't realized that hisat2-build would not accept a gzipped file, since gzipped files work with the main hisat2 command. Problem solved!
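
For anyone hitting the same error, the fix can be sketched roughly as below. This is only a minimal sketch: `plain_fasta` is a hypothetical helper name made up for illustration (it is not part of HISAT2), and it assumes `gzip` is on the PATH. It decompresses the reference only if it is actually gzipped and prints the path of the plain FASTA file to pass to hisat2-build.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HISAT2): print the path of a plain-text
# FASTA for the given reference, decompressing it first if it is gzipped.
plain_fasta() {
    local ref="$1"
    # "gzip -t" succeeds only when the file is a valid gzip archive.
    if gzip -t "$ref" 2>/dev/null; then
        # Strip a trailing ".gz"; fall back to appending ".fa" if there is none.
        local out="${ref%.gz}"
        [ "$out" = "$ref" ] && out="${ref}.fa"
        # Decompress to a new file and leave the original archive untouched.
        gzip -dc "$ref" > "$out"
        printf '%s\n' "$out"
    else
        # Not gzipped: use the file as it is.
        printf '%s\n' "$ref"
    fi
}

# Usage (commented out because it needs HISAT2 and the genome file):
# fa=$(plain_fasta Mus_musculus.GRCm38.dna.toplevel.fa.gz)
# mkdir -p Cm3895_ht2
# hisat2-build -f "$fa" Cm3895_ht2/GRCm38
```

The decompressed toplevel genome is several gigabytes, so check free disk space before running this.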

written 7 months ago by Batu170