Question

Creating Hisat2 Index

6.2 years ago

Vasu ▴ 770

As mentioned in the paper, I first extracted splice sites and then exons. Next I ran hisat2-build:

hisat2-build --ss gencode.v27.primary_assembly.annotation.ss --exon gencode.v27.primary_assembly.annotation.exon annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa Hisat2index

After a few minutes this is what I saw:

Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa
Reading reference sizes
  Time reading reference sizes: 00:00:34
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:01:10
  Time to read SNPs and splice sites: 00:00:01
Killed

Does Killed mean there is some error?

But I see files like Hisat2index.0.rf, Hisat2index.1.ht2, Hisat2index.2.ht2, Hisat2index.3.ht2, Hisat2index.4.ht2, Hisat2index.5.ht2, Hisat2index.6.ht2, Hisat2index.7.ht2, Hisat2index.8.ht2

Did everything go well, or do I need to fix something?

RNA-Seq hisat2 genome index • 6.8k views

ADD COMMENT • link 6.2 years ago by Vasu ▴ 770

Potentially, your indexes may be incomplete.

You can download pre-made indexes directly from here.

ADD REPLY • link 6.2 years ago by GenoMax 141k

But I would like to build my own index using the GTF annotation.

ADD REPLY • link 6.2 years ago by Vasu ▴ 770

How much memory do you have available? That may be the limiting factor in your case unless you are running this on a cluster and ran out of wall clock time.
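
One way to tell whether a Killed job ran out of memory is to look at the exit status of hisat2-build. In bash, a command terminated by signal N exits with status 128 + N, so a process stopped by SIGKILL (the signal the Linux out-of-memory killer sends) reports 137. The sketch below only illustrates that rule; the function name `explain_exit_status` is hypothetical, and a cluster scheduler may also send SIGKILL for other reasons, such as exceeding the wall-clock limit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HISAT2): interpret a command's exit status.
explain_exit_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "exit 0: finished normally"
    elif [ "$status" -gt 128 ]; then
        # The shell reports 128 + N when a command is terminated by signal N.
        sig=$((status - 128))
        if [ "$sig" -eq 9 ]; then
            echo "exit $status: killed by SIGKILL (often out of memory or a scheduler limit)"
        else
            echo "exit $status: terminated by signal $sig"
        fi
    else
        echo "exit $status: failed with an error"
    fi
}
```

To use it, call `explain_exit_status $?` on the line immediately after the hisat2-build command in the job script. An exit status of 0 alone does not prove the index is complete, so the output files still need checking.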

ADD REPLY • link 6.2 years ago by GenoMax 141k

OK. I'm building the index on a cluster, not on my desktop computer.

ADD REPLY • link 6.2 years ago by Vasu ▴ 770

Ask for more RAM and 3-4 h just to be safe when you re-run.

ADD REPLY • link 6.2 years ago by GenoMax 141k

I will give this a try. Thanks.

ADD REPLY • link 6.2 years ago by Vasu ▴ 770

As you suggested, I re-ran the job with 30G of memory and more run time. The following is what I see:

Output files: "Hisat2index.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa
Reading reference sizes
  Time reading reference sizes: 00:00:29
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:24
  Time to read SNPs and splice sites: 00:00:01

I don't see any Killed message. Does it mean everything is fine? It took only 20 minutes and I have the outputs.

ADD REPLY • link 6.2 years ago by Vasu ▴ 770

There shouldn't be any rf files in there; these are temp files. If the index is complete you will have the following files, e.g.:

GRCh38.primary_assembly_tran.1.ht2
GRCh38.primary_assembly_tran.2.ht2
GRCh38.primary_assembly_tran.3.ht2
GRCh38.primary_assembly_tran.4.ht2
GRCh38.primary_assembly_tran.5.ht2
GRCh38.primary_assembly_tran.6.ht2
GRCh38.primary_assembly_tran.7.ht2
GRCh38.primary_assembly_tran.8.ht2

And is that all the output you got? I had much more, and 20 min even with 30G seems a bit short.

ADD REPLY • link 6.2 years ago by JJ ▴ 680

These are the files I got.

Hisat2index.0.rf, Hisat2index.1.ht2, Hisat2index.2.ht2, Hisat2index.3.ht2, Hisat2index.4.ht2, Hisat2index.5.ht2, Hisat2index.6.ht2, Hisat2index.7.ht2, Hisat2index.8.ht2

Do you think this is right? Or do I need to try with more memory and time? I gave it 30G and 6 h of run time, but the job completed in 20 minutes.

ADD REPLY • link 6.2 years ago by Vasu ▴ 770

Test by doing an alignment with a small number of reads. If things are not right that alignment job should fail.
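
A quick test of this kind could look like the sketch below. It assumes an uncompressed FASTQ file with exactly four lines per read; the function name `subset_fastq` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the first N reads of an uncompressed FASTQ file.
# Assumes the standard four lines per read (header, sequence, '+', qualities).
subset_fastq() {
    input="$1"
    n_reads="$2"
    output="$3"
    head -n "$((n_reads * 4))" "$input" > "$output"
}

# Example (file names are placeholders; requires hisat2 on the PATH):
#   subset_fastq reads.fastq 10000 subset.fastq
#   hisat2 -x Hisat2index -U subset.fastq -S subset.sam
```

If hisat2 cannot load the index or exits with an error on this small input, the index most likely needs to be rebuilt.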

ADD REPLY • link 6.2 years ago by GenoMax 141k

If the alignment job fails, what should I do? Do I need to build the index again?

BTW I saw the sizes GRCh38_Hisat2_index.4.ht2 (703M), GRCh38_Hisat2_index.3.ht2 (12K), GRCh38_Hisat2_index.2.ht2 (0), GRCh38_Hisat2_index.1.ht2 (8K), GRCh38_Hisat2_index.8.ht2 (1.1K), GRCh38_Hisat2_index.7.ht2 (5K) and GRCh38_Hisat2_index.0.rf (39G)

ADD REPLY • link 6.2 years ago by Vasu ▴ 770

I don't think your index is complete, especially as you still have a temp file (Hisat2index.0.rf) and the file sizes are very small, in one case even 0. Mine range from 1.8 G to 12 KB and, more importantly, are very similar to the sizes of the index files I downloaded from HISAT2. As there is no error message, I cannot tell you what went wrong. But according to this protocol you need 160G for the whole human genome, so my guess is that memory is the issue. You need to request at least 160G of RAM.
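
The two warning signs above (an empty .ht2 file and a leftover .rf temp file) can be checked mechanically. The sketch below is a necessary but not sufficient test: it assumes a small-format index with files numbered 1 to 8 and the `.ht2` extension, and the function name `check_ht2_index` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report obvious signs of an incomplete HISAT2 index.
# Usage: check_ht2_index <index_prefix>
check_ht2_index() {
    prefix="$1"
    problems=0
    for i in 1 2 3 4 5 6 7 8; do
        f="${prefix}.${i}.ht2"
        # -s is true only if the file exists and is larger than 0 bytes.
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            problems=$((problems + 1))
        fi
    done
    # Leftover *.rf files are temporary files from an unfinished build.
    for f in "${prefix}".*.rf; do
        if [ -e "$f" ]; then
            echo "leftover temporary file: $f"
            problems=$((problems + 1))
        fi
    done
    if [ "$problems" -eq 0 ]; then
        echo "index looks complete: $prefix"
        return 0
    fi
    return 1
}
```

For the index in this thread, `check_ht2_index Hisat2index` would flag the empty `.2.ht2` file and the `Hisat2index.0.rf` temp file. Passing the check does not guarantee a valid index, so comparing file sizes against a downloaded pre-built index remains a useful second step.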

ADD REPLY • link 6.2 years ago by JJ ▴ 680