Question: Creating Hisat2 Index
Asked 21 months ago by Vasu • 410:

As described in the paper, I first extracted the splice sites and then the exons. Next, I ran hisat2-build:

hisat2-build --ss gencode.v27.primary_assembly.annotation.ss --exon gencode.v27.primary_assembly.annotation.exon annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa Hisat2index
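For reference, the extraction step mentioned above is usually done with the two helper scripts that ship with HISAT2, hisat2_extract_splice_sites.py and hisat2_extract_exons.py. Here is a minimal sketch, assuming the annotation file is called gencode.v27.primary_assembly.annotation.gtf. That name is only inferred from the .ss and .exon file names in the command; the post does not give it.

```bash
# Assumed GTF name, inferred from the .ss/.exon file names in the post.
gtf=gencode.v27.primary_assembly.annotation.gtf
base=${gtf%.gtf}    # strip the .gtf suffix

if command -v hisat2_extract_splice_sites.py >/dev/null 2>&1 && [ -r "$gtf" ]; then
    # Both scripts read a GTF file and write their table to stdout.
    hisat2_extract_splice_sites.py "$gtf" > "$base.ss"
    hisat2_extract_exons.py "$gtf" > "$base.exon"
else
    echo "HISAT2 helper scripts or $gtf not found; nothing extracted" >&2
fi
```

The guard only keeps the snippet from failing noisily when HISAT2 or the GTF is missing; on a machine with both in place, it produces the two files passed to --ss and --exon.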

After a few minutes, this is what I saw:

Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa
Reading reference sizes
  Time reading reference sizes: 00:00:34
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:01:10
  Time to read SNPs and splice sites: 00:00:01
Killed

Does "Killed" mean there was an error?
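As general background: a bare "Killed" line is what a shell typically prints when a process receives SIGKILL. On Linux this often comes from the kernel's out-of-memory killer or from a scheduler enforcing a memory limit, although the log alone cannot confirm which. The exit status then follows the 128 + signal number convention, so SIGKILL (signal 9) shows up as 137. A small sketch of how one might decode the status (describe_exit is a made-up helper name, not part of HISAT2):

```bash
# Hypothetical helper: turn an exit status into a short explanation.
describe_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "ok"
    elif [ "$code" -gt 128 ]; then
        # Shells report death by signal as 128 + signal number.
        sig=$((code - 128))
        if [ "$sig" -eq 9 ]; then
            echo "killed by SIGKILL (often out of memory or a scheduler limit)"
        else
            echo "terminated by signal $sig"
        fi
    else
        echo "failed with exit status $code"
    fi
}

# Demonstration: a child shell that sends SIGKILL to itself.
status=0
sh -c 'kill -9 $$' || status=$?
describe_exit "$status"
```

In practice you would run `describe_exit $?` immediately after the hisat2-build command, before any other command overwrites the status.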

But I see files like Hisat2index.0.rf, Hisat2index.1.ht2, Hisat2index.2.ht2, Hisat2index.3.ht2, Hisat2index.4.ht2, Hisat2index.5.ht2, Hisat2index.6.ht2, Hisat2index.7.ht2, Hisat2index.8.ht2

Did everything go well, or do I need to fix something?

Tags: index, rna-seq, hisat2, genome
modified 21 months ago • written 21 months ago by Vasu • 410

Potentially, your indexes may be incomplete.

You can download pre-made indexes directly from here.

written 21 months ago by genomax • 74k

But I would like to build my own using the GTF.

written 21 months ago by Vasu • 410

How much memory do you have available? That may be the limiting factor in your case unless you are running this on a cluster and ran out of wall clock time.

modified 21 months ago • written 21 months ago by genomax • 74k

OK. I'm building the index on a cluster, not on my desktop computer.

written 21 months ago by Vasu • 410

Ask for more RAM and 3-4 h just to be safe when you re-run.

modified 21 months ago • written 21 months ago by genomax • 74k

I will give this a try. Thanks.

written 21 months ago by Vasu • 410

As you suggested, I re-ran the job with 30 GB of memory and a longer run time. This is what I see:

Output files: "Hisat2index.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa
Reading reference sizes
  Time reading reference sizes: 00:00:29
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:24
  Time to read SNPs and splice sites: 00:00:01

I don't see any "Killed" message. Does that mean everything is fine? It took only 20 minutes, and I have the output files.

written 21 months ago by Vasu • 410

There shouldn't be any .rf files in there; these are temp files. If the index is complete, you will have the following files, e.g.:

GRCh38.primary_assembly_tran.1.ht2
GRCh38.primary_assembly_tran.2.ht2
GRCh38.primary_assembly_tran.3.ht2
GRCh38.primary_assembly_tran.4.ht2
GRCh38.primary_assembly_tran.5.ht2
GRCh38.primary_assembly_tran.6.ht2
GRCh38.primary_assembly_tran.7.ht2
GRCh38.primary_assembly_tran.8.ht2

And is that all the output you got? I had much more, and 20 min even with 30 GB seems a bit short.
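The checks described here (eight .ht2 files present, no leftover .rf temp file) are easy to script. Below is a minimal sketch; check_index is a made-up helper name, and treating a zero-byte file as a failure is my own assumption about what "complete" should mean, not something HISAT2 itself reports.

```bash
# Hypothetical helper: report whether a HISAT2 index prefix looks complete.
# Returns 0 if all checks pass, 1 otherwise.
check_index() {
    prefix=$1
    bad=0
    for i in 1 2 3 4 5 6 7 8; do
        f="$prefix.$i.ht2"
        # -s is true only if the file exists and is larger than zero bytes.
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            bad=1
        fi
    done
    # Leftover temporary files suggest the build stopped early.
    for f in "$prefix".*.rf; do
        if [ -e "$f" ]; then
            echo "leftover temp file: $f"
            bad=1
        fi
    done
    return "$bad"
}

# Example call with the prefix used in this thread.
check_index Hisat2index || echo "index looks incomplete"
```

Passing this check does not prove the index is valid; it only rules out the obvious signs of an interrupted build.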

written 21 months ago by JJ • 470

These are the files I got.

Hisat2index.0.rf, Hisat2index.1.ht2, Hisat2index.2.ht2, Hisat2index.3.ht2, Hisat2index.4.ht2, Hisat2index.5.ht2, Hisat2index.6.ht2, Hisat2index.7.ht2, Hisat2index.8.ht2

Do you think this is right, or do I need to try with more memory and time? I requested 30 GB and 6 h of run time, but the job completed in 20 minutes.

modified 21 months ago • written 21 months ago by Vasu • 410

Test by doing an alignment with a small number of reads. If things are not right that alignment job should fail.
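One way to run such a test, sketched under a few assumptions: the reads are single-end, reads.fastq is a placeholder file name, and smoke_test_index is a made-up helper. The --upto option tells hisat2 to stop after the first N reads, so the run stays short.

```bash
# Hypothetical helper: align the first N reads against an index prefix.
# Returns 2 if hisat2 or the reads file is unavailable, otherwise
# whatever exit status hisat2 itself returns.
smoke_test_index() {
    prefix=$1
    reads=$2
    n=${3:-10000}
    if ! command -v hisat2 >/dev/null 2>&1; then
        echo "hisat2 not found on PATH" >&2
        return 2
    fi
    if [ ! -r "$reads" ]; then
        echo "cannot read $reads" >&2
        return 2
    fi
    # The SAM output is discarded; the alignment summary goes to stderr.
    hisat2 --upto "$n" -x "$prefix" -U "$reads" -S /dev/null
}

# Example call (reads.fastq is a placeholder name).
smoke_test_index Hisat2index reads.fastq || echo "test alignment did not run cleanly"
```

A non-zero exit or an implausibly low alignment rate in the summary would both be reasons to rebuild the index.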

written 21 months ago by genomax • 74k

If the alignment job fails, what should I do? Do I need to build the index again?

BTW, these are the file sizes I saw: GRCh38_Hisat2_index.4.ht2 (703M), GRCh38_Hisat2_index.3.ht2 (12K), GRCh38_Hisat2_index.2.ht2 (0), GRCh38_Hisat2_index.1.ht2 (8K), GRCh38_Hisat2_index.8.ht2 (1.1K), GRCh38_Hisat2_index.7.ht2 (5K) and GRCh38_Hisat2_index.0.rf (39G).

modified 21 months ago • written 21 months ago by Vasu • 410

I don't think your index is complete, especially as you still have a temp file (Hisat2index.0.rf) and the file sizes are very small, in one case even 0. Mine range from 12 KB to 1.8 GB and, more importantly, are very similar to the sizes of the index files I downloaded from HISAT2. As there is no error message, I cannot tell you what went wrong. But according to this protocol, you need 160 GB of memory for the whole human genome, so my guess is that this is the issue. You need to request at least 160 GB of RAM.
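How to request that much memory depends on the cluster's scheduler, which the thread never names. Purely as an illustration, here is what a job script might look like if the cluster runs SLURM. The SLURM assumption, the job name and the CPU count are all mine; the 160G figure and the 6 h limit come from the posts above, and the hisat2-build arguments are copied from the original question.

```bash
#!/bin/bash
#SBATCH --job-name=hisat2-index   # arbitrary name
#SBATCH --mem=160G                # memory figure quoted in this thread
#SBATCH --time=06:00:00           # the 6 h requested earlier in this thread
#SBATCH --cpus-per-task=8         # arbitrary choice

# Use the CPU count granted by SLURM; fall back to 1 outside a job.
threads=${SLURM_CPUS_PER_TASK:-1}

if command -v hisat2-build >/dev/null 2>&1; then
    hisat2-build -p "$threads" \
        --ss gencode.v27.primary_assembly.annotation.ss \
        --exon gencode.v27.primary_assembly.annotation.exon \
        annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
        Hisat2index
    # Record the status so a kill (137) is visible in the job log.
    echo "hisat2-build exit status: $?"
else
    echo "hisat2-build not found on PATH" >&2
fi
```

On other schedulers the resource directives differ, but the idea is the same: request the memory explicitly and log the exit status.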

modified 21 months ago • written 21 months ago by JJ • 470