Creating Hisat2 Index
6.2 years ago
Vasu ▴ 770

As described in the paper, I first extracted the splice sites and then the exons. Next I ran hisat2-build:

hisat2-build --ss gencode.v27.primary_assembly.annotation.ss --exon gencode.v27.primary_assembly.annotation.exon annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa Hisat2index

After a few minutes this is what I saw:

Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa
Reading reference sizes
  Time reading reference sizes: 00:00:34
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:01:10
  Time to read SNPs and splice sites: 00:00:01
Killed

Does Killed mean there is some error?

But I see files like Hisat2index.0.rf, Hisat2index.1.ht2, Hisat2index.2.ht2, Hisat2index.3.ht2, Hisat2index.4.ht2, Hisat2index.5.ht2, Hisat2index.6.ht2, Hisat2index.7.ht2, Hisat2index.8.ht2

Did everything go well, or do I need to fix something?

RNA-Seq hisat2 genome index • 6.8k views

Your index may be incomplete.

You can download pre-made indexes directly from here.


But I would like to build my own using the GTF.


How much memory do you have available? That may be the limiting factor in your case unless you are running this on a cluster and ran out of wall clock time.
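For what it's worth, a bare "Killed" line is what the shell prints when a process receives SIGKILL, which on Linux typically comes from the kernel's out-of-memory killer or from a scheduler enforcing a memory limit. A minimal sketch for confirming that, assuming bash or dash (both report 128 plus the signal number as the exit status); `run_and_report` is a hypothetical helper name, not part of HISAT2:

```shell
# Run a command and say whether it exited normally, failed, or was
# terminated by a signal. In bash and dash a signal death is reported as
# exit status 128 + signal number, so SIGKILL (signal 9) shows up as 137.
run_and_report() {
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "finished normally"
    elif [ "$rc" -gt 128 ]; then
        echo "terminated by signal $((rc - 128))"
    else
        echo "failed with exit status $rc"
    fi
    return "$rc"
}

# Usage with the command from the question:
# run_and_report hisat2-build \
#     --ss gencode.v27.primary_assembly.annotation.ss \
#     --exon gencode.v27.primary_assembly.annotation.exon \
#     annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa Hisat2index
```

If this reports signal 9, the build did not finish on its own, and whatever files it left behind should not be trusted.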


OK. I'm building the index on a cluster, not on my desktop computer.


Ask for more RAM and 3-4 h just to be safe when you re-run.


I will give this a try. Thanks.


As you suggested, I re-ran the job with 30G of memory and a longer run time. This is what I see:

Output files: "Hisat2index.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  annot_AND_refFASTA/Homo_sapiens.GRCh38.dna.primary_assembly.fa
Reading reference sizes
  Time reading reference sizes: 00:00:29
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:24
  Time to read SNPs and splice sites: 00:00:01

I don't see any "Killed" message this time. Does that mean everything is fine? It took only 20 minutes and I have the output files.


There shouldn't be any .rf files in there; these are temp files. If the index is complete you will have the following files, e.g.:

GRCh38.primary_assembly_tran.1.ht2
GRCh38.primary_assembly_tran.2.ht2
GRCh38.primary_assembly_tran.3.ht2
GRCh38.primary_assembly_tran.4.ht2
GRCh38.primary_assembly_tran.5.ht2
GRCh38.primary_assembly_tran.6.ht2
GRCh38.primary_assembly_tran.7.ht2
GRCh38.primary_assembly_tran.8.ht2

Is that all the output you got? I had much more, and 20 min seems a bit short even with 30G.
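A quick way to check a build against that list is sketched below. It assumes the eight-file `.ht2` layout shown above and treats any leftover `.rf` file as a sign of an unfinished build, as described in this reply; `check_ht2_index` is a hypothetical helper, not a HISAT2 command.

```shell
# check_ht2_index PREFIX
# Succeeds only if PREFIX.1.ht2 through PREFIX.8.ht2 all exist and are
# non-empty, and no PREFIX.*.rf temporary file has been left behind.
check_ht2_index() {
    prefix=$1
    bad=0
    for i in 1 2 3 4 5 6 7 8; do
        f="$prefix.$i.ht2"
        if [ ! -s "$f" ]; then          # -s: file exists and has size > 0
            echo "missing or empty: $f"
            bad=1
        fi
    done
    for tmp in "$prefix".*.rf; do
        # When the glob matches nothing, $tmp is the literal pattern and
        # the -e test below is simply false.
        if [ -e "$tmp" ]; then
            echo "leftover temporary file: $tmp"
            bad=1
        fi
    done
    if [ "$bad" -eq 0 ]; then
        echo "index looks complete: $prefix"
    fi
    return "$bad"
}

# Usage: check_ht2_index Hisat2index
```

Passing this check only means the expected files are present; it says nothing about whether their contents are valid.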


These are the files I got.

Hisat2index.0.rf, Hisat2index.1.ht2, Hisat2index.2.ht2, Hisat2index.3.ht2, Hisat2index.4.ht2, Hisat2index.5.ht2, Hisat2index.6.ht2, Hisat2index.7.ht2, Hisat2index.8.ht2

Do you think this is right, or do I need to try with more memory and time? I requested 30G and a 6 h run time, but the job completed in 20 minutes.


Test by doing an alignment with a small number of reads. If things are not right that alignment job should fail.
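Something along these lines would do as a test, as a rough sketch only. It assumes single-end reads in an uncompressed FASTQ file and `hisat2` on the PATH; the function names and the 10000-read subset size are arbitrary choices for illustration.

```shell
# first_reads N IN.fastq OUT.fastq
# Copy the first N reads (4 lines per read) of an uncompressed FASTQ file.
first_reads() {
    head -n "$(( $1 * 4 ))" "$2" > "$3"
}

# smoke_test_index PREFIX READS.fastq
# Align a small subset of reads against the index and report the outcome.
smoke_test_index() {
    subset=$(mktemp) || return 1
    first_reads 10000 "$2" "$subset"
    # -x: index prefix, -U: unpaired reads, -S: SAM output (discarded here)
    hisat2 -x "$1" -U "$subset" -S /dev/null
    rc=$?
    rm -f "$subset"
    if [ "$rc" -eq 0 ]; then
        echo "alignment ran; check the alignment rate printed above"
    else
        echo "alignment failed; the index is probably unusable"
    fi
    return "$rc"
}

# Usage: smoke_test_index Hisat2index reads.fastq
```

For paired-end data the `-U` option would be replaced by `-1` and `-2` with a subset taken from each mate file.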


If the alignment job fails, what should I do? Do I need to build the index again?

BTW I saw the sizes GRCh38_Hisat2_index.4.ht2 (703M), GRCh38_Hisat2_index.3.ht2 (12K), GRCh38_Hisat2_index.2.ht2 (0), GRCh38_Hisat2_index.1.ht2 (8K), GRCh38_Hisat2_index.8.ht2 (1.1K), GRCh38_Hisat2_index.7.ht2 (5K) and GRCh38_Hisat2_index.0.rf (39G)


I don't think your index is complete, especially as you still have a temp file (Hisat2index.0.rf) and the .ht2 file sizes are very small, in one case even 0. Mine range from 1.8 G to 12 KB and, more importantly, are very similar to the sizes of the index files I downloaded from HISAT2. As there is no error message, I cannot tell you what went wrong. But according to this protocol you need 160G for the whole human genome, so my guess is that memory is the issue. You need to request at least 160G of RAM.
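Before re-submitting, it may be worth checking that the node can offer that much memory at all. A small sketch, assuming a Linux node that exposes `/proc/meminfo`; the helper names are made up for illustration, and this looks only at the node's physical RAM, not at the limit your scheduler grants the job.

```shell
# mem_total_gb [MEMINFO]
# Print total RAM in whole GiB, read from a /proc/meminfo-style file
# (the MemTotal line is given in kB).
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# enough_memory REQUIRED_GB [MEMINFO]
# Succeed if the total RAM is at least REQUIRED_GB.
enough_memory() {
    total=$(mem_total_gb "$2")
    if [ -z "$total" ]; then
        echo "could not read MemTotal"
        return 1
    fi
    if [ "$total" -ge "$1" ]; then
        echo "node has ${total}G, at least the ${1}G required"
    else
        echo "node has only ${total}G, below the ${1}G required"
        return 1
    fi
}

# Usage on the build node: enough_memory 160
```

The memory actually available to the job still has to be requested through the cluster's own submission options.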
