Question: HISAT2 index building
Asked 12 months ago by nienke.besbrugge

Hi there,

I am trying to index a very large genome (8.3 Gbp) and have provided the exons and splice sites, as recommended in the HISAT2 manual. As expected, the build uses a lot of memory. After a long time it is still running and reports that it is at its 7th generation. How many generations does the index builder normally go through? Are we almost there, or is it time to abort the build?

Would it be faster or more convenient to build the index without the exon and splice-site data, and how useful would that index still be for downstream transcriptomics analysis?

Thanks for any clarification, Nienke


Build the index with the --ss and --exon options for large genomes (e.g. human, mouse, zebrafish) only if you have more than 200 GB of RAM. Otherwise, build a plain index like this:

hisat2-build -p 10 genome.fa genome
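
As a rough sketch of the choice described above, the snippet below prints the hisat2-build command that matches a given amount of RAM so you can review it before running anything. The 200 GB cutoff is simply the figure quoted in this answer; an 8.3 Gbp genome would presumably need more. The function names and the file names splicesites.txt and exons.txt are hypothetical, and those two files are assumed to come from the hisat2_extract_splice_sites.py and hisat2_extract_exons.py scripts shipped with HISAT2.

```shell
# Cutoff taken from the answer above; treat it as an assumption and adjust it.
ANNOTATED_MIN_GB=200

# Print the hisat2-build command suited to the given amount of RAM.
# Arguments: RAM in whole GB, genome FASTA, index basename.
# splicesites.txt and exons.txt are placeholder file names.
build_cmd() {
    if [ "$1" -gt "$ANNOTATED_MIN_GB" ]; then
        echo "hisat2-build -p 10 --ss splicesites.txt --exon exons.txt $2 $3"
    else
        echo "hisat2-build -p 10 $2 $3"
    fi
}

# Total RAM in whole GB (assumes a Linux system with /proc/meminfo).
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Usage:
#   build_cmd "$(total_ram_gb)" genome.fa genome
```

Printing the command instead of running it keeps the sketch harmless on a shared machine; paste the printed line once you are happy with it.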

You can then provide the known splice sites at alignment time like this:

hisat2 --known-splicesite-infile Gene_splicesites.txt -x genome -1 Sample_R1.fastq -2 Sample_R2.fastq -S Sample.sam
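
The splice-site file used in that command has to be generated from an annotation first. A minimal sketch, assuming the hisat2_extract_splice_sites.py script that ships with HISAT2 is on your PATH; annotation.gtf and the two function names are hypothetical placeholders.

```shell
# Write the splice sites found in a GTF to a HISAT2 splice-site file.
# Arguments: GTF path, output path.
extract_splicesites() {
    hisat2_extract_splice_sites.py "$1" > "$2"
}

# Align one paired-end sample against a plain index, passing known splice sites.
# Arguments: index basename, splice-site file, R1 FASTQ, R2 FASTQ, output SAM.
align_with_splicesites() {
    hisat2 --known-splicesite-infile "$2" -x "$1" -1 "$3" -2 "$4" -S "$5"
}

# Usage:
#   extract_splicesites annotation.gtf Gene_splicesites.txt
#   align_with_splicesites genome Gene_splicesites.txt \
#       Sample_R1.fastq Sample_R2.fastq Sample.sam
```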


Answer written 12 months ago by Satyajeet Khare

I am not an expert, but I did get an opinion from a person who provides core services at the NIH: you do not need the transcriptome GTF to build the index. Build it without the exon information and it will be fine. You will be using the annotation file when quantifying anyway.

I am doing exactly that and my analysis doesn't look bad. However, I realized later that including exon information in the index is crucial if your focus is on splicing isoforms. For a normal DEG analysis I wouldn't worry.

You can also use a pre-built index from the HISAT2 webpage, but then you need the GTF file for that same version in the quantification step. For example, I was using a pre-built index based on the version 86 GTF, and it gave errors when I used the gencode v92 GTF.
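
One quick sanity check before quantification is to compare the sequence names in the genome FASTA with those in the GTF. This is only a sketch, and it assumes the errors come from mismatched sequence names (for example "chr1" versus "1"), which is a common cause but not confirmed for the case above. The function names are hypothetical; genome.fa and annotation.gtf are placeholder paths.

```shell
# Sequence names from FASTA headers: text after '>' up to the first whitespace.
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Sequence names from column 1 of a GTF, skipping '#' comment lines.
gtf_names() {
    awk -F'\t' '!/^#/ && NF > 1 { print $1 }' "$1" | sort -u
}

# Names used in the GTF that are absent from the FASTA.
# Empty output means every GTF sequence name exists in the genome.
# Arguments: genome FASTA, GTF.
gtf_only_names() {
    tmp=$(mktemp)
    fasta_names "$1" > "$tmp"
    gtf_names "$2" | comm -13 "$tmp" -
    rm -f "$tmp"
}

# Usage:
#   gtf_only_names genome.fa annotation.gtf
```

If you only have the pre-built index and not the FASTA it was made from, hisat2-inspect -n should list the sequence names stored in the index, which you can compare against the output of gtf_names in the same way.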

Answer written 12 months ago by piyushjo