Question

HISAT2 index generation

5.5 years ago

deshpande.neha2 • 0

I made a FASTA file combining the genome annotations from two organisms (S. cerevisiae and S. pombe). I want to use this file as a reference to align my RNA-seq reads. How do I generate HISAT2 indexes for this file? I read the HISAT2 manual and looked at a few blogs online, but nothing seems to work. Where do I generate the indexes? Is it possible to do it on a computer cluster like Ada?

@genomax: We used S. pombe as a spike-in control when making library preps for our S. cerevisiae samples, so I need a composite index to align my reads. As I wrote before, I have read the manual multiple times and find the instructions quite vague and nonspecific. That said, I'm still a novice trying to learn as I go.

5.5 years ago by deshpande.neha2 • 0

@neha: Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

It is a rather odd choice to use S. pombe as a spike-in, since the two yeasts are relatively similar. You are likely to have many reads that multi-map (align to both genomes). Common RNA-seq read counters such as featureCounts and htseq-count do not count multi-mapping reads by default.
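HISAT2 writes an NH:i tag on each alignment giving the number of places the read aligned, so you can check how many reads are affected before counting. Below is a minimal sketch, assuming single-end reads in a SAM file called aligned.sam (a hypothetical name). It looks only at the primary alignment of each mapped read (skipping FLAG bit 4, unmapped, and bit 256, secondary) so each read is counted once. The toy SAM written at the top, with made-up read and chromosome names, is for demonstration only; drop those lines and point awk at your real file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy SAM for demonstration only (hypothetical reads and chromosome names):
# r1 aligns once, r2 aligns to both genomes (primary + secondary), r3 is unmapped.
{
  printf '@HD\tVN:1.0\tSO:unsorted\n'
  printf '@SQ\tSN:I_cere\tLN:1000\n'
  printf '@SQ\tSN:I_pombe\tLN:1000\n'
  printf 'r1\t0\tI_cere\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNH:i:1\n'
  printf 'r2\t0\tI_cere\t200\t1\t8M\t*\t0\t0\tGGCCAATT\tIIIIIIII\tNH:i:2\n'
  printf 'r2\t256\tI_pombe\t300\t1\t8M\t*\t0\t0\tGGCCAATT\tIIIIIIII\tNH:i:2\n'
  printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTAAAA\tIIIIIIII\n'
} > aligned.sam

# Count primary alignments of mapped reads by their NH:i value.
awk -F'\t' '
  /^@/ { next }                                   # header lines
  int($2 / 4) % 2 || int($2 / 256) % 2 { next }   # unmapped or secondary
  {
    nh = 0
    for (i = 12; i <= NF; i++) {                  # optional tags start at column 12
      if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
    }
    if (nh == 1) unique++
    else if (nh > 1) multi++
  }
  END { printf "unique=%d\nmulti=%d\n", unique, multi }
' aligned.sam > nh_counts.txt

cat nh_counts.txt
```

On the toy file this reports one unique and one multi-mapped read. For paired-end data each mate has its own record, so mates are counted separately.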

To build the genome.fa file, you could concatenate the chromosome sequences of both yeasts. Make sure the FASTA headers contain something that distinguishes S. pombe from S. cerevisiae: both can't have chr1 in the header, so make them chr1_pombe and chr1_cere (you get the idea).

cat chr1_pombe chr2_pombe ... chrN_pombe chr1_cere chr2_cere ... chrN_cere > genome.fa
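The renaming and concatenation can also be scripted. A minimal sketch, assuming the two genomes are in files called cerevisiae.fa and pombe.fa (hypothetical names) and that you want the suffixes _cere and _pombe: sed keeps the first word of each header, drops any description after it, and appends the suffix. This matters because some genome downloads can use the same chromosome names (for example I, II, III) for both yeasts. The two toy files created in the script are for demonstration only. If you later use a GTF annotation, its chromosome column has to be renamed the same way so the names still match.

```shell
#!/usr/bin/env bash
set -euo pipefail

# tag_fasta FILE SUFFIX: print FILE with "_SUFFIX" appended to the first word of
# every header line, dropping any description, e.g. ">I chromosome I" -> ">I_cere".
tag_fasta() {
  sed -E "s/^>([^[:space:]]+).*/>\1_$2/" "$1"
}

# Toy inputs for demonstration only; use your real genome FASTA files instead.
printf '>I chromosome I\nACGTACGT\n>II\nGGCCAATT\n' > cerevisiae.fa
printf '>I\nTTTTAAAA\n>II\nCCCCGGGG\n' > pombe.fa

# Tag each genome, then concatenate both into a single reference.
tag_fasta pombe.fa pombe > genome.fa
tag_fasta cerevisiae.fa cere >> genome.fa

grep '^>' genome.fa
```

With the toy files, the headers printed are >I_pombe, >II_pombe, >I_cere and >II_cere; sequence lines pass through unchanged.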

You can then use the command below to create the genome index:

hisat2-build genome.fa cere_pombe

Then you would pass cere_pombe as the index name (the -x option of hisat2) in your alignments.
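hisat2-build writes the index as a set of files sharing the base name, cere_pombe.1.ht2 through cere_pombe.8.ht2 for genomes of this size, and hisat2 takes that base name (not a file name) with -x. Below is a small sketch of a pre-flight check, assuming that eight-file layout. check_index is a hypothetical helper, and reads.fq.gz and aligned.sam in the comments are placeholder names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_index BASE: succeed only if all eight HISAT2 index files exist and are non-empty.
# (Very large genomes get a .ht2l index instead; this check assumes the .ht2 files.)
check_index() {
  local base=$1 i
  for i in 1 2 3 4 5 6 7 8; do
    if [ ! -s "$base.$i.ht2" ]; then
      echo "missing or empty: $base.$i.ht2" >&2
      return 1
    fi
  done
}

# Typical use (file names are placeholders):
#   hisat2-build genome.fa cere_pombe
#   check_index cere_pombe && hisat2 -p 4 -x cere_pombe -U reads.fq.gz -S aligned.sam
```

For paired-end reads, replace -U with -1 and -2 pointing at the two mate files.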

5.5 years ago by GenoMax 141k

Could you explain in greater detail how you used an S. pombe spike-in control? Do you know the exact transcript composition and abundances of the S. pombe spike-in? Did you add this spike-in to all your S. cerevisiae samples?

5.5 years ago by h.mon 35k

score 4 · Answer 1 · 2018-10-12

Building a HISAT2 index should be as simple as hisat2-build genome.fa index_name. You can find detailed options (if you need them) on the manual page.

That said, why are you building a composite index of two species? Does your sample contain both genomes? Are you looking to separate the reads for the two?
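If the goal is to separate them, one option is to split the alignments by the genome suffix in the reference name. Below is a minimal sketch, assuming chromosome names ending in _cere and _pombe (as in the renaming scheme suggested in the comments), a SAM file named aligned.sam (hypothetical), and that only reads aligning uniquely (NH:i:1) should be kept. Reads that align to both genomes are dropped rather than assigned. The toy SAM at the top is made-up demonstration data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy SAM for demonstration only (hypothetical reads and chromosome names).
{
  printf '@HD\tVN:1.0\tSO:unsorted\n'
  printf '@SQ\tSN:I_cere\tLN:1000\n'
  printf '@SQ\tSN:I_pombe\tLN:1000\n'
  printf 'r1\t0\tI_cere\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNH:i:1\n'
  printf 'r2\t0\tI_cere\t200\t1\t8M\t*\t0\t0\tGGCCAATT\tIIIIIIII\tNH:i:2\n'
  printf 'r2\t256\tI_pombe\t300\t1\t8M\t*\t0\t0\tGGCCAATT\tIIIIIIII\tNH:i:2\n'
  printf 'r4\t16\tI_pombe\t500\t60\t8M\t*\t0\t0\tCCCCGGGG\tIIIIIIII\tNH:i:1\n'
} > aligned.sam

# Copy the header to both outputs, then route uniquely aligned primary records
# to cere.sam or pombe.sam according to the suffix of the reference name (column 3).
awk -F'\t' '
  /^@/ { print > "cere.sam"; print > "pombe.sam"; next }
  int($2 / 4) % 2 || int($2 / 256) % 2 { next }   # unmapped or secondary
  {
    nh = 0
    for (i = 12; i <= NF; i++) {
      if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
    }
  }
  nh != 1 { next }                                # multi-mapped: keep in neither file
  $3 ~ /_cere$/  { print > "cere.sam" }
  $3 ~ /_pombe$/ { print > "pombe.sam" }
' aligned.sam

grep -vc '^@' cere.sam pombe.sam
```

On the toy file, r1 ends up in cere.sam, r4 in pombe.sam, and the multi-mapped r2 in neither. Each output keeps the full header, so it remains a valid SAM file.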