Question: HISAT2 index generation
11 months ago, deshpande.neha2 wrote:

I made a FASTA file combining the genome annotations from two organisms (S. cerevisiae and S. pombe). I want to use this file as a reference to align my RNA-seq reads. How do I generate HISAT2 indexes for this file? I read the HISAT2 manual and looked at a few blogs online, but nothing seems to work. Where do I generate the indexes? Is it possible to do it on a computer cluster like ada?

Tags: sequencing, rna-seq, alignment

@genomax: We used S. pombe as a spike-in control while making library preps for S. cerevisiae samples, so I need a composite index to align my reads. As I wrote before, I have read the manual multiple times and find the instructions quite vague and nonspecific. That being said, I'm still a novice trying to learn things as I go along.

written 11 months ago by deshpande.neha2

@neha: Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

Using S. pombe as a spike-in is a rather odd choice, since those two yeasts are relatively similar. You are likely to run into many reads multi-mapping (mapping to both genomes). For RNA-seq data, common read-counting tools (e.g. featureCounts, htseq-count) do not count such reads by default.
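To get a feel for how large the multi-mapping problem is once you have alignments, something like the sketch below may help. It assumes SAM output from HISAT2, which records the number of reported alignments per read in the NH:i tag; the function name `count_multimappers` and the toy SAM records are made up for illustration only.

```shell
#!/bin/sh
# Count aligned read names with one alignment (NH:i:1) vs. several (NH:i:>1).
# Reads SAM text on stdin. Assumes the aligner writes an NH:i tag (HISAT2 does).
count_multimappers() {
    awk -F'\t' '
        /^@/ { next }                          # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }          # FLAG bit 0x4 set: read is unmapped
        {
            nh = 1                             # treat a missing NH tag as unique
            for (i = 12; i <= NF; i++)
                if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
            if (nh > 1) multi[$1] = 1; else single[$1] = 1
        }
        END {
            for (r in single) u++
            for (r in multi) m++
            printf "unique\t%d\nmulti\t%d\n", u, m
        }
    '
}

# Toy SAM records (hypothetical, not real data) so the sketch runs as-is.
toy_sam=$(mktemp)
{
    printf 'read1\t0\tchr1_cere\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNH:i:1\n'
    printf 'read2\t0\tchr1_cere\t200\t1\t8M\t*\t0\t0\tTTGACCAA\tIIIIIIII\tNH:i:2\n'
    printf 'read2\t256\tchr1_pombe\t50\t1\t8M\t*\t0\t0\tTTGACCAA\tIIIIIIII\tNH:i:2\n'
    printf 'read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII\n'
} > "$toy_sam"

count_multimappers < "$toy_sam"
```

On your own data you would run `count_multimappers < your_alignments.sam` instead of using the toy file. For paired-end data both mates share a read name, so the counts are per fragment.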

To build the genome.fa file you could concatenate the chromosome sequences of both yeasts. Make sure the FASTA headers contain something to distinguish S. pombe from S. cerevisiae: both can't have chr1 in the header, so make them chr1_pombe and chr1_cere (you get the idea).

cat chr1_pombe chr2_pombe ... chrN_pombe chr1_cere chr2_cere ... chrN_cere > genome.fa
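If the headers in your downloaded files are not already species-specific, the renaming can be scripted before concatenating. The following is only a sketch: the helper name `tag_fasta`, the file names, and the toy sequences are all assumptions, and it appends the suffix to the first word of each header line.

```shell
#!/bin/sh
# Append a species suffix to the sequence ID (first word) of each FASTA header.
# Usage: tag_fasta SUFFIX FILE
tag_fasta() {
    awk -v suffix="$1" '
        /^>/ { $1 = $1 "_" suffix }   # header line: rename the ID, keep the description
        { print }                     # sequence lines pass through unchanged
    ' "$2"
}

# Toy inputs (hypothetical) so the sketch runs as-is; use your real genome files.
workdir=$(mktemp -d)
printf '>chr1 toy pombe sequence\nACGTACGT\n' > "$workdir/pombe.fa"
printf '>chr1 toy cerevisiae sequence\nTTGACCAA\n' > "$workdir/cere.fa"

# Tag each species, then concatenate into one reference file.
tag_fasta pombe "$workdir/pombe.fa"  > "$workdir/genome.fa"
tag_fasta cere  "$workdir/cere.fa"  >> "$workdir/genome.fa"

# Show the resulting headers; any duplicated IDs would be printed by uniq -d.
grep '^>' "$workdir/genome.fa"
grep '^>' "$workdir/genome.fa" | cut -d' ' -f1 | sort | uniq -d
```

The last command printing nothing is a quick sanity check that no two sequences share an ID.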

You can then use the command below to create the genome index

hisat2-build genome.fa cere_pombe

Then you would use cere_pombe as the index basename (the -x argument to hisat2) in your alignments.
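Put together, the build and alignment steps might look like the sketch below. It assumes paired-end reads and that hisat2 and hisat2-build are on your PATH (on a cluster they usually are once the right module or environment is loaded). The wrapper function names, the read file names, and the thread counts are placeholders, not anything from your setup.

```shell
#!/bin/sh
# Build a HISAT2 index from the combined FASTA.
# Usage: build_index GENOME_FASTA INDEX_BASENAME [THREADS]
build_index() {
    # For a small genome this writes INDEX_BASENAME.1.ht2 ... INDEX_BASENAME.8.ht2
    hisat2-build -p "${3:-1}" "$1" "$2"
}

# Align paired-end reads against that index and write SAM output.
# Usage: align_pairs INDEX_BASENAME READS_1 READS_2 OUT_SAM [THREADS]
align_pairs() {
    hisat2 -p "${5:-1}" -x "$1" -1 "$2" -2 "$3" -S "$4"
}

# Example calls (file names are placeholders):
# build_index genome.fa cere_pombe 4
# align_pairs cere_pombe sample_R1.fastq sample_R2.fastq sample.sam 4
```

For single-end reads you would pass the reads with -U instead of -1/-2. The index only needs to be built once; every alignment job can then point -x at the same basename.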

written 11 months ago by genomax

Could you explain in greater detail how you used the S. pombe spike-in control? Do you know the exact transcript composition and abundances of the S. pombe spike-in? Did you add this spike-in to all your S. cerevisiae samples?

written 11 months ago by h.mon
11 months ago, genomax (United States) wrote:

Building a HISAT2 index should be as simple as hisat2-build genome.fa index_name. You can find detailed options (if you need them) on the manual page.

That said, why are you building a composite index of two species? Does your sample have both genomes in it? Are you looking to separate the reads for the two?
