Question

How to perform gene annotation in kallisto?

3.9 years ago

synat.keam ▴ 120

Dear All,

I'm Synat, a cancer research student. I'm running an RNA-seq experiment to look at differential expression between treatment conditions, along with pathway analysis.

I'm doing the alignment with kallisto. I first built an index from a FASTA file and then mapped the reads to the reference using the index and the FASTQ files, as described in the kallisto manual. From these two stages I got an abundance.h5 file, an abundance.tsv file and a run_info file.

However, I have been told that this is not the end of the process, because abundance.tsv and abundance.h5 contain only target IDs. I therefore need another step, gene annotation (assigning a gene name to each target ID), which may require a GTF file.

The kallisto manual briefly mentions a GTF file and a chromosome file:

https://pachterlab.github.io/kallisto/starting

However, I am not sure whether that relates to gene annotation or serves another purpose.

A colleague gave me some sample R code, along with a folder for each sample containing abundance.tsv, abundance.h5 and run_info. When I run the code on those files it works, produces gene names, and generates a CSV file for further analysis. However, when I run the same code on my own files (abundance.tsv, abundance.h5), the gene names are missing.

My question is: how can I perform annotation in kallisto using a GTF file and generate the abundance.tsv, abundance.h5 and run_info files? I am quite new to the field. I hope my question makes sense, and I look forward to hearing from you all.

Regards,

Gene kallisto annotation • 3.4k views

ADD COMMENT • link updated 22 months ago by 호성 • 0 • written 3.9 years ago by synat.keam ▴ 120

However, when I run the same code on my own files (abundance.tsv, abundance.h5), the gene names are missing.

Did you check that the assembly version you are using for kallisto matches the one used for gene annotation?

If you are getting ENS (Ensembl) IDs from kallisto, then I guess you can easily annotate them with biomaRt (using the appropriate assembly). If your aim is only to find the Ensembl IDs and their associated gene symbols, you can parse the relevant GTF attributes (Ensembl IDs and gene symbols) or just use the online version of BioMart to fetch the annotations.
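
The GTF-parsing route could look roughly like the Python sketch below. It assumes a standard GTF layout (nine tab-separated columns, with `transcript_id`, `gene_id` and `gene_name` in the attribute column) and strips version suffixes so that IDs such as `ENSMUST00000193812.2` still match. The function names and the sample records are made up for illustration; real use would read the GTF that matches the FASTA used for the index.

```python
import re

# Matches key "value" pairs in the GTF attribute column, e.g. gene_name "GeneA";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def strip_version(identifier):
    """Drop a trailing version suffix such as the '.2' in 'ENSMUST00000193812.2'."""
    return identifier.split(".", 1)[0]


def transcript_to_gene(gtf_lines):
    """Map unversioned transcript IDs to (gene_id, gene_name) tuples."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # keep only transcript records
        attrs = dict(ATTRIBUTE_RE.findall(fields[8]))
        if "transcript_id" not in attrs:
            continue
        gene_id = strip_version(attrs.get("gene_id", ""))
        # Fall back to the gene ID when the record carries no gene_name.
        gene_name = attrs.get("gene_name", gene_id)
        mapping[strip_version(attrs["transcript_id"])] = (gene_id, gene_name)
    return mapping


# Made-up records for illustration only. Real use would pass an open file,
# e.g. `with open("annotation.gtf") as handle:` (hypothetical file name).
SAMPLE_GTF = [
    "#!genome-build example\n",
    'chr1\tsrc\tgene\t1\t900\t.\t+\t.\tgene_id "ENSMUSG00000000001"; gene_name "GeneA";\n',
    'chr1\tsrc\ttranscript\t1\t900\t.\t+\t.\tgene_id "ENSMUSG00000000001"; '
    'transcript_id "ENSMUST00000000011"; gene_name "GeneA";\n',
    'chr1\tsrc\ttranscript\t1\t500\t.\t+\t.\tgene_id "ENSMUSG00000000002.3"; '
    'transcript_id "ENSMUST00000000022.1";\n',
]

tx2gene = transcript_to_gene(SAMPLE_GTF)
print(tx2gene)
```

The resulting dictionary can then be used to look up each kallisto target ID after removing its version suffix.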

ADD REPLY • link 3.9 years ago by Nitin Narwade ★ 1.6k

Running Kallisto

Dear Nitin,

Thank you so much for taking the time to respond to my question; I do appreciate it. Would you have a moment to look through the code I used for the alignment? I am not sure about the genome assembly, and it seems I did not specify one in my code. I would appreciate it if you could have a look.

I will also try biomaRt as you suggest and let you know how I go with it. Have a good weekend.

Regards,

synat

ADD REPLY • link 3.9 years ago by synat.keam ▴ 120

You are using mouse assembly version GRCm39 (mm39). Can you check that your gene annotation script uses the same assembly version?

ADD REPLY • link 3.9 years ago by Nitin Narwade ★ 1.6k

Hi Nitin,

Thanks for your response. Honestly, I have not done the gene annotation via biomaRt in R yet, as I am still reading the tutorial.

When you mentioned the online version of BioMart, did you mean that I can get a FASTA file that already includes the annotation, rather than the normal FASTA file? Do you know the steps to get it? I would really appreciate your help.

Kind Regards,

Synat

ADD REPLY • link 3.9 years ago by synat.keam ▴ 120

I mean you can download the Ensembl gene IDs and their associated gene symbols from BioMart. I suppose your kallisto abundance table will have Ensembl IDs (gene IDs) and the abundances, so in the next step you can match these IDs and fetch the associated gene symbols from the table downloaded from BioMart.

Anyway, why do you want this information? You can do pretty much everything with the Ensembl IDs: annotation, pathway analysis, or GO enrichment.
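
The matching step might be sketched in Python as follows. kallisto reports one row per transcript, so this sketch matches on transcript IDs. It assumes the BioMart export is tab-separated with the columns "Transcript stable ID" and "Gene name" (adjust the column names to whatever was actually exported), and that abundance.tsv has the usual `target_id` column. The function names and sample data are made up for illustration.

```python
import csv
import io


def load_symbols(biomart_handle, id_column="Transcript stable ID", name_column="Gene name"):
    """Read a tab-separated BioMart export into {transcript_id: gene_name}."""
    reader = csv.DictReader(biomart_handle, delimiter="\t")
    return {row[id_column]: row[name_column] for row in reader}


def annotate_abundance(abundance_handle, symbols):
    """Return abundance rows with an added 'gene_name' field ('' when unmatched)."""
    annotated = []
    for row in csv.DictReader(abundance_handle, delimiter="\t"):
        # kallisto target IDs often carry a version suffix (".4"); plain
        # BioMart stable IDs do not, so compare on the unversioned ID.
        unversioned = row["target_id"].split(".", 1)[0]
        row["gene_name"] = symbols.get(unversioned, "")
        annotated.append(row)
    return annotated


# Made-up data for illustration; real use would open the downloaded
# BioMart table and a sample's abundance.tsv instead.
BIOMART_TSV = (
    "Transcript stable ID\tGene stable ID\tGene name\n"
    "ENSMUST00000000011\tENSMUSG00000000001\tGeneA\n"
)
ABUNDANCE_TSV = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "ENSMUST00000000011.4\t1200\t1001.5\t250\t12.5\n"
    "ENSMUST00000000099.1\t800\t601.5\t0\t0\n"
)

symbols = load_symbols(io.StringIO(BIOMART_TSV))
rows = annotate_abundance(io.StringIO(ABUNDANCE_TSV), symbols)
for row in rows:
    print(row["target_id"], row["gene_name"], row["tpm"])
```

Rows left with an empty gene name usually point to an annotation release or assembly mismatch, which is worth checking before any downstream analysis.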

ADD REPLY • link 3.9 years ago by Nitin Narwade ★ 1.6k

Hi Nitin,

Thank you so much for your assistance. I am looking at differential expression and pathway analysis as well, as a bit of exploratory analysis to see whether anything comes up, so I will explore a variety of things there.

Also, I have just fixed the issue: I sourced the FASTA file from GENCODE, aligned with kallisto, and got the gene symbols/names in the output. So all good now. I really appreciate your response. Hope you have a good day!
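
A likely reason this works: GENCODE transcript FASTA headers pack several fields separated by "|", and kallisto uses the header text as the target ID, so the gene symbol ends up inside `target_id`. The Python sketch below splits such an ID into named fields. The field order is an assumption based on the usual GENCODE header layout and should be checked against the FASTA actually used; the function name is hypothetical and the example ID is for illustration only.

```python
# Assumed order of the pipe-separated fields in a GENCODE transcript header.
GENCODE_FIELDS = (
    "transcript_id",
    "gene_id",
    "havana_gene",
    "havana_transcript",
    "transcript_name",
    "gene_name",
    "length",
    "biotype",
)


def parse_gencode_target_id(target_id):
    """Split a GENCODE-style target ID into a dict, or return None if it does not fit."""
    parts = target_id.rstrip("|").split("|")
    if len(parts) != len(GENCODE_FIELDS):
        return None  # not a GENCODE-style ID (e.g. a bare Ensembl transcript ID)
    return dict(zip(GENCODE_FIELDS, parts))


# Illustrative target ID in the assumed GENCODE layout.
example = (
    "ENSMUST00000193812.2|ENSMUSG00000102693.2|OTTMUSG00000049935.1|"
    "OTTMUST00000127109.1|4933401J01Rik-201|4933401J01Rik|1070|TEC|"
)

parsed = parse_gencode_target_id(example)
print(parsed["transcript_id"], parsed["gene_name"])
```

With an index built from an Ensembl cDNA FASTA instead, the target ID is only the transcript ID, which would explain why gene names were missing before.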

Kind Regards,

synat

ADD REPLY • link 3.9 years ago by synat.keam ▴ 120

Dear synat, I have the same problem. Does this mean you created the .idx file from a .fasta file that includes the gene symbols/names? What I want to know is which file you used for indexing.

I know this is late, but I need help. Thanks, Lim

ADD REPLY • link 22 months ago by 호성 • 0