How to assign gene names after kallisto when I add GFP?
4 months ago
bioinfo ▴ 140

Hello,

I would like to build a new kallisto reference that includes GFP. I found this workflow: https://github.com/igordot/genomics/blob/master/workflows/ref-genome-gfp.md and adding GFP to the reference seems pretty straightforward.

However, I am not sure how to assign gene IDs afterwards. Normally, I use tximport together with the BioMart annotation for the Ensembl release I am using to map transcripts to genes. That annotation would not contain GFP. Is it OK to proceed as usual and then append the GFP values from the raw kallisto output to the end of my count table? Would the fact that the kallisto output contains GFP but the Ensembl annotation does not cause any problems with the gene name assignment?

Thank you

tximport RNA-seq kallisto • 588 views
4 months ago
dsull ★ 5.5k

kallisto relies on a transcriptome index, so simply add the GFP sequence to the transcriptome FASTA file and build the index from that file. In the output abundance.tsv, the GFP row will carry whatever name you gave the sequence in its FASTA header.
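
As a rough illustration of that step, here is a minimal Python sketch that appends one record to transcriptome FASTA text. The function name `append_fasta_record`, the record name `GFP`, the Ensembl-style ID, and the short sequences are all placeholders chosen for this example (the sequence is not a full-length GFP), so substitute your own.

```python
def append_fasta_record(fasta_text: str, name: str, sequence: str, width: int = 60) -> str:
    """Return FASTA text with one extra record appended (hypothetical helper)."""
    # Collect the IDs already present (first word of each header line).
    existing_ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                existing_ids.add(words[0])
    # Refuse to add a name twice: duplicated entries can distort quantification.
    if name in existing_ids:
        raise ValueError(f"record {name!r} is already in the FASTA")

    seq = sequence.upper()
    wrapped = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    # Make sure the existing text ends with a newline before appending.
    if fasta_text and not fasta_text.endswith("\n"):
        fasta_text += "\n"
    return f"{fasta_text}>{name}\n{wrapped}\n"


# Placeholder transcriptome and placeholder sequence, for illustration only.
transcriptome = ">ENST0000000001.1\nACGTACGTACGT\n"
combined = append_fasta_record(transcriptome, "GFP", "ATGGTGAGCAAGGGCGAG")
# The combined FASTA would then be indexed as usual, e.g.
#   kallisto index -i combined.idx combined.fa
```

The duplicate check is a design choice for this sketch: it fails early rather than silently writing the same name into the reference twice.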

If you need to use GTF files for any downstream analysis, you can also add GFP to that GTF file, following the format of any other gene in that file.
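
A possible way to sketch this in Python, assuming GFP is treated as its own single-exon gene: the first function writes gene, transcript, and exon lines in the nine-column GTF layout, and the second pulls transcript-to-gene pairs out of GTF lines, so GFP ends up in the mapping alongside the other genes. The function names, the sequence name `GFP`, the length 720, and the `ENST01`/`ENSG01` record are hypothetical values for illustration, not taken from any real annotation.

```python
import re


def gfp_gtf_lines(seqname, length, gene_id="GFP", transcript_id="GFP", source="custom"):
    """Build gene/transcript/exon GTF lines for a single-exon gene (hypothetical helper)."""
    gene_attrs = f'gene_id "{gene_id}"; gene_name "{gene_id}";'
    tx_attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}"; gene_name "{gene_id}";'
    rows = [("gene", gene_attrs), ("transcript", tx_attrs), ("exon", tx_attrs)]
    # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes
    return [
        "\t".join([seqname, source, feature, "1", str(length), ".", "+", ".", attrs])
        for feature, attrs in rows
    ]


def tx2gene_from_gtf(gtf_lines):
    """Map transcript_id -> gene_id using the 'transcript' features of a GTF."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        tx = re.search(r'transcript_id "([^"]+)"', fields[8])
        gene = re.search(r'gene_id "([^"]+)"', fields[8])
        if tx and gene:
            mapping[tx.group(1)] = gene.group(1)
    return mapping


# One made-up existing transcript plus the new GFP lines.
existing = 'chr1\tensembl\ttranscript\t100\t900\t.\t+\t.\tgene_id "ENSG01"; transcript_id "ENST01";'
mapping = tx2gene_from_gtf([existing] + gfp_gtf_lines("GFP", 720))
```

With GFP present in the mapping, the transcript-to-gene step can treat it like any other gene instead of needing the GFP row appended by hand afterwards. The IDs in the mapping still need to match the names in the FASTA exactly.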


Thank you. I did this and noticed that some gene counts are identical between the original reference and the new one, while others differ even though the only change was adding GFP. The pattern across samples still seems to be the same, though. For example, these are the counts for gene A in the different samples:

[Image: counts for gene A across samples, original reference vs. reference with GFP]

What could be causing this difference? I know that kallisto is probabilistic, so counts may change somewhat between runs, but I did not expect such a big difference.

Thank you


The counts seem to differ by a factor of 2, so maybe you're double-counting in some way (for example, duplicated transcript entries in your FASTA/GTF). How are you obtaining the gene counts? Do the abundance.tsv files output by kallisto show the same differences? What do you see if you plot the gene counts against each other (GFP vs. no GFP)?
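
Two of these checks could be sketched in Python roughly as follows: one lists FASTA record names that occur more than once, the other computes the per-gene ratio of counts between the two runs, where values near 2 would point to double-counting. The function names and all numbers below are made up for illustration and do not come from the data in this thread.

```python
from collections import Counter


def duplicated_fasta_ids(fasta_text):
    """Return record names (first word of each header) that occur more than once."""
    ids = [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]
    return sorted(name for name, n in Counter(ids).items() if n > 1)


def count_ratios(old_counts, new_counts):
    """Ratio new/old for genes present in both runs with a nonzero old count."""
    return {
        gene: new_counts[gene] / old
        for gene, old in old_counts.items()
        if gene in new_counts and old > 0
    }


# Made-up example: 'tx1' appears twice in the FASTA.
fasta = ">tx1 some description\nACGT\n>tx2\nGGCC\n>tx1\nACGT\n"
dupes = duplicated_fasta_ids(fasta)

# Made-up gene counts without GFP (old) and with GFP (new).
old = {"geneA": 100.0, "geneB": 50.0}
new = {"geneA": 200.0, "geneB": 50.0, "GFP": 10.0}
ratios = count_ratios(old, new)
```

The same duplicate check can be pointed at the transcript IDs of a GTF or a transcript-to-gene table, since a repeated row there can inflate gene-level sums in the same way.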


Thank you so much for your help. I figured it out.
