Hello,
I would like to build a new kallisto reference that includes GFP. I found this link: https://github.com/igordot/genomics/blob/master/workflows/ref-genome-gfp.md and adding GFP to the reference seems pretty straightforward.
However, I am not sure how to proceed with assigning gene IDs. Normally, I use tximport together with biomaRt, querying the Ensembl release that matches my reference, to map transcripts to gene IDs. However, that mapping would not contain GFP. Is it OK to still do that and then, at the end, take the GFP values from the raw kallisto output and append them to my count table? Would the fact that the kallisto output has GFP but the Ensembl object doesn't cause any issues with the gene name assignment?
Thank you
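One way to avoid pasting the GFP counts in by hand is to add a GFP row to the transcript-to-gene mapping before summarizing to gene level, so GFP is handled like any other gene. The sketch below is only an illustration in plain Python, not the tximport workflow itself: the function names `add_custom_transcripts` and `gene_counts`, the transcript IDs, and the toy numbers are all made up for the example. It assumes the standard kallisto `abundance.tsv` columns (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`) and that the GFP sequence was named `GFP` in the FASTA.

```python
import csv
import io
from collections import defaultdict


def add_custom_transcripts(tx2gene, extra):
    """Return a copy of tx2gene with extra transcript -> gene pairs added.

    Raises if an extra transcript ID is already mapped to a different gene.
    """
    merged = dict(tx2gene)
    for tx, gene in extra.items():
        if tx in merged and merged[tx] != gene:
            raise ValueError(f"{tx} is already mapped to {merged[tx]}")
        merged[tx] = gene
    return merged


def gene_counts(abundance_tsv, tx2gene):
    """Sum est_counts per gene from an open abundance.tsv file.

    Returns (counts, unmapped), where unmapped lists the target_ids
    that have no entry in tx2gene.
    """
    counts = defaultdict(float)
    unmapped = []
    for row in csv.DictReader(abundance_tsv, delimiter="\t"):
        gene = tx2gene.get(row["target_id"])
        if gene is None:
            unmapped.append(row["target_id"])
            continue
        counts[gene] += float(row["est_counts"])
    return dict(counts), unmapped


# Toy data: hypothetical IDs and counts, for illustration only.
toy_abundance = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "TX_A1\t1000\t850\t30\t5.0\n"
    "TX_A2\t1200\t1050\t70\t9.0\n"
    "GFP\t720\t570\t12\t3.0\n"
)
ensembl_map = {"TX_A1": "GENE_A", "TX_A2": "GENE_A"}  # stand-in for the biomaRt table

# Without the extra row, GFP is reported as unmapped rather than counted.
_, missing = gene_counts(io.StringIO(toy_abundance), ensembl_map)

# With the extra row, GFP is summarized like any other gene.
full_map = add_custom_transcripts(ensembl_map, {"GFP": "GFP"})
counts, unmapped = gene_counts(io.StringIO(toy_abundance), full_map)
```

Listing the unmapped IDs explicitly makes it easy to confirm that GFP is the only transcript missing from the Ensembl mapping, rather than discovering a version mismatch later.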
Thank you. I did this and noticed that some gene counts are identical between the original reference and the one where I just added GFP, while others differ. The pattern across samples still seems to be the same, though. For example, these are the counts for gene A in the different samples:
What could be causing this difference? I know that kallisto is probabilistic, so there is a chance of getting different counts when rerunning, but I did not expect such a big difference.
Thank you
The counts seem to differ by a factor of 2, so maybe you're double-counting in some way (for example, you may have duplicated transcript entries in your FASTA/GTF). How are you obtaining the gene counts? Do the abundance.tsv files output by kallisto show the same differences? What if you plot the gene counts against each other (GFP vs. no GFP)?
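The two checks suggested above can be sketched in a few lines of plain Python. This is a rough illustration rather than a definitive diagnostic: the function names `duplicated_fasta_ids` and `count_ratios` are hypothetical, the inputs are toy values, and the gene-level counts are assumed to already be available as dictionaries keyed by gene ID.

```python
from collections import Counter


def duplicated_fasta_ids(lines):
    """Return the sorted sequence IDs that occur more than once in FASTA headers.

    The ID is taken as the first whitespace-delimited token after '>'.
    """
    ids = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.append(parts[0])
    return sorted(seq_id for seq_id, n in Counter(ids).items() if n > 1)


def count_ratios(with_gfp, without_gfp):
    """Per-gene ratio of counts (GFP reference / original reference).

    Genes missing from either run, or with zero counts in the original
    run, are skipped.
    """
    ratios = {}
    for gene, base in without_gfp.items():
        if gene in with_gfp and base > 0:
            ratios[gene] = with_gfp[gene] / base
    return ratios


# Toy inputs: hypothetical IDs and counts, for illustration only.
fasta_lines = [">tx1 gene=A", "ACGT", ">tx2", "GGCC", ">tx1", "ACGT"]
dupes = duplicated_fasta_ids(fasta_lines)

ratios = count_ratios(
    {"GENE_A": 200.0, "GENE_B": 50.0, "GFP": 10.0},
    {"GENE_A": 100.0, "GENE_B": 50.0},
)
```

Ratios that cluster around 1 with a separate group near 2 would point towards duplicated entries for a subset of genes, whereas scattered ratios would suggest a different cause.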
Thank you so much for your help. I figured it out.