Hello all!
I am currently analyzing some RNA-seq data derived from mice that express the reporter tdTomato under an endogenous promoter. I'd like to get read counts for tdTomato, but since it isn't part of the mouse transcriptome, Salmon doesn't map reads to it.
From what I can figure out on my own, this shouldn't be that hard: add the tdTomato sequence to the mouse transcriptome FASTA that I supply to Salmon and, if I want gene-level counts, add the tdTomato accession number to my tab-delimited transcript-to-gene mapping file. Is there anything else to do or to look out for? Thanks! Ed
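The two steps described above could be sketched roughly like this in Python. This is only a minimal sketch: it assumes uncompressed files, the function names (`add_reporter`, `fasta_ids`, `extend_tx2gene`) are made up for illustration, and it assumes the transcript name is the FASTA header up to the first whitespace, which is how Salmon names transcripts by default as far as I know.

```python
def add_reporter(transcriptome, reporter_fasta, out_fasta):
    """Write the transcriptome FASTA followed by the reporter FASTA record(s)."""
    with open(out_fasta, "w") as out:
        for path in (transcriptome, reporter_fasta):
            last = ""
            with open(path) as src:
                for line in src:
                    out.write(line)
                    last = line
            # Make sure the next record starts on its own line.
            if last and not last.endswith("\n"):
                out.write("\n")


def fasta_ids(path):
    """Return the header IDs (text after '>' up to the first whitespace)."""
    ids = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                ids.append(line[1:].split()[0])
    return ids


def extend_tx2gene(tx2gene_in, tx2gene_out, transcript_ids, gene_id):
    """Copy the tab-delimited mapping and append one row per reporter ID."""
    with open(tx2gene_in) as src:
        rows = [line.rstrip("\n") for line in src if line.strip()]
    rows += [f"{tid}\t{gene_id}" for tid in transcript_ids]
    with open(tx2gene_out, "w") as out:
        out.write("\n".join(rows) + "\n")
```

You would call `add_reporter` on your transcriptome and the tdTomato FASTA, then pass `fasta_ids` of the tdTomato file to `extend_tx2gene`. The Salmon index then has to be rebuilt from the combined FASTA (`salmon index -t combined.fa -i some_index_dir`, with placeholder file names) before quantifying.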
That is pretty much it. Just check afterwards whether tdTomato appears in your counts files.
I wonder if you managed to do it. I am also interested in counting tdTomato expression in mouse (scRNA-seq). Thanks!
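One way to do that check, as a small sketch: Salmon writes a tab-delimited `quant.sf` with `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads` columns, so you can look up the reporter row directly. The function name `reporter_counts` and the ID `"tdTomato"` are placeholders; use whatever ID is in your FASTA header.

```python
import csv


def reporter_counts(quant_sf, reporter_id):
    """Return (NumReads, TPM) for reporter_id, or None if it is not listed."""
    with open(quant_sf, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row["Name"] == reporter_id:
                return float(row["NumReads"]), float(row["TPM"])
    return None
```

A result of `None` would mean the reporter never made it into the index, whereas a row with zero reads means it was indexed but nothing was assigned to it.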
I had exactly the same question. Did you get it to work?
Looks like the nucleotide sequence may be here: https://www.ncbi.nlm.nih.gov/nuccore/55420622
Finding the correct sequence is actually harder than one would assume. Check this previous discussion: Why are there multiple GFP nucleotide sequences?
And steps for adding GFP (or any exogenous sequence) to a genome: https://github.com/igordot/genomics/blob/master/workflows/ref-genome-gfp.md