Question: How to add tdtomato to mouse genome for RNA seq analysis
Hello all!

I am currently analyzing some RNA seq data derived from mice that express the reporter, tdtomato, under an endogenous promoter. I'd like to get read counts for tdtomato, but obviously it isn't part of the mouse transcriptome, and therefore Salmon doesn't map reads to it.

From what I can figure out on my own, this shouldn't be that hard: just add the tdtomato sequence to the mouse transcriptome that I supply to Salmon. And if I want gene-level counts, add the tdtomato accession number to my tab-delimited transcript-to-gene mapping file. Is there anything else to be done or to look out for? Thanks! Ed
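
For anyone landing here later, the two steps described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the transcriptome FASTA and the mapping file are uncompressed, the mapping file has two tab-separated columns (transcript ID, then gene ID), and the function names and the `tdTomato` identifier are made up for illustration. The ID in the FASTA header has to match the first column of the mapping file exactly.

```python
import os


def _needs_newline(path):
    """True if the file exists, is non-empty, and lacks a trailing newline."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as fh:
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) != b"\n"


def append_fasta_record(fasta_path, record_id, sequence, width=60):
    """Append one record to a FASTA file, wrapping the sequence at `width`."""
    # Remove any whitespace in the pasted sequence and normalise the case.
    seq = "".join(sequence.split()).upper()
    prefix = "\n" if _needs_newline(fasta_path) else ""
    with open(fasta_path, "a") as fh:
        fh.write(f"{prefix}>{record_id}\n")
        for i in range(0, len(seq), width):
            fh.write(seq[i:i + width] + "\n")


def append_tx2gene_row(tx2gene_path, transcript_id, gene_id):
    """Append one 'transcript<TAB>gene' row to the mapping file."""
    prefix = "\n" if _needs_newline(tx2gene_path) else ""
    with open(tx2gene_path, "a") as fh:
        fh.write(f"{prefix}{transcript_id}\t{gene_id}\n")
```

Something like `append_fasta_record("transcripts.fa", "tdTomato", seq)` followed by `append_tx2gene_row("tx2gene.tsv", "tdTomato", "tdTomato")` would cover both files (the file names are placeholders). The Salmon index is built from the FASTA, so it needs to be rebuilt after the sequence is added.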

That is pretty much it. Afterwards, just check that tdtomato appears in your counts files.
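
One way to do that check, as a rough sketch: Salmon writes its per-transcript results to a tab-separated `quant.sf` file whose header includes `Name`, `TPM` and `NumReads`, so the reporter can be looked up by name. The function name and the `tdTomato` ID below are assumptions; use whatever ID was put in the FASTA header.

```python
import csv


def find_transcript(quant_path, transcript_id):
    """Return the quant.sf row for `transcript_id` as a dict, or None if absent."""
    with open(quant_path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row["Name"] == transcript_id:
                return row
    return None
```

If `find_transcript("quant.sf", "tdTomato")` returns `None`, the sequence most likely never made it into the index. If it returns a row, its `NumReads` and `TPM` fields show whether any reads were assigned to it.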

I wonder if you managed to do it. I am also interested in quantifying tdtomato expression in mouse (scRNA-seq). Thanks!

I had exactly the same question. Did you get it to work?

Looks like the nucleotide sequence may be here:

    1 atggtgagca agggcgagga ggtcatcaaa gagttcatgc gcttcaaggt gcgcatggag
   61 ggctccatga acggccacga gttcgagatc gagggcgagg gcgagggccg cccctacgag
  121 ggcacccaga ccgccaagct gaaggtgacc aagggcggcc ccctgccctt cgcctgggac
  181 atcctgtccc cccagttcat gtacggctcc aaggcgtacg tgaagcaccc cgccgacatc
  241 cccgattaca agaagctgtc cttccccgag ggcttcaagt gggagcgcgt gatgaacttc
  301 gaggacggcg gtctggtgac cgtgacccag gactcctccc tgcaggacgg cacgctgatc
  361 tacaaggtga agatgcgcgg caccaacttc ccccccgacg gccccgtaat gcagaagaag
  421 accatgggct gggaggcctc caccgagcgc ctgtaccccc gcgacggcgt gctgaagggc
  481 gagatccacc aggccctgaa gctgaaggac ggcggccact acctggtgga gttcaagacc
  541 atctacatgg ccaagaagcc cgtgcaactg cccggctact actacgtgga caccaagctg
  601 gacatcacct cccacaacga ggactacacc atcgtggaac agtacgagcg ctccgagggc
  661 cgccaccacc tgttcctggg gcatggcacc ggcagcaccg gcagcggcag ctccggcacc
  721 gcctcctccg aggacaacaa catggccgtc atcaaagagt tcatgcgctt caaggtgcgc
  781 atggagggct ccatgaacgg ccacgagttc gagatcgagg gcgagggcga gggccgcccc
  841 tacgagggca cccagaccgc caagctgaag gtgaccaagg gcggccccct gcccttcgcc
  901 tgggacatcc tgtcccccca gttcatgtac ggctccaagg cgtacgtgaa gcaccccgcc
  961 gacatccccg attacaagaa gctgtccttc cccgagggct tcaagtggga gcgcgtgatg
 1021 aacttcgagg acggcggtct ggtgaccgtg acccaggact cctccctgca ggacggcacg
 1081 ctgatctaca aggtgaagat gcgcggcacc aacttccccc ccgacggccc cgtaatgcag
 1141 aagaagacca tgggctggga ggcctccacc gagcgcctgt acccccgcga cggcgtgctg
 1201 aagggcgaga tccaccaggc cctgaagctg aaggacggcg gccactacct ggtggagttc
 1261 aagaccatct acatggccaa gaagcccgtg caactgcccg gctactacta cgtggacacc
 1321 aagctggaca tcacctccca caacgaggac tacaccatcg tggaacagta cgagcgctcc
 1381 gagggccgcc accacctgtt cctgtacggc atggacgagc tgtacaagta a
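
The listing above is in GenBank layout (position numbers plus blocks of ten bases), so it cannot be pasted into a FASTA file as is. A minimal sketch for cleaning it up, with hypothetical function names, plus a crude check that the result looks like a complete coding sequence:

```python
import re


def genbank_origin_to_sequence(text):
    """Drop position numbers and whitespace from GenBank-style sequence lines."""
    return re.sub(r"[\d\s]", "", text).upper()


def looks_like_cds(seq):
    """Crude sanity check: ATG start, stop codon at the end, length divisible by 3."""
    return (
        len(seq) % 3 == 0
        and seq.startswith("ATG")
        and seq[-3:] in {"TAA", "TAG", "TGA"}
    )
```

Passing the pasted block through `genbank_origin_to_sequence` gives one continuous uppercase string that can go under a `>tdTomato` header. `looks_like_cds` only catches copy-and-paste damage; it says nothing about whether this is the same variant that is in your mice.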

Finding the correct sequence is actually harder than one would assume. Check this previous discussion: "Why are there multiple GFP nucleotide sequences?"

And steps for adding GFP (or any exogenous sequence) to a genome:

