Question: Merge TPM for transcripts to genes Kallisto
frida.danielsson (European Union) wrote 3.4 years ago:

Hi,

I have an output file from Kallisto with RNA transcripts and their corresponding TPMs. To enable comparison with previous results (mass spectrometry and FPKM values at the gene level), I would like to merge all transcripts that belong to the same gene and sum the TPMs for each gene. I have run biomaRt to generate a table of all transcript IDs and their corresponding Ensembl gene IDs, and I now wonder what would be the fastest way to sum all TPMs that are linked to the same ENSG ID. Please help!

 

 

andrew.j.skelton73 (London) wrote 3.4 years ago:

Have you considered Salmon? It uses a similar methodology and can output both gene-level and transcript-level counts. As for your question, your approach seems right: if you sum the TPMs of Ensembl transcripts a, b, and c for their associated Ensembl gene x, that gives you the gene-level TPM.
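The summing step described above can be sketched in base R. This is a minimal illustration rather than code from the thread: the toy `abundance` table only mimics the `target_id` and `tpm` columns of kallisto's abundance.tsv, `tx2gene` stands in for the biomaRt table, and all IDs and values are made up. It also assumes that the kallisto transcript IDs may carry a version suffix (such as `.2`) that the mapping table lacks.

```r
# Toy stand-in for kallisto's abundance.tsv (only the columns used here).
abundance <- data.frame(
  target_id = c("ENST0001.1", "ENST0002.3", "ENST0003.1", "ENST0004.2"),
  tpm       = c(10, 5, 2.5, 7),
  stringsAsFactors = FALSE
)

# Toy stand-in for the biomaRt transcript-to-gene table (hypothetical IDs).
tx2gene <- data.frame(
  transcript_id = c("ENST0001", "ENST0002", "ENST0003", "ENST0004"),
  gene_id       = c("ENSG0001", "ENSG0001", "ENSG0002", "ENSG0002"),
  stringsAsFactors = FALSE
)

# Assumption: drop a trailing version suffix so the IDs match the mapping.
abundance$transcript_id <- sub("\\.[0-9]+$", "", abundance$target_id)

# Attach the gene ID to every transcript; all.x = TRUE keeps unmapped
# transcripts (gene_id is NA) so they can be inspected before summing.
merged <- merge(abundance, tx2gene, by = "transcript_id", all.x = TRUE)

# Sum TPM per gene; the formula interface omits rows with an NA gene_id.
gene_tpm <- aggregate(tpm ~ gene_id, data = merged, FUN = sum)
print(gene_tpm)
```

Checking `merged` for rows with a missing `gene_id` before aggregating is a cheap way to spot ID mismatches between the two tables.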

Edit: R example.

# Toy transcript-to-gene mapping: 4 genes, 10 transcripts
foo <- data.frame(gene=c(rep("A",3),
                         rep("B",2),
                         rep("C",1),
                         rep("D",4)),
                  transcript=c(paste0("A", 1:3),
                               paste0("B", 1:2),
                               paste0("C", 1),
                               paste0("D", 1:4)))
# Toy transcript-level values for three samples, one row per transcript
doo <- data.frame(SampleA = sample(1:100, 10),
                  SampleB = sample(1:100, 10),
                  SampleC = sample(1:100, 10))
rownames(doo) <- foo$transcript

# For each gene, select the rows of its transcripts and sum them per sample
out <- lapply(unique(foo$gene),
              function(x) {
                tmp       <- foo[foo$gene == x,]
                tmp_count <- doo[match(tmp$transcript,
                                       rownames(doo)),]
                tmp_out   <- colSums(tmp_count)
                return(tmp_out)
              })

# Stack the per-gene sums into a gene-by-sample matrix
gene_counts <- matrix(unlist(out), 
                      ncol  = ncol(doo), 
                      byrow = TRUE)
rownames(gene_counts) <- unique(foo$gene)
colnames(gene_counts) <- colnames(doo)
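
For larger tables, base R's `rowsum()` computes the same per-group column sums in a single vectorized call. The sketch below is an alternative to the loop above, not part of the original answer: it builds its own toy mapping and matrix (the names `tx_map`, `tx_tpm`, and `gene_of_row` are illustrative, and the values are random) and aligns the mapping to the matrix rows with `match()` rather than assuming they share the same order.

```r
set.seed(1)  # make the random toy data reproducible

# Toy mapping and transcript-level matrix (illustrative names and values).
tx_map <- data.frame(
  gene       = rep(c("A", "B", "C", "D"), times = c(3, 2, 1, 4)),
  transcript = c(paste0("A", 1:3), paste0("B", 1:2), "C1", paste0("D", 1:4)),
  stringsAsFactors = FALSE
)
tx_tpm <- matrix(runif(30, 0, 100), nrow = 10,
                 dimnames = list(tx_map$transcript,
                                 c("SampleA", "SampleB", "SampleC")))

# Look up the gene for each matrix row, then sum rows sharing a gene label.
gene_of_row <- tx_map$gene[match(rownames(tx_tpm), tx_map$transcript)]
gene_tpm    <- rowsum(tx_tpm, group = gene_of_row)

# Sanity check: per-sample totals are unchanged by the aggregation.
stopifnot(isTRUE(all.equal(colSums(gene_tpm), colSums(tx_tpm))))
print(gene_tpm)
```

The result is a matrix with one row per gene (sorted by gene label) and one column per sample, so it can be used in place of `gene_counts`.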
frida.danielsson (European Union) wrote 3.4 years ago:

Thanks! No, I haven't considered that; I'll check it out.

 

frida.danielsson (European Union) wrote 3.4 years ago:

Do you know the quickest way to do this in R (I mean, which function to use)?


andrew.j.skelton73 replied 3.4 years ago:

I've updated my answer with a simple R solution.