Merge Kallisto transcript-level TPMs to gene level
6.8 years ago

Hi,

I have a Kallisto output file with RNA transcripts and their corresponding TPMs. To enable comparison with previous results (mass spectrometry and gene-level FPKM values), I would like to merge all transcripts that belong to the same gene and sum their TPMs for each gene. I have run biomaRt to generate a table with all the transcript IDs and their corresponding Ensembl gene IDs, and I now wonder what the fastest way would be to sum all TPMs that are linked to the same ENSG ID. Please help!
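A minimal sketch of that lookup-and-sum in base R. It assumes the Kallisto output is the standard `abundance.tsv` (with `target_id` and `tpm` columns) and that the biomaRt table uses the Ensembl attribute names `ensembl_transcript_id` and `ensembl_gene_id`; adjust these to your actual column headers. The IDs and TPM values below are made up for illustration.

```r
# Made-up stand-in for Kallisto's abundance.tsv; with real data use
# kal <- read.delim("abundance.tsv", stringsAsFactors = FALSE)
kal <- data.frame(target_id = c("ENST0001.1", "ENST0002.3", "ENST0003.1",
                                "ENST0004.2", "ENST0005.1"),
                  tpm       = c(100, 250, 400, 50, 10),
                  stringsAsFactors = FALSE)

# Made-up stand-in for the biomaRt transcript-to-gene table
tx2gene <- data.frame(ensembl_transcript_id = c("ENST0001", "ENST0002",
                                                "ENST0003", "ENST0004"),
                      ensembl_gene_id       = c("ENSG0001", "ENSG0001",
                                                "ENSG0002", "ENSG0002"),
                      stringsAsFactors = FALSE)

# Kallisto IDs may carry a version suffix (".1") that the biomaRt IDs lack
kal$transcript <- sub("\\.[0-9]+$", "", kal$target_id)

# Look up the gene of every transcript; unmapped transcripts get NA
kal$gene <- tx2gene$ensembl_gene_id[match(kal$transcript,
                                          tx2gene$ensembl_transcript_id)]

# Sum TPMs per gene (rows with NA gene are dropped by the default na.action)
gene_tpm <- aggregate(tpm ~ gene, data = kal, FUN = sum)
gene_tpm
```

Stripping the version suffix is only needed if your Kallisto index was built from versioned transcript IDs; check a few `target_id` values before relying on it.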

TPM FPKM Kallisto RNA-Seq software-error

Thanks! No, I haven't considered that; I will check it out.

Do you know the quickest way to do this in R (i.e., which function to use)?

I've updated my answer with a simple R solution.

6.8 years ago

Have you considered Salmon? It uses a similar methodology and can output both gene-level and transcript-level estimates. As for your question, your approach seems right: if you sum the TPMs of Ensembl transcripts a, b, and c for their associated Ensembl gene x, you get the TPM for gene x.
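A small worked check of why the summation is valid, assuming (as in Ensembl) that every transcript belongs to exactly one gene. The transcript names a to d and their TPMs are hypothetical. Because each transcript is counted exactly once, the gene-level values still add up to one million, so they stay on the TPM scale.

```r
# Hypothetical transcript TPMs for one sample (they sum to 1e6, as TPMs do)
tx_tpm <- c(a = 200000, b = 100000, c = 50000, d = 650000)

# Hypothetical mapping: transcripts a, b and c belong to gene x, d to gene y
tx_gene <- c(a = "x", b = "x", c = "x", d = "y")

# Gene TPM = sum of the TPMs of its transcripts
gene_tpm <- tapply(tx_tpm, tx_gene[names(tx_tpm)], sum)
gene_tpm

# Still one million in total, since no transcript is counted twice
sum(gene_tpm)
```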

Edit: here is a simple R example on toy data.

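set.seed(1)  # make the random toy counts reproducible

# Mapping table: one row per transcript with its parent gene
# (in practice, the transcript/gene ID table from biomaRt)
foo <- data.frame(gene = c(rep("A", 3),
                           rep("B", 2),
                           rep("C", 1),
                           rep("D", 4)),
                  transcript = c(paste0("A", 1:3),
                                 paste0("B", 1:2),
                                 paste0("C", 1),
                                 paste0("D", 1:4)),
                  stringsAsFactors = FALSE)

# Toy expression table: one row per transcript, one column per sample
doo <- data.frame(SampleA = sample(1:100, 10),
                  SampleB = sample(1:100, 10),
                  SampleC = sample(1:100, 10))
rownames(doo) <- foo$transcript

# For each gene, take the rows of its transcripts and sum them per sample
out <- lapply(unique(foo$gene),
              function(x) {
                tmp       <- foo[foo$gene == x, ]
                tmp_count <- doo[match(tmp$transcript,
                                       rownames(doo)), ]
                colSums(tmp_count)
              })

# Stack the per-gene sums into a gene x sample matrix
gene_counts <- matrix(unlist(out),
                      ncol  = ncol(doo),
                      byrow = TRUE)
rownames(gene_counts) <- unique(foo$gene)
colnames(gene_counts) <- colnames(doo)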
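As for which function to use: base R's `rowsum()` performs the same per-gene summation in one vectorized call, which avoids the explicit `lapply` loop. This sketch re-creates the toy `foo`/`doo` tables so it runs on its own; `gene_counts2` is just an illustrative name.

```r
set.seed(1)  # reproducible toy counts

# Toy mapping table (transcript -> gene) and toy transcript x sample table
foo <- data.frame(gene       = rep(c("A", "B", "C", "D"), times = c(3, 2, 1, 4)),
                  transcript = c(paste0("A", 1:3), paste0("B", 1:2), "C1",
                                 paste0("D", 1:4)),
                  stringsAsFactors = FALSE)
doo <- data.frame(SampleA = sample(1:100, 10),
                  SampleB = sample(1:100, 10),
                  SampleC = sample(1:100, 10))
rownames(doo) <- foo$transcript

# Align the expression rows with the mapping table, then let rowsum() add up
# all rows sharing the same gene label, column by column (genes come out sorted)
gene_counts2 <- rowsum(as.matrix(doo[foo$transcript, ]), group = foo$gene)
gene_counts2
```

With real data, make sure every transcript in the expression table has a gene in the mapping table first; otherwise `doo[foo$transcript, ]` will silently skip or misalign rows.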