retrieve TPM matrix from raw counts matrix or DESeq2 object dds
2.2 years ago
StartR ▴ 20

Hi

Is it possible to retrieve TPM from raw counts or DESeq2 object dds?

I have downloaded the data from TCGA using the "TCGAbiolinks" package in R. Although I have downloaded the whole data set, I am simplifying it here to a few samples, so that it is easier for you to look at the data and get a grip on it:

library(TCGAbiolinks)

listSamples <- c("TCGA-E9-A1NG-11A-52R-A14M-07","TCGA-BH-A1FC-11A-32R-A13Q-07", "TCGA-A7-A13G-11A-51R-A13Q-07","TCGA-BH-A0DK-11A-13R-A089-07", "TCGA-E9-A1RH-11A-34R-A169-07","TCGA-BH-A0AU-01A-11R-A12P-07", "TCGA-C8-A1HJ-01A-11R-A13Q-07","TCGA-A7-A13D-01A-13R-A12P-07", "TCGA-A2-A0CV-01A-31R-A115-07","TCGA-AQ-A0Y5-01A-11R-A14M-07")

query <- GDCquery(project = "TCGA-BRCA", data.category = "Gene expression", data.type = "Gene expression quantification", experimental.strategy = "RNA-Seq", platform = "Illumina HiSeq", file.type = "results", barcode = listSamples, legacy = TRUE)



If you run these commands, you will see that the files are created in the directory "BRCA_test"; each folder contains files with the extension *.rsem.genes.results

The file contains gene_id, raw_counts, scaled_estimates, transcript_id.

I am also able to get a dds object from DESeq2 package for this data using:

dds <- DESeqDataSetFromMatrix(countData = BRCA_mat, colData = sampleData, design = ~ condition)


I used "raw_counts" to generate "BRCA_mat"

Now my question is: from the raw count matrix, can I get a TPM matrix?

I am aware that I will need featureLength, meanFragmentLength to calculate TPM - but given the data I get from TCGA, I do not have this data on length.

So is it possible to get a TPM or even an FPKM matrix from the raw count matrix?

Even if I only get FPKM, I can convert it to TPM.
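
For reference, the FPKM-to-TPM step is just a per-sample rescaling so that each column sums to one million. A minimal base R sketch, assuming a genes x samples FPKM matrix; the helper name fpkm_to_tpm and the values are made up for illustration:

```r
# Hypothetical helper: rescale each sample (column) of an FPKM matrix
# so that it sums to one million.
fpkm_to_tpm <- function(fpkm) {
  sweep(fpkm, 2, colSums(fpkm), "/") * 1e6
}

# Made-up FPKM values for 3 genes x 2 samples, for illustration only.
fpkm_mat <- matrix(
  c(10, 30, 60,
     5,  5, 40),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

tpm_mat <- fpkm_to_tpm(fpkm_mat)
colSums(tpm_mat)  # every sample should sum to 1e6
```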

And even if I do this:

dds <- estimateSizeFactors(dds)
counts(dds, normalized=TRUE)


Will these normalized counts be equivalent to TPM?

I think not. This just divides each column of counts(dds) by sizeFactors(dds), but it does not normalize for gene length.
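
To illustrate the point, here is a minimal base R sketch of median-of-ratios size-factor scaling, the idea that estimateSizeFactors is based on. This is not DESeq2's actual code: the toy counts are made up and the handling of zero counts is left out. Gene length never enters the calculation:

```r
# Toy counts: 3 genes x 2 samples; sample2 is sequenced twice as deeply.
counts <- matrix(
  c(10, 100, 50,
    20, 200, 100),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

# Log geometric mean of each gene across samples (no zeros in this toy data).
log_geo_mean <- rowMeans(log(counts))

# Size factor per sample: median ratio of its counts to the gene-wise
# geometric means.
size_factors <- apply(counts, 2, function(col) {
  exp(median(log(col) - log_geo_mean))
})

# Divide each column by its size factor; this corrects for depth only.
norm_counts <- sweep(counts, 2, size_factors, "/")
norm_counts
```

After scaling, the two samples become comparable in depth, but a long gene still collects more reads than a short gene at the same expression level, so these values are not TPM.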

TPM raw counts DESeq2 TCGA BRCA • 2.2k views
2.2 years ago

My colleague Asaf and I gave an answer here: C: error in TXIMPORT command for RSEM

I see - thanks. You can calculate TPM from the BRCA_mat object; however, you will still require the gene lengths. The TPM calculation is explained here: https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained.
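
As a rough sketch of that calculation in base R, assuming you have a vector of gene lengths in base pairs in the same row order as the count matrix. The function name counts_to_tpm, the toy counts and the lengths are all made up for illustration, and plain gene length is used rather than an effective length:

```r
# Hypothetical helper: TPM from a genes x samples count matrix and a vector
# of gene lengths in bp (same order as the rows of the matrix).
counts_to_tpm <- function(counts, gene_length_bp) {
  stopifnot(nrow(counts) == length(gene_length_bp))
  # Reads per kilobase: the length vector recycles down each column,
  # so row i is divided by the length of gene i.
  rate <- counts / (gene_length_bp / 1e3)
  # Rescale each sample so that it sums to one million.
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

# Made-up example: 3 genes x 2 samples.
toy_counts <- matrix(
  c(100, 200, 300,
     50, 400,  50),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)
toy_lengths <- c(geneA = 1000, geneB = 2000, geneC = 3000)

toy_tpm <- counts_to_tpm(toy_counts, toy_lengths)
colSums(toy_tpm)  # every sample should sum to 1e6
```

In sample1 the counts are proportional to the gene lengths, so all three genes end up with the same TPM, which is the effect of the length correction.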

DESeq2 does not produce TPM data. However, you could easily use DESeq2 to produce variance-stabilised or regularised log data, which are 'better' than TPM, in my opinion.

2.2 years ago
igor 12k

So is it possible to get TPM or even FPKM matrix from raw count matrix?

DESeq2 has the fpkm() function specifically for that.
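
A minimal sketch of how that could look, assuming DESeq2 is installed and that you obtain gene lengths from some annotation source. Here the counts and lengths are simulated, and the lengths are supplied through a "basepairs" column in mcols(dds), which fpkm() uses when the object has no usable rowRanges:

```r
library(DESeq2)

set.seed(1)
# Simulated integer counts: 10 genes x 4 samples (illustration only).
toy_counts <- matrix(
  rpois(40, lambda = 100), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("s", 1:4))
)
toy_coldata <- data.frame(
  condition = factor(c("normal", "normal", "tumor", "tumor")),
  row.names = colnames(toy_counts)
)
# Made-up gene lengths in bp; real ones would come from an annotation.
toy_lengths <- seq(500, 5000, length.out = 10)

toy_dds <- DESeqDataSetFromMatrix(countData = toy_counts,
                                  colData = toy_coldata,
                                  design = ~ condition)

# Supply the feature lengths, then compute FPKM.
mcols(toy_dds)$basepairs <- toy_lengths
# robust = FALSE uses the column sums as library sizes;
# the default (robust = TRUE) uses size factors instead.
toy_fpkm <- fpkm(toy_dds, robust = FALSE)
head(toy_fpkm)
```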

If you just want TPMs and want to keep things simple, you could also just download them from Xena.