Question: retrieve TPM matrix from raw counts matrix or DESeq2 object dds
9 months ago by
StartR20
Sweden
StartR20 wrote:

Hi

Is it possible to retrieve TPM from raw counts or from a DESeq2 object (dds)?

I have downloaded the data from TCGA using the "TCGAbiolinks" package in R. Although I downloaded the whole data set, I am restricting the example here to a few samples so that it is easier for you to reproduce and inspect:

library(TCGAbiolinks)

listSamples <- c("TCGA-E9-A1NG-11A-52R-A14M-07", "TCGA-BH-A1FC-11A-32R-A13Q-07",
                 "TCGA-A7-A13G-11A-51R-A13Q-07", "TCGA-BH-A0DK-11A-13R-A089-07",
                 "TCGA-E9-A1RH-11A-34R-A169-07", "TCGA-BH-A0AU-01A-11R-A12P-07",
                 "TCGA-C8-A1HJ-01A-11R-A13Q-07", "TCGA-A7-A13D-01A-13R-A12P-07",
                 "TCGA-A2-A0CV-01A-31R-A115-07", "TCGA-AQ-A0Y5-01A-11R-A14M-07")

query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  experimental.strategy = "RNA-Seq",
                  platform = "Illumina HiSeq",
                  file.type = "results",
                  barcode = listSamples,
                  legacy = TRUE)

GDCdownload(query = query, directory = "BRCA_test", method = "api")

If you run these commands, you will see that files are created in the directory "BRCA_test"; each folder contains files with the extension *.rsem.genes.results

The file contains gene_id, raw_counts, scaled_estimates, transcript_id.

I am also able to get a dds object from DESeq2 package for this data using:

dds <- DESeqDataSetFromMatrix(countData = BRCA_mat, colData = sampleData, design = ~ condition)

I used "raw_counts" to generate "BRCA_mat"

Now my question is: can I get a TPM matrix from the raw count matrix?

I am aware that I need featureLength and meanFragmentLength to calculate TPM, but the data I get from TCGA do not include this length information.

So is it possible to get a TPM or even an FPKM matrix from the raw count matrix?

Even if I only get FPKM, I can convert it to TPM.

And if I do this:

dds <- estimateSizeFactors(dds)
counts(dds, normalized=TRUE)

Will these normalized counts be equivalent to TPM?

I think not. This just divides each column of

counts(dds) by sizeFactors(dds)

but it does not normalize for gene length,

so I want a TPM matrix from the raw counts matrix. Please help. Thanks in advance.
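To make the point about size factors concrete, here is a minimal base-R sketch of median-of-ratios normalization, which is the method DESeq2 documents for estimateSizeFactors(). The helper size_factors() and the toy matrix are hypothetical and only illustrate that no gene-length term enters the calculation.

```r
# Toy counts: 4 genes x 3 samples (made-up numbers, not TCGA data).
counts <- matrix(c(10, 20, 30,
                   100, 210, 290,
                   50, 95, 160,
                   5, 11, 14),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Hypothetical helper: median-of-ratios size factors.
size_factors <- function(m) {
  log_geo_means <- rowMeans(log(m))  # per-gene log geometric mean across samples
  apply(m, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

sf <- size_factors(counts)

# Divide each column by its size factor; gene length is never used.
norm_counts <- sweep(counts, 2, sf, "/")
```

The result is comparable across samples for a given gene, but not across genes of different length within a sample, which is what TPM is meant to address.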

tcga brca raw counts tpm deseq2 • 550 views
modified 9 months ago by igor12k • written 9 months ago by StartR20
9 months ago by
Kevin Blighe70k
Republic of Ireland
Kevin Blighe70k wrote:

My colleague Asaf and I gave an answer here: C: error in TXIMPORT command for RSEM

I see - thanks. You can calculate TPM from the BRCA_mat object; however, you will still require the gene lengths. The TPM calculation is elaborated here: https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained.
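As a rough illustration of that calculation, the sketch below converts a genes x samples count matrix to TPM in base R. The function counts_to_tpm(), the toy counts, and the toy gene lengths are all hypothetical; where the real gene lengths come from (for example, a gene annotation matching the quantification) is not specified in this thread and would have to be decided separately.

```r
# Hypothetical helper: TPM from a genes x samples count matrix and
# per-gene lengths in base pairs (same order as the matrix rows).
counts_to_tpm <- function(counts, lengths_bp) {
  stopifnot(nrow(counts) == length(lengths_bp), all(lengths_bp > 0))
  rpk <- counts / (lengths_bp / 1000)  # reads per kilobase (row-wise division)
  t(t(rpk) / colSums(rpk)) * 1e6       # scale each sample to sum to one million
}

# Toy data (made-up numbers, not TCGA values).
toy_counts <- matrix(c(100, 200,
                       300, 100,
                       600, 700),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
toy_lengths <- c(geneA = 1000, geneB = 3000, geneC = 2000)

toy_tpm <- counts_to_tpm(toy_counts, toy_lengths)
colSums(toy_tpm)  # each column sums to 1e6 by construction
```

With real data the call would be along the lines of counts_to_tpm(BRCA_mat, gene_lengths), where gene_lengths is an assumed vector ordered like the rows of BRCA_mat.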

DESeq2 does not produce TPM data. However, you could easily use DESeq2 to produce variance-stabilised or regularised log data, which are 'better' than TPM, in my opinion.

modified 9 months ago • written 9 months ago by Kevin Blighe70k
9 months ago by
igor12k
United States
igor12k wrote:

So is it possible to get a TPM or even an FPKM matrix from the raw count matrix?

DESeq2 has the fpkm() function specifically for that.
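Note that fpkm() still needs gene lengths: per the DESeq2 documentation it takes them from a basepairs column in mcols(dds), from an avgTxLength assay, or from rowRanges(dds). Once an FPKM matrix exists, converting it to TPM is a per-sample rescaling. A minimal sketch, where fpkm_to_tpm() and the toy matrix are hypothetical:

```r
# Hypothetical helper: rescale each sample (column) of an FPKM matrix
# so that it sums to one million, which yields TPM.
fpkm_to_tpm <- function(fpkm) {
  t(t(fpkm) / colSums(fpkm)) * 1e6
}

# Toy FPKM values (made-up numbers, not TCGA values).
toy_fpkm <- matrix(c(10, 5,
                     30, 15,
                     60, 80),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

toy_tpm_from_fpkm <- fpkm_to_tpm(toy_fpkm)

# With DESeq2 loaded and a gene-length vector available
# (gene_lengths is an assumed object, ordered like the rows of dds):
#   mcols(dds)$basepairs <- gene_lengths
#   fpkm_mat <- fpkm(dds)
#   tpm_mat  <- fpkm_to_tpm(fpkm_mat)
```

Because the conversion renormalizes every column, any per-sample scaling applied inside fpkm() cancels out in the resulting TPM values.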

If you just want TPMs and want to keep things simple, you could also just download them from Xena.
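Another possibly simple route stays with the downloaded *.rsem.genes.results files themselves. The sketch below assumes that the scaled estimate column holds RSEM's estimated fraction of transcripts per gene, in which case multiplying by 1e6 gives TPM; that interpretation and the column name scaled_estimate are assumptions worth checking against the actual files. The data frame is a made-up stand-in for one file.

```r
# Toy stand-in for one *.rsem.genes.results file (made-up values).
# Assumption: scaled_estimate is the per-gene transcript fraction and sums to 1.
rsem <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  raw_count = c(100, 300, 600),
  scaled_estimate = c(0.2, 0.2, 0.6),
  stringsAsFactors = FALSE
)

# Under that assumption, TPM is the fraction scaled to one million.
rsem$tpm <- rsem$scaled_estimate * 1e6
```

A quick sanity check on a real file would be that the scaled estimates sum to approximately 1 before the multiplication.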

modified 9 months ago • written 9 months ago by igor12k