Question: Calculating RPKM from RSEM using TCGA RNASeqV2 Level 3 data
4
4.5 years ago by
J.F.Jiang650
China
J.F.Jiang650 wrote:

Hi all,

I want to use the TCGA RNAseqV2 RSEM data to calculate the RPKM value for each gene.

I was planning to compute it as: RSEM / sum(all RSEM) * 1 million

Does anyone know the right method to calculate RPKM?

rpkm tcga • 9.1k views
modified 4.5 years ago by Chris Cabanski320 • written 4.5 years ago by J.F.Jiang650
7
4.5 years ago by
Chris Cabanski320 wrote:

As far as I know, there is no way to go from RSEM to RPKM. Is there a specific reason you prefer RPKM over RSEM? RSEM should give expression estimates that are as good as or better than RPKM. From the RSEM paper:

The second measure of abundance is the estimated fraction of transcripts made up by a given isoform or gene. This measure can be used directly as a value between zero and one or can be multiplied by 10^6 to obtain a measure in terms of transcripts per million (TPM). The transcript fraction measure is preferred over the popular RPKM [18] and FPKM [6] measures because it is independent of the mean expressed transcript length and is thus more comparable across samples and species [7].
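To make the two measures in the quoted passage concrete, here is a minimal Python sketch. The function names and toy numbers are hypothetical and are not part of RSEM or the TCGA files. The RPKM function uses the standard definition (reads per kilobase of transcript per million mapped reads) with plain transcript length, which is an assumption of this sketch.

```python
def fractions_to_tpm(fractions):
    """Scale transcript fractions (values between 0 and 1) to transcripts per million."""
    return [f * 1e6 for f in fractions]


def rpkm(counts, lengths_bp):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads."""
    total_millions = sum(counts) / 1e6
    return [c / (length / 1e3) / total_millions
            for c, length in zip(counts, lengths_bp)]


# Toy inputs (hypothetical): three genes in one sample.
toy_fractions = [0.2, 0.3, 0.5]
toy_counts = [100, 300, 600]
toy_lengths_bp = [1000, 2000, 3000]

print(fractions_to_tpm(toy_fractions))
print(rpkm(toy_counts, toy_lengths_bp))
```

TPM values sum to one million by construction, whereas the total of the RPKM values depends on the lengths of the transcripts that happen to be expressed in the sample.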

written 4.5 years ago by Chris Cabanski320

Thanks for sharing. In practice, when we look for differentially expressed genes, whether or not we account for gene length does not influence the result. But if we calculate the Pearson correlation, gene length will introduce some bias into the results. Even though we know the Spearman correlation should be much better, accounting for gene length in the expression values should be more robust when we compare genes across samples or species.

written 4.5 years ago by J.F.Jiang650

RSEM already corrects for transcript length, so no additional correction should be necessary.

written 4.5 years ago by Chris Cabanski320

I don't think so. If you can point to a reference, that would be a great help.

written 4.5 years ago by J.F.Jiang650

http://www.biomedcentral.com/1471-2105/12/323 They normalize by the "effective length" which is somewhat different from transcript length - this is described in their methods section. It may be useful to plot the RSEM values against the transcript length to see if the two are independent as claimed.
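One way to put a number on that plot is a rank correlation between expression and transcript length. The following is a minimal Python sketch using only the standard library. The helper names and the toy vectors are hypothetical, and real input would be per-gene values and lengths that you would have to obtain yourself.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (hypothetical): expression values and transcript lengths in bp.
toy_expression = [12.0, 85.5, 3.2, 40.1, 7.7]
toy_lengths_bp = [1500, 900, 4200, 2300, 3100]
print(spearman(toy_expression, toy_lengths_bp))
```

A coefficient near zero across many genes would be consistent with the values being independent of length; a clearly positive value would suggest a residual length effect.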

written 4.5 years ago by Chris Cabanski320
1

Do I need to transform the RSEM values to log2(RSEM) before finding differentially expressed genes with t.test?

written 4.3 years ago by Yamol40

I haven't tried using RSEM values to find DE genes, so I can't comment on whether the values should be log transformed. (FPKM values typically look more Gaussian after a log transform, so my gut reaction would be to log transform, but this should be tested.)

The authors mention that their output includes two values: (1) an estimate of the number of fragments derived from a given isoform or gene (similar to read counts) and (2) the estimated fraction of transcripts made up by a given isoform or gene. They mention that the first value can be used as input to edgeR or DESeq to determine DE genes. If both of these values are available from TCGA, I would suggest using one of these methods over a simple t-test.
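For anyone who still wants to try the log-transform-then-t-test route, here is a minimal Python sketch of what that would involve for a single gene. It is not a recommendation over the count-based methods. The pseudocount of 1 (to avoid taking the log of zero) is an assumption of this sketch, the function names and toy values are hypothetical, and only the Welch t statistic and its degrees of freedom are computed; a p-value would still have to be looked up from a t distribution.

```python
import math
from statistics import mean, variance


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumed guard against zeros."""
    return [math.log2(v + pseudocount) for v in values]


def welch_t(group_a, group_b):
    """Welch's unequal-variance t statistic and its approximate degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


# Toy expression values for one gene in two groups (hypothetical numbers).
tumor = [120.0, 340.0, 95.0, 410.0]
normal = [30.0, 55.0, 22.0, 61.0]

t_stat, dof = welch_t(log2_transform(tumor), log2_transform(normal))
print(t_stat, dof)
```

Welch's variant is used here because it is what R's t.test computes by default.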

written 4.3 years ago by Chris Cabanski320