Question

Calculating RPKM from RSEM Using TCGA RNASeqV2 Level 3 Data

5

10.7 years ago

J.F.Jiang ▴ 910

Hi all,

I want to use the TCGA RNAseqV2 RSEM data to calculate the RPKM value for each gene.

I was planning to compute it as: RSEM / sum(all RSEM) * 1 million

Does anyone know the right method to calculate RPKM?

tcga rpkm • 14k views

updated 3.0 years ago by Ram 43k • written 10.7 years ago by J.F.Jiang ▴ 910

Ram · Answer 1 · 2013-07-30

7

10.7 years ago

Chris Cabanski ▴ 330

As far as I know, there is no way to go from RSEM to RPKM. Is there a specific reason you prefer RPKM over RSEM? RSEM should give expression estimates that are as good as or better than RPKM. From the RSEM paper:

The second measure of abundance is the estimated fraction of transcripts made up by a given isoform or gene. This measure can be used directly as a value between zero and one or can be multiplied by 10^6 to obtain a measure in terms of transcripts per million (TPM). The transcript fraction measure is preferred over the popular RPKM [18] and FPKM [6] measures because it is independent of the mean expressed transcript length and is thus more comparable across samples and species [7].
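To make the scaling in that quote concrete, here is a minimal Python sketch that turns transcript fractions into TPM. The gene names and fraction values are made up for illustration. In the TCGA RNASeqV2 files the fraction is, I believe, the `scaled_estimate` column, but treat that as an assumption and check your own files.

```python
def fractions_to_tpm(fractions):
    """Scale transcript fractions to transcripts per million (TPM).

    The fractions are renormalized first in case they do not sum to exactly 1
    (for example after filtering out some genes).
    """
    total = sum(fractions.values())
    return {gene: frac / total * 1e6 for gene, frac in fractions.items()}


# Hypothetical transcript fractions for three genes (illustrative values only).
fractions = {"geneA": 0.5, "geneB": 0.3, "geneC": 0.2}
tpm = fractions_to_tpm(fractions)

for gene, value in tpm.items():
    print(gene, round(value, 1))
```

By construction the TPM values of a sample always sum to one million, which is part of why they are easier to compare across samples than RPKM.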

updated 3.0 years ago by Ram 43k • written 10.7 years ago by Chris Cabanski ▴ 330

0

Thanks for sharing.

In practice, when we look for differentially expressed genes, whether or not gene length is taken into account does not influence the result.

If we calculate the Pearson correlation, however, gene length will bias the results to some degree. Even though the Spearman correlation should be much better in that respect, accounting for gene length in the expression values should be more robust when we compare genes across samples or species.

updated 3.0 years ago by Ram 43k • written 10.7 years ago by J.F.Jiang ▴ 910

0

RSEM already corrects for transcript length, so no additional correction should be necessary.
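As a rough illustration of what a length correction does (this is not RSEM's actual EM procedure, just the standard counts-to-TPM arithmetic), the sketch below divides each gene's estimated count by its effective length before scaling to one million. The gene names, counts and lengths are hypothetical.

```python
def counts_to_tpm(counts, eff_lengths):
    """Convert estimated counts to TPM using per-gene effective lengths.

    Each count is first divided by the gene's effective length, and the
    resulting per-base rates are then rescaled to sum to one million.
    """
    rates = {gene: counts[gene] / eff_lengths[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical example: the long gene has 4x the reads and 4x the length.
counts = {"short_gene": 100.0, "long_gene": 400.0}
eff_lengths = {"short_gene": 1000.0, "long_gene": 4000.0}
tpm = counts_to_tpm(counts, eff_lengths)
```

In this toy case both genes end up with the same TPM even though their raw counts differ fourfold, because the difference in counts is entirely explained by length.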

updated 3.0 years ago by Ram 43k • written 10.7 years ago by Chris Cabanski ▴ 330

0

I don't think so. If you can offer a reference, that would be of great help.

updated 3.0 years ago by Ram 43k • written 10.7 years ago by J.F.Jiang ▴ 910

0

http://www.biomedcentral.com/1471-2105/12/323

They normalize by the "effective length" which is somewhat different from transcript length - this is described in their methods section. It may be useful to plot the RSEM values against the transcript length to see if the two are independent as claimed.
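Besides plotting, one quick numeric check is a rank correlation between transcript length and expression. Here is a minimal, dependency-free Python sketch of a Spearman correlation. The lengths and expression values are invented, and in practice you would load them from your own annotation and RSEM files (a library routine such as SciPy's `spearmanr` would also do the job).

```python
def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical transcript lengths (bp) and expression values for five genes.
lengths = [500, 1200, 2500, 4000, 8000]
expression = [30.0, 12.5, 45.1, 8.2, 20.7]
rho = spearman(lengths, expression)
print("Spearman rho:", rho)
```

A rho near zero across many genes would be consistent with the claimed independence, while a clearly positive or negative value would suggest a residual length effect.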

updated 3.0 years ago by Ram 43k • written 10.7 years ago by Chris Cabanski ▴ 330

1

Do I need to convert the RSEM values to log2(RSEM) before testing for differentially expressed genes with t.test?

updated 3.0 years ago by Ram 43k • written 10.5 years ago by Yamol ▴ 40

0

I haven't tried using RSEM values to find DE genes, so I can't comment on whether the values should be log transformed. (FPKM values typically look more Gaussian after log transformation, so my gut reaction would be to log transform, but this should be tested.)

The authors mention that their output includes two values: (1) an estimate of the number of fragments derived from a given isoform or gene (similar to read counts) and (2) the estimated fraction of transcripts made up by a given isoform or gene. They mention that the first value can be used as input to edgeR or DESeq to determine DE genes. If both of these values are available from TCGA, I would suggest using one of these methods over a simple t-test.
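If you do want to try the log-transform-plus-t-test route, the sketch below shows the arithmetic in plain Python: a log2 transform with a pseudocount (an assumption on my part, needed because log2(0) is undefined) followed by Welch's t statistic and its degrees of freedom. The sample values are made up, and the code stops short of a p-value because that needs a t-distribution CDF from a statistics package.

```python
import math
from statistics import mean, variance


def log2_transform(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each value."""
    return [math.log2(v + pseudocount) for v in values]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical expression values for one gene in two groups of four samples.
tumor = [1023.0, 2047.0, 4095.0, 2047.0]
normal = [255.0, 511.0, 127.0, 255.0]

t_stat, df = welch_t(log2_transform(tumor), log2_transform(normal))
print("t =", t_stat, "df =", df)
```

This is only meant to show what the transformation and test involve for a single gene. For a real analysis, the count-based methods mentioned above model the data more appropriately and handle multiple testing across genes.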

updated 3.0 years ago by Ram 43k • written 10.5 years ago by Chris Cabanski ▴ 330