Calculating RPKM From RSEM Using TCGA RNASeqV2 Level 3 Data
10.7 years ago
J.F.Jiang

Hi all,

I want to use the TCGA RNAseqV2 RSEM data to calculate the RPKM value for each gene.

My guess is that it can be done as: RSEM / sum(RSEM) * 1,000,000

Does anyone know the right way to calculate RPKM?
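
Dividing each value by the sample total and multiplying by one million only rescales for library size, which gives counts per million (CPM); RPKM additionally divides by gene length in kilobases. A minimal sketch, assuming you have expected read counts and gene lengths in base pairs (the gene names, counts, and lengths below are made-up illustration values, not TCGA data):

```python
def cpm(counts):
    """Counts per million: each count divided by the library total, times 1e6."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million mapped reads.

    RPKM = count * 1e9 / (length_bp * total_mapped_reads)
    Here the library size is approximated by the sum of the gene counts (an assumption).
    """
    total = sum(counts.values())
    return {gene: c * 1e9 / (lengths_bp[gene] * total) for gene, c in counts.items()}


# Hypothetical example values.
counts = {"GENE_A": 500.0, "GENE_B": 1500.0, "GENE_C": 3000.0}
lengths_bp = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}

print(cpm(counts))
print(rpkm(counts, lengths_bp))
```

The sketch uses the sum of gene counts as the library size; the number of mapped reads reported by the aligner can differ from that sum.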

10.7 years ago

As far as I know, there is no way to go from RSEM to RPKM. Is there a specific reason you prefer RPKM over RSEM? RSEM should give expression estimates that are just as good as, or better than, RPKM. From the RSEM paper:

The second measure of abundance is the estimated fraction of transcripts made up by a given isoform or gene. This measure can be used directly as a value between zero and one or can be multiplied by 10^6 to obtain a measure in terms of transcripts per million (TPM). The transcript fraction measure is preferred over the popular RPKM [18] and FPKM [6] measures because it is independent of the mean expressed transcript length and is thus more comparable across samples and species [7].
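
The transcript-fraction and TPM measures in the quote can be sketched as follows. Assuming per-gene expected counts and effective lengths are available (the values below are hypothetical), the fraction for each gene is its count per unit length divided by the sum over all genes, and TPM is that fraction times 10^6. This mirrors the relationship described in the paper; it is not RSEM's own code, which estimates these quantities with EM.

```python
def transcript_fractions(expected_counts, effective_lengths):
    """Per-gene fraction of transcripts: count per unit length, normalised to sum to 1."""
    per_length = {g: expected_counts[g] / effective_lengths[g] for g in expected_counts}
    total = sum(per_length.values())
    return {g: v / total for g, v in per_length.items()}


def tpm(expected_counts, effective_lengths):
    """Transcripts per million: the transcript fraction multiplied by 1e6."""
    fractions = transcript_fractions(expected_counts, effective_lengths)
    return {g: f * 1e6 for g, f in fractions.items()}


def rpkm_to_tpm(rpkm_values):
    """Rescale one sample's RPKM values so that they sum to 1e6, which yields TPM."""
    total = sum(rpkm_values.values())
    return {g: v / total * 1e6 for g, v in rpkm_values.items()}


# Hypothetical values, not taken from TCGA or the RSEM paper.
counts = {"GENE_A": 500.0, "GENE_B": 1500.0, "GENE_C": 3000.0}
eff_lengths = {"GENE_A": 1000.0, "GENE_B": 3000.0, "GENE_C": 2000.0}

print(tpm(counts, eff_lengths))  # values sum to 1e6 by construction

# RPKM for the same numbers: count * 1e9 / (length * total count).
library_size = sum(counts.values())
rpkm_values = {g: counts[g] * 1e9 / (eff_lengths[g] * library_size) for g in counts}
print(rpkm_to_tpm(rpkm_values))  # same numbers as tpm(...) above, up to rounding
```

Going in the other direction, from TPM or transcript fractions back to RPKM, also needs the counts (or total mapped reads) and the lengths, which a table of TPM values alone does not carry.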

Thanks for sharing.

In practice, when we look for differentially expressed genes, whether or not we account for gene length does not affect the result.

If we calculate a Pearson correlation, however, gene length will introduce some bias into the results. Even though we know Spearman correlation should be much more robust, including gene length in the expression values should be more reliable when we compare genes across samples or species.
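
Both claims can be checked on toy numbers. The sketch below uses made-up counts for three genes in two samples and hypothetical gene lengths: dividing by gene length rescales a gene by the same factor in both samples, so its log2 fold change is unchanged, but it can change a Pearson correlation computed across genes, because each gene is rescaled by a different factor.

```python
import math
from statistics import mean


def log2_fold_change(x, y):
    """log2 ratio of one gene's expression in sample y versus sample x."""
    return math.log2(y / x)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


# Hypothetical counts for three genes in two samples, plus gene lengths in kb.
sample1 = [100.0, 400.0, 900.0]
sample2 = [200.0, 300.0, 1800.0]
lengths_kb = [1.0, 4.0, 3.0]

# Per-gene fold change between samples: unaffected by dividing both values by length.
for s1, s2, length in zip(sample1, sample2, lengths_kb):
    raw = log2_fold_change(s1, s2)
    scaled = log2_fold_change(s1 / length, s2 / length)
    print(f"raw log2FC={raw:.3f}  length-scaled log2FC={scaled:.3f}")

# Correlation of the two samples across genes: depends on the length scaling.
scaled1 = [s / l for s, l in zip(sample1, lengths_kb)]
scaled2 = [s / l for s, l in zip(sample2, lengths_kb)]
print(pearson(sample1, sample2), pearson(scaled1, scaled2))
```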

RSEM already corrects for transcript length, so no additional correction should be necessary.

I don't think so. If you could provide a reference, that would be a great help.

http://www.biomedcentral.com/1471-2105/12/323

They normalize by the "effective length", which is somewhat different from transcript length; this is described in their methods section. It may be useful to plot the RSEM values against transcript length to check whether the two are independent, as claimed.
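
Alongside the suggested plot, a rank correlation between expression values and transcript lengths gives a quick numeric check; a value near zero would be consistent with the claimed independence. A minimal pure-Python sketch, where the expression values and lengths are hypothetical placeholders for the RSEM estimates and the lengths from your annotation:

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical expression estimates and transcript lengths (bp) for six genes.
expression = [12.5, 340.0, 8.1, 55.0, 0.0, 120.3]
lengths_bp = [1500, 4200, 900, 2600, 3100, 1800]
print(spearman(expression, lengths_bp))
```

A rank correlation is used because expression values are heavily skewed, and tied values (for example many zeros) get averaged ranks. The function does not guard against a constant input, which would divide by zero.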

Do I need to transform the RSEM values to log2(RSEM) before finding differentially expressed genes with t.test?

I haven't tried using RSEM values to find DE genes, so I can't say whether the values should be log-transformed. (FPKM values typically look more Gaussian after log transformation, so my gut reaction would be to log-transform, but this should be tested.)
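
If you do go the t-test route, a minimal sketch of log-transforming one gene's values and computing Welch's t statistic (the unequal-variance test that R's `t.test` uses by default) might look like this. The pseudocount of 1, used to avoid taking log2 of zero, is an assumption, as are the example values.

```python
import math
from statistics import mean, variance


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount avoids log2(0) for unexpressed genes."""
    return [math.log2(v + pseudocount) for v in values]


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic: mean difference over the unpooled standard error."""
    var_a, var_b = variance(group_a), variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical expression values for one gene in two groups of samples.
tumor = [120.0, 95.0, 210.0, 160.0]
normal = [30.0, 45.0, 25.0, 60.0]
print(welch_t(log2_transform(tumor), log2_transform(normal)))
```

This computes only the statistic; for a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.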

The authors mention that their output includes two values: (1) an estimate of the number of fragments derived from a given isoform or gene (similar to read counts) and (2) the estimated fraction of transcripts made up by a given isoform or gene. They note that the first value can be used as input to edgeR or DESeq to find DE genes. If both of these values are available from TCGA, I would suggest using one of these methods rather than a simple t-test.
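
A sketch of pulling both values out of a TCGA gene-level file, assuming a tab-separated `rsem.genes.results` layout with `gene_id`, `raw_count`, `scaled_estimate`, and `transcript_id` columns (verify the column names against your own download). Under that assumption, `raw_count` is the expected fragment count and `scaled_estimate` is the transcript fraction, so multiplying it by 10^6 gives TPM. Rounding the counts is an assumed preprocessing step because DESeq expects integer counts. The rows below are made up for illustration.

```python
import csv
import io

# Hypothetical excerpt in the assumed rsem.genes.results layout; IDs and numbers are made up.
EXAMPLE = (
    "gene_id\traw_count\tscaled_estimate\ttranscript_id\n"
    "GENE_A|1\t500.37\t2.1e-05\tTX_A1\n"
    "GENE_B|2\t0.00\t0\tTX_B1\n"
    "GENE_C|3\t1499.62\t6.3e-05\tTX_C1,TX_C2\n"
)


def parse_rsem_genes(handle):
    """Return {gene_id: (rounded_count, tpm)} from a tab-separated RSEM gene table."""
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        count = round(float(row["raw_count"]))     # rounded, since DESeq expects integers
        tpm = float(row["scaled_estimate"]) * 1e6  # transcript fraction -> TPM
        result[row["gene_id"]] = (count, tpm)
    return result


genes = parse_rsem_genes(io.StringIO(EXAMPLE))
for gene_id, (count, tpm) in genes.items():
    print(gene_id, count, tpm)
```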