Question

Difference or ratio: which should I use?

6.1 years ago

Yijun Tian

Hello everyone,

I am trying to gather evidence that gene A regulates gene B, using microarray (i.e. hgu133a) and RNA-seq (i.e. TCGA RNA HiSeq) datasets. I have already found a high correlation between the two genes' mRNA levels (the correlation coefficient R for log2-transformed expression is at least 0.5 in all of the above datasets). My assumption is that if A is activated, the A/B ratio will be smaller across all samples within each dataset. My question: a ratio computed on the chip data (probeset intensities) worked well for survival prediction, but for the RPKM data only the direct difference (Δ) between the A and B read counts predicted well. Do I have a reason to use Δ instead of the ratio for RPKM data? Does anyone have a relevant reference to recommend?

Thank you!
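A point worth making explicit here: on a log scale, a ratio and a difference are the same quantity, since log2(A/B) = log2(A) - log2(B). On the raw scale they behave differently. If both A and B in a sample are multiplied by the same factor (for example, a sequencing-depth factor), A/B is unchanged but A - B is scaled. The minimal Python sketch below uses made-up values purely to illustrate this; it is not the poster's analysis.

```python
import math

# Hypothetical raw-scale expression values for genes A and B in one sample.
a, b = 120.0, 40.0

# On the log2 scale, the ratio and the difference are the same quantity.
log_ratio = math.log2(a / b)
log_diff = math.log2(a) - math.log2(b)
assert math.isclose(log_ratio, log_diff)

# The same sample sequenced twice as deeply: both genes scale by k.
k = 2.0
a2, b2 = a * k, b * k

print("ratio:", a / b, "->", a2 / b2)         # ratio: 3.0 -> 3.0
print("difference:", a - b, "->", a2 - b2)    # difference: 80.0 -> 160.0
```

So a raw-scale difference carries per-sample scaling along with it, whereas a ratio (or a difference of log values) does not.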

Tags: RPKM, Probeset, rna-seq

Updated 5.6 years ago by Kevin Blighe; written 6.1 years ago by Yijun Tian

score 1 · Answer 1 · 2018-09-06

5.6 years ago

Kevin Blighe

This is difficult to answer. What I do know is that RPKM data ~~is ideal~~ is not ideal: the normalisation method that produces RPKM expression values was one of the first forms of normalisation developed for RNA-seq, but it has since been shown to be ineffective for cross-sample comparisons. Some have even questioned its use for within-sample comparisons. With your HTSeq counts, I would re-process the data using DESeq2, edgeR, or limma/voom.

Kevin
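For context on what DESeq2 does differently from RPKM, here is a simplified sketch of the median-of-ratios idea it uses for size factors, written in Python/NumPy rather than R. It assumes a genes × samples matrix of raw counts and is only an illustration: in practice you would call DESeq2's own `estimateSizeFactors()` in R. The counts are invented, and the function name is hypothetical.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Simplified median-of-ratios size factors; counts is genes x samples."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample are used,
    # so that their log geometric mean is finite.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples (a pseudo-reference sample).
    log_geo_means = log_counts.mean(axis=1)
    # Size factor per sample: median ratio of that sample to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))


# Hypothetical counts: for the genes used, sample 2 is sample 1 at twice the depth.
counts = np.array([
    [10, 20, 15],
    [100, 200, 90],
    [50, 100, 60],
    [0, 5, 3],  # excluded from size-factor estimation: zero in sample 1
])
sf = median_of_ratios_size_factors(counts)
normalised = counts / sf  # divide each column by its sample's size factor
print(sf)
```

Because sample 2 was constructed as sample 1 sequenced twice as deeply, its size factor comes out at twice that of sample 1, and the first three genes have identical normalised values in those two samples.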

5.5 years ago by Kevin Blighe

Well, I think a ratio might be the better approach, since normalizing the RPKM (or even TPM) of gene A to that of gene B (both genes are clearly expressed and their variation across samples is similar) may give a more accurate evaluation of my prediction. The difference method seemed too crude for my question. I am now using the expected counts calculated by rsem-calculate-expression to compute my ratio, and it works well.

Thank you!
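A short sketch of why a within-sample ratio is fairly robust to the choice between counts and RPKM: with gene lengths treated as fixed, the per-sample library-size term cancels in RPKM_A / RPKM_B, leaving the count ratio multiplied by a constant length factor (length_B / length_A). On the log2 scale that is just a constant offset, so ranking samples by the ratio gives the same order either way. All numbers are hypothetical, and the fixed-length assumption is a simplification (RSEM estimates sample-specific effective lengths).

```python
import numpy as np

# Hypothetical expected counts for genes A and B across four samples
# (the kind of values found in RSEM's expected_count column).
count_a = np.array([300.0, 1200.0, 80.0, 560.0])
count_b = np.array([150.0, 400.0, 100.0, 280.0])
total_reads = np.array([20e6, 60e6, 10e6, 35e6])  # hypothetical library sizes
len_a_kb, len_b_kb = 2.5, 1.2  # assumed fixed gene lengths (kilobases)


def rpkm(count, length_kb, total):
    """Reads per kilobase of gene length per million mapped reads."""
    return count / length_kb / (total / 1e6)


log_ratio_counts = np.log2(count_a / count_b)
log_ratio_rpkm = np.log2(
    rpkm(count_a, len_a_kb, total_reads) / rpkm(count_b, len_b_kb, total_reads)
)

# The library-size term cancels; what remains is a constant length offset,
# log2(len_b_kb / len_a_kb), identical in every sample.
offset = log_ratio_rpkm - log_ratio_counts
print(offset)
```

A raw difference of RPKM values does not have this property, because it still depends on each sample's library size. If zero counts occur, a pseudocount is needed, which makes the cancellation only approximate.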

5.6 years ago by Yijun Tian

Did you mean 'not ideal'?

5.6 years ago by russhh

lol - yes, you already know I am somewhat against RPKM. Will modify.

5.6 years ago by Kevin Blighe

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not suitable when cross-sample differential expression analysis is your aim; indeed, they render samples incomparable in differential expression analysis:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.
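To make the "relative measurement" point concrete, the sketch below computes TPM for two hypothetical samples that differ only in the abundance of gene 0. Because TPM values in each sample sum to one million, genes 1 to 3 lose TPM in sample 2 even though their true abundance is unchanged, while the ratio between two unaffected genes is preserved. The abundances, lengths, and the reads-proportional-to-abundance-times-length model are all simplifying assumptions.

```python
import numpy as np


def tpm(reads, lengths_kb):
    """Transcripts per million: length-normalise, then scale to sum to 1e6."""
    rate = reads / lengths_kb
    return rate / rate.sum() * 1e6


lengths_kb = np.array([1.0, 2.0, 1.5, 3.0])
# Hypothetical true abundances (molecules per cell); only gene 0 differs.
abundance1 = np.array([10.0, 10.0, 10.0, 10.0])
abundance2 = np.array([100.0, 10.0, 10.0, 10.0])
# Simplifying assumption: expected reads are proportional to abundance x length.
reads1 = abundance1 * lengths_kb * 1000
reads2 = abundance2 * lengths_kb * 1000

t1, t2 = tpm(reads1, lengths_kb), tpm(reads2, lengths_kb)
print(t1)  # genes 1-3 each at 250000
print(t2)  # genes 1-3 drop to about 76923, despite unchanged abundance
```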

5.5 years ago by Kevin Blighe