Question: Why do we usually use normalized values from RNA-Seq (FPKM, RPKM, etc.)?
Asked 22 months ago by ebrudermanver:

I don't have much experience with RNA-Seq, but I see that the data are usually published as FPKM values rather than raw counts. What is the reason for that? Is it only so that we can model the values with a log-normal (log-Gaussian) distribution rather than a discrete distribution such as the Poisson or negative binomial? Or does it also serve to make the data more accurate and reliable?

Answer from Michael Dondrup (Bergen, Norway), 22 months ago:

The reason for FPKM is mostly historical; there are practically only disadvantages to distributing the data this way:

  • There are several posts and publications showing that FPKM is inferior to other units.
  • FPKM is not directly compatible with most differential expression (DE) packages.
  • Providing raw counts would instead allow anyone to compute whichever transformation they want (CPM, TPM, FPKM), whereas the FPKM transformation is not easily reversible.
  • FPKM inherits biases and errors from the gene models, because it depends on the annotated feature lengths; in particular, it is not suitable for draft genomes, where exons are often poorly annotated.
  • FPKM values must be stored as floating-point numbers, introducing unnecessary rounding errors and possibly a larger data volume, whereas counts can be stored as integers.
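To illustrate the reversibility point, here is a minimal pure-Python sketch (the toy counts, lengths and function names are hypothetical, not taken from any package) that derives CPM, FPKM and TPM from a raw count table. As a simplification it takes each sample's library size to be the column sum of the table; real pipelines usually divide by the total number of mapped fragments. Recovering counts from FPKM works here only because the sketch still has the gene lengths and library sizes, which a published FPKM table may not include.

```python
# Toy count table (invented numbers): gene -> counts, one per sample.
counts = {"geneA": [100, 200], "geneB": [400, 800], "geneC": [50, 10]}
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 500}  # feature lengths in bp


def library_sizes(table):
    """Column sums, used here as a stand-in for total mapped fragments."""
    n_samples = len(next(iter(table.values())))
    return [sum(row[j] for row in table.values()) for j in range(n_samples)]


def cpm(table):
    """Counts per million: scale each sample by its library size only."""
    totals = library_sizes(table)
    return {g: [c / totals[j] * 1e6 for j, c in enumerate(row)] for g, row in table.items()}


def fpkm(table, lengths_bp):
    """Fragments per kilobase of feature per million fragments."""
    totals = library_sizes(table)
    return {g: [c / (lengths_bp[g] / 1e3) / totals[j] * 1e6 for j, c in enumerate(row)]
            for g, row in table.items()}


def tpm(table, lengths_bp):
    """Transcripts per million: divide by length first, then rescale each sample to 1e6."""
    rates = {g: [c / lengths_bp[g] for c in row] for g, row in table.items()}
    totals = library_sizes(rates)
    return {g: [r / totals[j] * 1e6 for j, r in enumerate(row)] for g, row in rates.items()}


def fpkm_to_counts(fpkm_values, lengths_bp, totals):
    """Invert FPKM; requires BOTH the feature lengths and the per-sample totals."""
    return {g: [v * (lengths_bp[g] / 1e3) * totals[j] / 1e6 for j, v in enumerate(row)]
            for g, row in fpkm_values.items()}


f = fpkm(counts, lengths)
recovered = fpkm_to_counts(f, lengths, library_sizes(counts))
print(all(abs(recovered[g][j] - counts[g][j]) < 1e-6 for g in counts for j in range(2)))  # True
```

Going the other way (counts to any unit) needs only the lengths; going back from FPKM needs the library sizes as well, and the rounding in a published table adds further error.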
Answer from grant.hovhannisyan, 22 months ago:

R(F)PKM/TPM values normalize read counts by library size (the total number of reads in a given RNA-seq experiment) and by the length of the feature (gene/transcript). But remember that the commonly used differential expression tools (DESeq2/edgeR) take raw counts rather than normalized values, because they perform their own internal normalization.
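For reference, the two length-normalized units can be written with the usual textbook definitions (the symbols are introduced here, not in the thread): $k_{ij}$ is the count for feature $i$ in sample $j$, $\ell_i$ the feature length in bp, $N_j$ the total mapped fragments in sample $j$, and $m$ the number of samples. Quantifiers often substitute an "effective" length for $\ell_i$; the plain length is assumed here. For contrast, the last line is the median-of-ratios size factor described in the DESeq/DESeq2 papers (counts are divided by $s_j$ rather than by length); edgeR's TMM method is different and is not shown.

```latex
\mathrm{FPKM}_{ij} = \frac{k_{ij}}{(\ell_i / 10^{3})\,(N_j / 10^{6})} = \frac{10^{9}\, k_{ij}}{\ell_i\, N_j}

\mathrm{TPM}_{ij} = 10^{6} \cdot \frac{k_{ij} / \ell_i}{\sum_{g} k_{gj} / \ell_g}

s_j = \operatorname*{median}_{i \,:\, k_{iv} > 0 \;\forall v} \; \frac{k_{ij}}{\left( \prod_{v=1}^{m} k_{iv} \right)^{1/m}}
```

Note that TPM always sums to $10^6$ within a sample, whereas FPKM does not, which is one reason TPM is often preferred for comparing proportions across samples.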

Reply from ebrudermanver, 22 months ago:

Okay, I just found a link which says that FPKM makes it possible to compare gene A with gene B even if they have different lengths, and to compare sample 1 with sample 2 even if they have different library sizes.
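The two comparisons in that claim can be checked with a tiny worked example. This is a sketch with invented numbers and a hypothetical helper `fpkm`: gene B is assumed to be 4x longer than gene A at the same expression level (so it collects 4x the fragments), and sample 2 is assumed to differ from sample 1 only in being sequenced twice as deeply. Under those idealized assumptions every FPKM comes out identical; real samples also differ in composition, which FPKM does not correct for.

```python
def fpkm(count, length_bp, total_fragments):
    """FPKM for one gene in one sample: count / (kb of feature) / (millions of fragments)."""
    return count / (length_bp / 1e3) / (total_fragments / 1e6)


lengths = {"geneA": 1000, "geneB": 4000}  # hypothetical: gene B is 4x longer
samples = {
    "sample1": {"total": 10_000_000, "geneA": 500, "geneB": 2000},
    "sample2": {"total": 20_000_000, "geneA": 1000, "geneB": 4000},  # 2x deeper
}

results = {
    name: {g: fpkm(s[g], lengths[g], s["total"]) for g in lengths}
    for name, s in samples.items()
}
print(results)
# {'sample1': {'geneA': 50.0, 'geneB': 50.0}, 'sample2': {'geneA': 50.0, 'geneB': 50.0}}
```

The raw counts differ by up to 8x across these four cells, yet the FPKM values agree, which is exactly the length and library-size adjustment described above.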

Powered by Biostar version 2.3.0