Question

FPKM not suitable for DE?

9.9 years ago

nevev ▴ 110

Hello,

I was recently pointed to a lecture by Lior Pachter, where he states that the FPKM normalization method is not completely correct, or at least not suitable for differential expression analysis.

Could somebody please elaborate on that? In my study, I would like to compare different experimental conditions (pairwise), all with replicates. Is FPKM inappropriate in such a design?

Also, if I use Galaxy, how and at which step can I deal with it? For example, how can I transform my FPKM values to TPM?

Looking forward to learning,

Regards,
Monika

normalization FPKM RNA-Seq • 5.5k views

Updated 2.5 years ago by Ram 43k • written 9.9 years ago by nevev ▴ 110

The FPKM part of the video starts at about the 30-minute mark. Unfortunately, a YouTube video of a lecture is not a citable peer-reviewed article, but as far as I remember there is a review in Briefings in Bioinformatics.

See also Does FPKM scale incorrectly in case of unequal mapping rates?

Updated 2.5 years ago by Ram 43k • written 9.9 years ago by Michael 54k

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.

Written 5.6 years ago by Kevin Blighe 87k

Ram · Answer 1 · 2014-06-19

I think RPKM/FPKM is fine. I think you could also make arguments against the assumptions in the TPM calculations. I've found that working with transformed RPKM/FPKM values yields robust results, and you can see some benchmarks in this post as well as the associated paper:

http://cdwscience.blogspot.com/2013/11/rna-seq-differential-expression.html

I would also agree with the comments from the link to the other Biostar discussion. Plus, you might notice that Cufflinks still doesn't actually provide TPM values, so that might be one indication that it is not such a big deal ;)

That said, study design does matter. For example, you will probably find some systematic differences between independent datasets, especially if different sample preparation protocols were used. However, I don't think those will be fixed by using TPM instead of RPKM/FPKM.

Also, rant aside, this blog post includes R code to convert between RPKM and TPM, and I think the end of the post does a nice job of showing the difference between the metrics:

http://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/
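
For the FPKM-to-TPM part of the question, here is a minimal sketch of the per-sample conversion in Python (not the R code from the blog post). It relies on the identity that a gene's TPM equals its FPKM divided by the sum of all FPKM values in the same sample, times one million. The function name `fpkm_to_tpm` and the example values are made up for illustration.

```python
def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values to TPM.

    TPM_i = FPKM_i / sum(FPKM over all genes in the sample) * 1e6
    """
    total = sum(fpkm)
    if total == 0:
        raise ValueError("all FPKM values are zero; TPM is undefined")
    return [value / total * 1e6 for value in fpkm]


# Hypothetical FPKM values for the same four genes in two samples
# (made-up numbers, for illustration only).
sample_a = [10.0, 20.0, 30.0, 40.0]
sample_b = [5.0, 5.0, 10.0, 30.0]

tpm_a = fpkm_to_tpm(sample_a)
tpm_b = fpkm_to_tpm(sample_b)

# The FPKM totals differ from sample to sample...
print(sum(sample_a), sum(sample_b))
# ...while the TPM totals are one million in every sample (up to rounding).
print(sum(tpm_a), sum(tpm_b))
```

The conversion is applied to each sample separately. As the quote from the blog post above points out, it does not replace between-sample normalization, so the converted values are still relative measurements.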