Question

RPKM or FPKM for Paired-End Data Analyses of Gene Expression?

12.8 years ago

Nebo ▴ 80

I have Illumina paired-end data from 4 libraries in sugarcane. I aligned the reads using Sorghum as a reference, normalized to get RPKM values for gene expression, and then used DEGseq to call differentially expressed genes. I was told that FPKM values would work better for paired-end data. Which one is more suitable, RPKM or FPKM? BTW, I DON'T want to find alternative splicing; I only want to find differentially expressed genes.

paired fpkm rpkm • 8.2k views

ADD COMMENT • link updated 10.2 years ago by Adrian Pelin ★ 2.6k • written 12.8 years ago by Nebo ▴ 80

Use neither.

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not suitable when cross-sample differential expression analysis is your aim; indeed, they render samples incomparable in a differential expression analysis:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

score 4 · Answer 1 · 2011-07-14

12.8 years ago

Vitis ★ 2.5k

For differential expression analysis with tools like DEGseq or edgeR, I usually use raw counts and let the software handle the normalization (or normalize through my own calculations in R). This way I have more control over how the data get normalized. In my opinion, whether the data are paired-end or single-end matters more for mapping than for counting and statistical analysis.

ADD COMMENT • link 12.8 years ago by Vitis ★ 2.5k

Wouldn't I have bigger problems using raw counts (uniquely mapped reads) instead of normalized values? I tried this and the results were not the same as when I used normalized data...

ADD REPLY • link 12.8 years ago by Nebo ▴ 80

I mean, you still need to normalize after inputting the raw counts, but you can do it in different ways, entirely under your own control. For example, you can use the library sizes (mapped reads) and exon model lengths to calculate RPKM values yourself. In theory, the numbers should be very similar to the ones you get from mapping programs. edgeR also offers other normalization methods, such as quantile-based or TMM, so you have different methods to play with. Then you can choose the ones that give you robust results.
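As a minimal sketch of the "calculate RPKM yourself" step, assuming you already have raw counts per gene, exon model lengths in base pairs, and the number of mapped reads in the library: the function name `rpkm` and the gene names and numbers below are made up for illustration and do not come from any package.

```python
def rpkm(counts, lengths_bp, library_size=None):
    """Reads per kilobase of exon model per million mapped reads.

    counts       -- dict: gene -> raw mapped read count
    lengths_bp   -- dict: gene -> exon model length in base pairs
    library_size -- total mapped reads; defaults to the sum of counts
    """
    if library_size is None:
        library_size = sum(counts.values())
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * library_size)
        for gene in counts
    }


# Hypothetical toy data: three genes in a library of 10 million mapped reads.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

values = rpkm(counts, lengths, library_size=10_000_000)
for gene, value in sorted(values.items()):
    print(gene, round(value, 2))
```

Here geneA and geneB end up with the same RPKM even though geneB has three times the reads, because it is also three times as long. This is within-sample scaling only; it does not replace the between-sample normalization that the DE packages perform.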

ADD REPLY • link 12.8 years ago by Vitis ★ 2.5k

I agree that the normalization done with most DE packages is more robust than using a normalization by FPKM, which is essentially normalizing by total reads.

But when you plug your counts in, you can use either pairs aligned or reads aligned. Many of these packages (e.g. edgeR, DESeq) assume that the counting noise can be modeled as a Poisson distribution, that is, the variance equals the mean. I do not think that is totally valid when you count reads rather than pairs, because the sampling of reads is not independent if read 1 is (almost) always matched with read 2. In that case I believe the variance of the read count across many samplings is higher than its mean. (I think it's twice the mean of the read count, but I am not 100% sure.) So for packages that take raw counts, you still need to consider whether you are counting reads or pairs. In general you should count pairs, unless you are doing something like a t-test that measures variance directly without any distributional assumptions; then you are probably OK to use either.
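The "twice the mean" guess can be checked under an idealized assumption: the number of pairs for a gene is Poisson with mean `lam`, and every pair contributes exactly two aligned reads. The sketch below computes the mean and variance exactly from the Poisson probabilities rather than by simulation; `lam = 20` and the truncation at 200 are arbitrary choices for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def mean_and_variance(distribution):
    """Mean and variance of a discrete distribution given as (value, prob) pairs."""
    mean = sum(value * prob for value, prob in distribution)
    variance = sum((value - mean) ** 2 * prob for value, prob in distribution)
    return mean, variance


lam = 20.0  # assumed mean number of pairs for one gene
# Truncate the support at 200; the omitted tail is negligible for lam = 20.
pair_dist = [(k, poisson_pmf(k, lam)) for k in range(200)]
# Idealization: both mates always align, so reads = 2 * pairs.
read_dist = [(2 * k, prob) for k, prob in pair_dist]

pair_mean, pair_var = mean_and_variance(pair_dist)
read_mean, read_var = mean_and_variance(read_dist)

print("pairs: variance / mean =", round(pair_var / pair_mean, 3))
print("reads: variance / mean =", round(read_var / read_mean, 3))
```

Under these assumptions the pair counts have variance equal to the mean, while the read counts have variance equal to twice the mean, since doubling a count doubles its mean but quadruples its variance. Real data will deviate from this whenever one mate fails to align.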

ADD REPLY • link 10.2 years ago by Michele Busby ★ 2.2k

score 1 · Answer 2 · 2014-02-09

10.2 years ago

Adrian Pelin ★ 2.6k

FPKM, as calculated by Cufflinks. FPKM takes the paired-end information into account: the two reads of a pair come from one fragment, and that fragment is counted once. RPKM does not; it treats all reads independently.
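To illustrate only the difference in counting unit (this is a toy sketch, not how Cufflinks estimates abundances), the example below counts the same hypothetical alignments once per read and once per fragment. The tuple layout `(fragment_id, mate_number, gene)` and all names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical alignments: (fragment_id, mate_number, gene).
alignments = [
    ("frag1", 1, "geneA"), ("frag1", 2, "geneA"),
    ("frag2", 1, "geneA"), ("frag2", 2, "geneA"),
    ("frag3", 1, "geneB"),                      # mate 2 did not align
    ("frag4", 1, "geneB"), ("frag4", 2, "geneB"),
]


def count_reads(alignments):
    """Count every aligned read independently (the R in RPKM)."""
    counts = defaultdict(int)
    for _fragment_id, _mate, gene in alignments:
        counts[gene] += 1
    return dict(counts)


def count_fragments(alignments):
    """Count each fragment once, however many of its mates aligned (the F in FPKM)."""
    fragments = defaultdict(set)
    for fragment_id, _mate, gene in alignments:
        fragments[gene].add(fragment_id)
    return {gene: len(ids) for gene, ids in fragments.items()}


read_counts = count_reads(alignments)
fragment_counts = count_fragments(alignments)
print("reads:    ", read_counts)
print("fragments:", fragment_counts)
```

By fragment, geneA and geneB have the same count; by read, geneB looks lower only because one of its mates failed to align. For single-end data the two counts are identical, so RPKM and FPKM coincide.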

ADD COMMENT • link 10.2 years ago by Adrian Pelin ★ 2.6k