Question

Converting CPM to FPKM or TMM?

4.3 years ago

klorilla • 0

Hi, I'm new to RNA-seq and was assigned this differential expression project. I used edgeR to convert my counts to CPM, but my advisor wants me to convert these to FPKM.

I'm also reading that TMM is probably preferred for normalization, but I'm not entirely sure what the differences are, whether there are upsides or downsides to one or the other, or whether this is the proper thing to do (for example, is CPM sufficient for differential expression analysis?).

RNA-Seq • 6.1k views

updated 4.3 years ago by ATpoint 82k • written 4.3 years ago by klorilla • 0

score 1 · Answer 1 · 2020-01-29

If you used edgeR following the manual, that is, first calculating normalization factors with calcNormFactors() and then calling cpm(), then you already have counts normalized with the TMM method, since TMM is the default. Essentially, these are the raw counts scaled per million reads and further corrected for library composition by a normalization factor. FPKM and its derivatives also take gene length into account, since longer genes inherently produce more counts than shorter genes at the same expression level, given standard full-length RNA-seq. For differential expression one commonly does not normalize for length, as this would only reduce the counts of longer genes and thereby reduce statistical power. If you really need FPKM for whatever purpose, I would use the rpkm() function in edgeR. Still, this would only be for downstream purposes such as clustering or heatmaps; it is not used in differential expression.
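To make the "per million plus a composition correction" part concrete, here is a minimal plain-Python sketch of the arithmetic only. The counts and the normalization factor are made-up numbers, and `cpm_effective` is a hypothetical helper, not an edgeR function; in a real analysis edgeR estimates the factor with TMM and applies it inside cpm().

```python
def cpm_effective(counts, lib_size, norm_factor=1.0):
    """Counts per million using an effective library size.

    effective library size = lib_size * norm_factor
    (norm_factor = 1.0 gives plain CPM with no composition correction).
    """
    effective = lib_size * norm_factor
    return [c / effective * 1e6 for c in counts]


# Made-up example: one sample, three genes.
counts = [50, 200, 750]
lib_size = sum(counts)  # here the library size is simply the column total

plain = cpm_effective(counts, lib_size)
# 0.8 is a hypothetical normalization factor, not one estimated by TMM.
adjusted = cpm_effective(counts, lib_size, norm_factor=0.8)
```

A factor below 1 shrinks the effective library size, so every CPM value in that sample goes up by the same proportion; gene length plays no role at this stage.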

For differential expression, follow the manual unless you have expert knowledge to justify any changes from it. The analysis starts from raw counts. If your advisor tells you otherwise, point them to the manual and the edgeR papers. Do not let anyone push you into using FPKM or other normalized values for differential expression, even though you will find plenty of papers that did so. Please also see other threads here and on the Bioconductor support forum that address this for some background. As said, stick exactly to the manual unless you have the expert knowledge to change things.

score 0 · Answer 2 · 2020-01-28

There is already a really great answer on the Bioconductor support forum.

FPKM = fragments per kilobase/million. To compute this, you divide the count by the exonic length of the gene (in kilobases) and the library size (in millions of reads). This can be done using the rpkm function. ... However, calculation of the FPKM is distinct from edgeR's normalization. In edgeR, the TMM method computes normalization factors that represent sample-specific biases. ... That said, if you wanted to compute FPKM values that incorporate information from TMM normalization, you would use the effective library size instead of the library size in the FPKM calculation.
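As a rough illustration of the calculation described in that quote, here is a small plain-Python sketch. The function `fpkm` and all numbers are hypothetical and only show the arithmetic; for real data you would let edgeR's rpkm() do this on your count object.

```python
def fpkm(count, gene_length_bp, lib_size, norm_factor=1.0):
    """Fragments per kilobase of gene length per million fragments.

    Uses the effective library size (lib_size * norm_factor); with
    norm_factor = 1.0 this is the FPKM based on the raw library size.
    """
    length_kb = gene_length_bp / 1e3
    effective_millions = lib_size * norm_factor / 1e6
    return count / length_kb / effective_millions


# Made-up numbers: 500 fragments on a 2,000 bp gene, library of 10 million.
raw_lib = fpkm(500, 2000, 10_000_000)
# Same gene, but with a hypothetical normalization factor of 1.25.
tmm_lib = fpkm(500, 2000, 10_000_000, norm_factor=1.25)
# A gene twice as long with twice the fragments has the same FPKM.
long_gene = fpkm(1000, 4000, 10_000_000)
```

The only difference between the two variants is the denominator: the raw library size in the first call, and the library size multiplied by the normalization factor in the second.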