Question: Can FPKM be added?
5.1 years ago by
Ekarl2110 wrote:


I have a de novo RNA-seq dataset on which I ran RSEM at the gene level, obtaining FPKM values for all Trinity genes in the assembly. Unfortunately, the assembler split a real gene (say, a transporter) into two separate Trinity genes, so the first Trinity gene contains the first half of the sequence and the second Trinity gene contains the second half. Kind of like this:


|--------------------|   |------------------------|


I have the FPKM values for each of these Trinity genes separately. How can I calculate the combined FPKM for the two of them? Can FPKM values simply be added together, so that FPKM(tot) = FPKM(Trinity gene 1) + FPKM(Trinity gene 2)? Or does it require a more complicated procedure?

I ask because I am not sure whether the order matters: normalizing for length and library size first and then combining the expression data from the two Trinity genes, or combining first and then normalizing. Are these operations commutative? If not, is it better to first add the raw expression values from the two Trinity genes and then normalize?


rna-seq fpkm • 1.8k views
5.1 years ago by
michael.ante3.6k wrote:

Let's say you have 1M reads in total and two assemblies (1 kb and 2 kb). Each assembly has 100 reads, so with FPKM = reads / (length in kb * total reads in millions):

FPKM1 = 100/(1*1) = 100
FPKM2 = 100/(2*1) = 50

The sum would be 150.

Merging the two assemblies into one of 3 kb (plus a little gap in between):

FPKM3 = 200/(3*1) = 66.67

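So the values are not additive: you need to re-calculate your expression values from the combined read count and the combined length. If you have a couple of these wrongly separated assemblies, you might use bedtools cluster to combine all assemblies that lie within a certain distance of each other.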
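The arithmetic above can be reproduced with a short Python sketch. The function names (`fpkm`, `merged_fpkm`) are made up for illustration, and the sketch assumes plain sequence length in kb as the denominator; tools such as RSEM typically use an effective length instead, so real values may differ somewhat. It also shows that, for a fixed library size, the merged FPKM equals the length-weighted mean of the separate FPKMs rather than their sum.

```python
def fpkm(fragments, length_kb, library_millions):
    """Fragments per kilobase per million mapped fragments (plain length, an assumption)."""
    return fragments / (length_kb * library_millions)


def merged_fpkm(fragments, lengths_kb, library_millions):
    """Pool raw counts and lengths first, then normalize once."""
    return fpkm(sum(fragments), sum(lengths_kb), library_millions)


# Numbers taken from the example above: 1M reads, assemblies of 1 kb and 2 kb,
# 100 reads each.
library_millions = 1.0
fragments = [100, 100]
lengths_kb = [1.0, 2.0]

# Normalize each piece on its own.
separate = [fpkm(f, l, library_millions) for f, l in zip(fragments, lengths_kb)]

# Naive approach: add the two FPKM values.
naive_sum = sum(separate)

# Recommended approach: combine counts and lengths, then normalize.
merged = merged_fpkm(fragments, lengths_kb, library_millions)

# Equivalent shortcut when only FPKMs and lengths are at hand:
# the length-weighted mean of the separate FPKM values.
weighted = sum(v * l for v, l in zip(separate, lengths_kb)) / sum(lengths_kb)

print("separate:", separate)
print("naive sum:", naive_sum)
print("merged:", round(merged, 2))
print("length-weighted mean:", round(weighted, 2))
```

The weighted-mean form is handy if the raw counts are no longer available, since FPKM * length recovers a quantity proportional to the count of each piece as long as both values come from the same library.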



This was very helpful! Does what you describe above also apply to cases where I would like to combine FPKM for two separate paralogs that are both full length (say 2000 nt)? In those cases, I do not see an intuitive way to calculate a new combined length. Would I just divide the combined number of reads by 2*1 (assuming 1M reads in total) instead of 4*1?

ADD REPLYlink written 5.1 years ago by Ekarl2110