Ekarl290 wrote (3.4 years ago):

Hi,

I have a de novo RNA-seq dataset on which I ran RSEM at the gene level, obtaining FPKM values for all Trinity genes in the assembly. Unfortunately, the assembler split one real gene (say, a transporter) into two separate Trinity genes, so the first Trinity gene contains the first half of the sequence and the second Trinity gene contains the second half. Kind of like this:

-------------------------------------------------

|--------------------|   |------------------------|

I have the FPKM values for these two Trinity genes separately. How can I calculate a combined FPKM for the two of them? Can FPKM values simply be added, so that FPKM(tot) = FPKM(Trinity gene 1) + FPKM(Trinity gene 2)? Or does it require a more complicated procedure?

I ask because I am not sure whether it matters in which order the two steps are done: normalizing for length and library size, and combining the expression data from the two Trinity genes. Are these operations commutative? If not, is it better to first add the raw expression values from the two Trinity genes and then normalize?

michael.ante (Austria/Vienna) wrote (3.4 years ago):

Let's say you have 1M reads in total and two assemblies (1 kb and 2 kb), each with 100 reads:

FPKM1 = 100 / (1 kb × 1 M reads) = 100
FPKM2 = 100 / (2 kb × 1 M reads) = 50

The sum would be 150.

Merging the two assemblies into one 3 kb assembly (plus a small gap in between):

FPKM3 = 200 / (3 kb × 1 M reads) = 66.67
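A short derivation of why the two numbers differ (an addition, not part of the original answer), assuming both fragments were quantified against the same library of N mapped reads and, as in the example, ignoring the gap between them. C_i is the read count of fragment i and L_i its length in bp:

$$
\begin{aligned}
\mathrm{FPKM}_i &= \frac{C_i}{(L_i/10^3)\,(N/10^6)} = \frac{10^9\,C_i}{L_i\,N},\\
\mathrm{FPKM}_{\text{merged}} &= \frac{10^9\,(C_1 + C_2)}{(L_1 + L_2)\,N}
  = \frac{L_1\,\mathrm{FPKM}_1 + L_2\,\mathrm{FPKM}_2}{L_1 + L_2}.
\end{aligned}
$$

With the numbers above, (1 × 100 + 2 × 50) / 3 = 66.67, whereas the plain sum gives 150. Length normalization and summation therefore do not commute: the merged value is a length-weighted average of the per-fragment FPKMs, not their sum.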

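So FPKM values cannot simply be added: you need to recalculate your expression values, i.e. sum the raw read counts first and then normalize by the combined length. If you have a couple of these wrongly separated assemblies, you might use bedtools cluster to group all assemblies that lie within a certain distance of each other.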
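A minimal Python sketch of that recalculation, assuming you have either the per-gene read counts or the per-gene FPKMs together with the lengths they were computed from. For RSEM output, check which length column (length or effective_length) your FPKMs correspond to; using a different length breaks the equivalence. The function names are hypothetical, and like the example above the sketch ignores any gap between the fragments:

```python
import math
from typing import Sequence


def fpkm(count: float, length_bp: float, total_reads: float) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (length_bp * total_reads)


def merged_fpkm_from_counts(
    counts: Sequence[float], lengths_bp: Sequence[float], total_reads: float
) -> float:
    """Add raw counts and lengths of the fragments first, then normalize once."""
    return fpkm(sum(counts), sum(lengths_bp), total_reads)


def merged_fpkm_from_fpkms(
    fpkms: Sequence[float], lengths_bp: Sequence[float]
) -> float:
    """Length-weighted mean of per-fragment FPKMs (same library assumed).

    Gives the same value as merged_fpkm_from_counts when every FPKM was
    computed with the same total_reads and with exactly these lengths.
    """
    if len(fpkms) != len(lengths_bp):
        raise ValueError("need one length per FPKM value")
    weighted = sum(f * length for f, length in zip(fpkms, lengths_bp))
    return weighted / sum(lengths_bp)


if __name__ == "__main__":
    total_reads = 1_000_000
    counts = [100, 100]       # reads on Trinity gene 1 and Trinity gene 2
    lengths = [1_000, 2_000]  # fragment lengths in bp

    per_fragment = [fpkm(c, n, total_reads) for c, n in zip(counts, lengths)]
    print(per_fragment)       # [100.0, 50.0]
    print(sum(per_fragment))  # 150.0, the naive (incorrect) sum

    from_counts = merged_fpkm_from_counts(counts, lengths, total_reads)
    from_fpkms = merged_fpkm_from_fpkms(per_fragment, lengths)
    assert math.isclose(from_counts, from_fpkms)
    print(round(from_counts, 2))  # 66.67
```

The count-based function is the direct "add raw counts, then normalize" approach; the FPKM-based one is handy when only the FPKM table is at hand and gives the same number as long as all values come from the same library.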

Cheers,

Michael