Question

Differential expression based on normalized data

0

2.1 years ago

iibraimi18 • 0

Dear All,

I have a data set containing expression profiles (normalized RPKM data) from RNA-seq. The raw counts are not available. What is the best package to use in this case for differential expression analysis? I realized that DESeq2 cannot be used here. I know one can use the limma package, but are there other packages that might be more advantageous than limma when only RPKM values are available?

Thanks.

expression differential • 670 views

ADD COMMENT • link updated 2.1 years ago by Kevin Blighe 87k • written 2.1 years ago by iibraimi18 • 0

score 1 · Answer 1 · 2022-03-24

1

2.1 years ago

Kevin Blighe 87k

Hi iibraimi18,

If you just have RPKM expression levels, then you can transform these via log [base 2] and use the limma-trend workflow. Please see point #3 by Gordon, here: https://support.bioconductor.org/p/56275/#56299

However, keep in mind the limitations that pertain to performing cross-sample comparisons using RPKM / FPKM data.
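As a rough illustration of the log [base 2] step only (not of limma-trend itself, which is an R/Bioconductor workflow), here is a minimal Python sketch. The function name `log2_rpkm` and the `offset` parameter are hypothetical; the offset is an assumption added so that RPKM values of zero do not produce negative infinity, and its value is a choice for the analyst, not something taken from the linked post.

```python
import math


def log2_rpkm(rpkm_values, offset=1.0):
    """Return log2(RPKM + offset) for each value.

    `offset` is an illustrative assumption: it must be positive so that
    zero RPKM values stay finite after the transformation.
    """
    if offset <= 0:
        raise ValueError("offset must be positive")
    return [math.log2(value + offset) for value in rpkm_values]


# Hypothetical RPKM values for one gene across four samples.
example_rpkm = [0.0, 1.0, 3.0, 7.0]
print(log2_rpkm(example_rpkm))
```

The transformed values would then be passed to the modelling step; a smaller offset stretches the low-expression end of the scale, so it is worth checking how sensitive the results are to that choice.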

Kevin

0

Thanks for the comment. Since comparing between groups is the basis of DE, I do not see a way around these limitations when only RPKM is available. How stringent are they? Is limma-trend the only option in my case for performing DE between groups? I am comparing at least 10 different groups or conditions within the larger data set. I am also wondering whether there is a way to estimate the raw counts from RPKM alone and then use the standard DESeq package?

ADD REPLY • link 2.1 years ago by iibraimi18 • 0

1

You can reverse-engineer RPKM values, if you wish, and if you have all relevant pieces of information:

RPKM = Gene Reads / ( Gene Length * (Total Reads / 1 000 000))

Gene Reads = number of reads aligned / mapped to the gene in question
Gene Length = gene length in kilobases (kb)
Total Reads = total number of aligned / mapped reads in the sample in question

So, if you want 'Gene Reads' (raw counts), then you need:

Gene Reads = RPKM * ( Gene Length * (Total Reads / 1 000 000))
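The two formulas above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the result is an estimate that is only as good as the gene lengths and total mapped read counts you supply, which must match those used when the RPKM values were originally computed.

```python
def counts_to_rpkm(gene_reads, gene_length_kb, total_reads):
    """RPKM = Gene Reads / (Gene Length * (Total Reads / 1 000 000))."""
    return gene_reads / (gene_length_kb * (total_reads / 1_000_000))


def rpkm_to_counts(rpkm, gene_length_kb, total_reads):
    """Gene Reads = RPKM * (Gene Length * (Total Reads / 1 000 000))."""
    return rpkm * (gene_length_kb * (total_reads / 1_000_000))


# Hypothetical example: a 2 kb gene in a sample with 20 million mapped reads.
rpkm = counts_to_rpkm(gene_reads=400, gene_length_kb=2.0, total_reads=20_000_000)
estimated_reads = rpkm_to_counts(rpkm, gene_length_kb=2.0, total_reads=20_000_000)

# Count-based tools such as DESeq2 expect integers, so round the estimate.
print(rpkm, round(estimated_reads))
```

Because published RPKM values are usually rounded, the back-calculated counts will generally not match the original counts exactly.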

I do not have enough information to comment on the limitations of the limma-trend method used in this context - my sincere apologies.

Another possibility is to transform the RPKM values to Z-scores using the zFPKM package. On the Z-scale, you can use any parametric test to derive p-values.

ADD REPLY • link 2.1 years ago by Kevin Blighe 87k