Question

Comparing RPKMs For One Test Sample Vs Multiple Controls

10.5 years ago

Travis ★ 2.8k

Hi all,

I have RPKM values for a single sample (lung adenocarcinoma) and wish to compare it to RPKM values for a group of controls (50 TCGA normal lung samples).

Bearing in mind the one-to-many nature of the analysis, and that RPKMs are the starting point, can someone recommend the best method/software for calculating differential expression with appropriate measures of significance? At its most basic, I have calculated fold changes and Z-scores (mean- and median-based), but I am guessing this is overly simplistic.
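For reference, the basic calculations mentioned above might look like the following minimal Python sketch. The function names, the pseudocount of 1, and the MAD scaling constant 1.4826 (which makes the MAD comparable to a standard deviation under normality) are illustrative assumptions, not details taken from this post.

```python
import math
import statistics


def log2_fold_change(test_value, control_values, pseudocount=1.0):
    """log2 ratio of the test RPKM to the mean control RPKM.

    The pseudocount (an assumption of this sketch) avoids division by zero
    and damps extreme ratios for very lowly expressed genes.
    """
    control_mean = statistics.mean(control_values)
    return math.log2((test_value + pseudocount) / (control_mean + pseudocount))


def mean_z_score(test_value, control_values):
    """Z-score of the test sample against the control mean and sample SD."""
    sd = statistics.stdev(control_values)
    if sd == 0:
        raise ValueError("controls have zero variance")
    return (test_value - statistics.mean(control_values)) / sd


def median_z_score(test_value, control_values):
    """Robust Z-score using the control median and the scaled MAD."""
    med = statistics.median(control_values)
    mad = statistics.median([abs(v - med) for v in control_values])
    if mad == 0:
        raise ValueError("controls have zero median absolute deviation")
    return (test_value - med) / (1.4826 * mad)


# Hypothetical RPKMs for one gene: seven typical controls and one outlier.
controls = [10, 12, 11, 9, 13, 10, 11, 95]
test = 30

print(log2_fold_change(test, controls))
print(mean_z_score(test, controls))    # pulled towards 0 by the outlier
print(median_z_score(test, controls))  # much larger, outlier has little effect
```

With a single outlying control, the mean-based and median-based scores can disagree sharply, which is one reason a plain Z-score is hard to interpret as a significance measure.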

All help appreciated.

rna-seq differential-expression next-gen • 3.6k views

ADD COMMENT • link updated 10.5 years ago by Hayssam ▴ 280 • written 10.5 years ago by Travis ★ 2.8k

score 0 · Answer 1 · 2013-11-06

10.5 years ago

Hayssam ▴ 280

Hi, I don't think there's any reason not to start with one of the available differential expression tests in R. I'd recommend edgeR or DESeq. Both have nice tutorials to get you started, and both should handle the class imbalance adequately. However, these two methods expect raw read counts, not RPKM. For the TCGA samples, raw counts are available, but I think you have to take level 2. Is there any reason for you to stick with RPKMs? If so, be aware that you risk losing statistical power by using them.
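To illustrate why counts carry information that RPKM discards, here is a small Python sketch. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads); the gene lengths, counts, and the pure Poisson noise model are assumptions chosen for illustration, and real RNA-seq data are typically more variable than Poisson.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def poisson_relative_error(read_count):
    """Relative counting error (SD / mean) if counts were purely Poisson."""
    return 1.0 / math.sqrt(read_count)


library_size = 20_000_000  # hypothetical number of mapped reads

# Two hypothetical genes with identical RPKM but very different evidence.
short_gene = {"count": 10, "length": 500}
long_gene = {"count": 200, "length": 10_000}

for name, gene in (("short", short_gene), ("long", long_gene)):
    value = rpkm(gene["count"], gene["length"], library_size)
    error = poisson_relative_error(gene["count"])
    print(f"{name}: RPKM={value:.2f}, relative counting error={error:.3f}")
```

Both genes come out at the same RPKM, yet the estimate for the long gene rests on 20 times as many reads. A count-based test can use that difference in precision; once only the RPKM is kept, it is gone.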

ADD COMMENT • link 10.5 years ago by Hayssam ▴ 280

I had assumed it would not be safe to take raw counts from different sources/centers and attempt differential expression analysis. Do both DESeq and edgeR attempt to correct for issues like differences in sequencing depth?

ADD REPLY • link 10.5 years ago by Travis ★ 2.8k

Different library sizes (due to both different sequencing depth and different ratios of mappable reads) are exactly the raison d'être for these approaches. There are several papers explaining why RPKM does not deal with this appropriately. See e.g. "Differential Gene Expression Analysis - Rpkm Vs Readcount" and "Rnaseq Differential Expression". For RPKM inconsistencies, a good starting point is this blog post.
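As a rough illustration of count-based library size correction, the sketch below implements a simplified version of the median-of-ratios idea described for DESeq. It is written from the published description, not taken from the package, and the function name and toy counts are assumptions.

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample from a genes x samples count table.

    For each gene, a reference value is the geometric mean of its counts
    across samples. A sample's size factor is the median, over genes, of
    count / reference. Genes with a zero count in any sample are skipped,
    since their geometric mean is zero.
    """
    n_samples = len(counts[0])
    usable = [gene for gene in counts if all(c > 0 for c in gene)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    log_reference = [statistics.mean([math.log(c) for c in gene]) for gene in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(gene[j]) - ref for gene, ref in zip(usable, log_reference)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


# Toy data: sample B is sample A at twice the depth, except the last gene,
# which is strongly up-regulated in B.
counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [20, 40],
    [10, 2000],
]

factors = median_of_ratios_size_factors(counts)
totals = [sum(gene[j] for gene in counts) for j in range(2)]

print("size factor ratio B/A:", factors[1] / factors[0])
print("total count ratio B/A:", totals[1] / totals[0])
```

In this toy example the size factors recover the twofold depth difference, while scaling by total counts (as RPKM does) is dominated by the single highly expressed gene and would make the four unchanged genes look down-regulated in sample B.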

Furthermore, if you suspect there are batch effects (e.g. a lab effect for samples coming from different centers), linear modeling in edgeR can help you correct or account for them. A large-scale RNA-sequencing effort recently published a study that dealt adequately with batch effects. If that's of interest, you could start browsing from the GEUVADIS RNA-Seq website.
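The idea of accounting for a batch term in a linear model can be sketched as follows. Note the swap: this is ordinary least squares on log expression for a single gene, not edgeR's negative binomial GLM, and the function names and toy data are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Happens, for example, when condition and batch are fully confounded.
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_additive_model(log_expr, condition, batch):
    """Least-squares fit of log_expr ~ intercept + condition + batch.

    condition and batch are 0/1 indicators. Returns the coefficients
    [intercept, condition_effect, batch_effect].
    """
    design = [[1.0, float(c), float(b)] for c, b in zip(condition, batch)]
    p = 3
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, log_expr)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data for one gene: true condition effect 2, true batch effect 1,
# with conditions unevenly spread over the two batches.
condition = [0, 0, 0, 1, 0, 1, 1, 1]
batch = [0, 0, 0, 0, 1, 1, 1, 1]
log_expr = [5 + 2 * c + 1 * b for c, b in zip(condition, batch)]

treated = [y for y, c in zip(log_expr, condition) if c == 1]
control = [y for y, c in zip(log_expr, condition) if c == 0]
naive_effect = sum(treated) / len(treated) - sum(control) / len(control)

intercept, condition_effect, batch_effect = fit_additive_model(log_expr, condition, batch)
print("naive difference of means:", naive_effect)
print("condition effect with batch term:", condition_effect)
```

Here the naive difference of means is inflated by the batch imbalance, while the model with a batch term recovers the condition effect. If the condition is completely confounded with batch, the two effects cannot be separated and the fit fails, which is worth keeping in mind when a single test sample comes from a different center than all the controls.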

ADD REPLY • link 10.5 years ago by Hayssam ▴ 280