Question: Differential Gene Expression Analysis using data_RNA_Seq_v2_expression_median RSEM.Normalized
gravatar for gokce.ouz
3.5 years ago by
gokce.ouz70 wrote:

Hi All,

We have a specific gene mutation and we would like to learn how it is effective on Breast cancer.

So using the R, I get the mutation information from sequenced cases of TCGA Provisional and then stratified patients into two categories as Mutated & Wild Type. I downloaded the mRNA Expression z-Scores (RNA Seq V2 RSEM) from the cBioPortal website. I would like to look at the differentially expressed gene between these two groups but I have several questions :

  1. The RNA seq data is Rsem.normalized, before I do any further analysis I transformed them into log2(rsem+1), that is correct right ?

  2. For differential gene expression analysis what do you suggest me to use ? I cannot use DeSEQ2 or edgeR as they require raw counts as input.

  3. I used limma package but I guess I get shows my data has some problem . Does it look ok or should I do something else ?  Voom & Final model

group = c( rep("Mut", 191), rep("WT", 660))
design <- model.matrix(~ 0 + group)
colnames(design) <- c("Mut", "WT")
y = TCGA_comb
v <- voom(y,design,plot = TRUE)
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(PIK3CA_mutVSwt=Mut - WT,levels=design)
fit.cont <-, cont.matrix)
fit.cont <- eBayes(fit.cont)
plotSA(fit.cont) <- decideTests(fit.cont)
tab <- topTable(fit.cont, n=Inf, coef="PIK3CA_mutVSwt")
  1. Would it be too superficial if I calculate Fold Change, p-value & FDR on my own?

     a) Fold change: Take average of each gene per group and then Log2(B)-Log2(A)
     b) p-value: t.test command of R
     c) FDR: p.adjust(pvalue,method="fdr")

Many many thanks,


dge rna-seq limma R • 3.1k views
ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by gokce.ouz70
gravatar for kristoffer.vittingseerup
3.5 years ago by
European Union
kristoffer.vittingseerup3.5k wrote:

Hi Gokce

Answers below

1) Yes you can you the log-transformation as you describe - but you could also use a smaller pseudo-count (say 0.01) punishing smaller changes less.

2) Raw count data from TCGA's RNA_Seq_V2 are available via the CDC data portal or a most of other APIs such as TCGAbiolinks

3) Don't do that - limma uses normality as an assumption (just like a t.test) - so if you dont trust the limma results you should not trust a t.test. Use voom+limma with raw counts obtained as in 2) instead (limma instead of edgeR since they perform almost identical when you have many replicates (which you have right?) - expect that limma is much much faster.

Hope this helps

ADD COMMENTlink written 3.5 years ago by kristoffer.vittingseerup3.5k

Dear Kristoffer,

Thanks a lot for your answer.

I knew I would be more comfortable with the raw data however I was being forced to work on this normalized data. Maybe it is better to insist on working on the raw data again.

So to clarify, you are suggesting me to use voom+limma on the raw count data instead of edgeR & DESeq2, right ?

ADD REPLYlink written 3.5 years ago by gokce.ouz70

Yes i suggest voom-limma because when you have many replicates (which you do in the TCGA data) they perform similar in benchmark. The difference then becomes that instead of waiting hours on edgeR or DESeq2 you can wait seconds/minuts on limma allowing much greater flexibility.

ADD REPLYlink written 3.5 years ago by kristoffer.vittingseerup3.5k

Thanks a lot for your answer.

ADD REPLYlink written 3.5 years ago by gokce.ouz70
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2211 users visited in the last hour