We are studying a specific gene mutation and would like to learn how it affects breast cancer.
Using R, I retrieved the mutation information for the sequenced cases of TCGA Provisional and stratified the patients into two groups: Mutated and Wild Type. I downloaded the mRNA Expression z-Scores (RNA Seq V2 RSEM) from the cBioPortal website. I would like to find the differentially expressed genes between these two groups, but I have several questions:
The RNA-seq data are RSEM-normalized. Before doing any further analysis I transformed them to log2(rsem + 1). Is that correct?
For differential gene expression analysis, what do you suggest I use? I cannot use DESeq2 or edgeR, as they require raw counts as input.
I used the limma package, but I suspect the output I get shows that my data have some problem. Does it look OK, or should I do something else?
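As a minimal sketch of that transformation (the matrix `rsem`, its gene and sample names, and its values are made up for illustration and are not the actual TCGA data):

```r
# Toy RSEM-normalized matrix: genes in rows, samples in columns (made-up values).
rsem <- matrix(
  c(0,  1,  3,  7,
    15, 31, 63, 127),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("sample", 1:4))
)

# Add a pseudo-count of 1 so that zero expression maps to 0 instead of -Inf.
log_rsem <- log2(rsem + 1)
print(log_rsem)
```

The pseudo-count of 1 is the one mentioned above; other offsets are possible and would change the values for lowly expressed genes.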
library(edgeR)
library(limma)
group = c( rep("Mut", 191), rep("WT", 660))
design <- model.matrix(~ 0 + group)
colnames(design) <- c("Mut", "WT")
y = TCGA_comb
par(mfrow=c(1,2))
v <- voom(y,design,plot = TRUE)
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(PIK3CA_mutVSwt=Mut - WT,levels=design)
fit.cont <- contrasts.fit(fit, cont.matrix)
fit.cont <- eBayes(fit.cont)
plotSA(fit.cont)
summa.fit <- decideTests(fit.cont)
tab <- topTable(fit.cont, n=Inf, coef="PIK3CA_mutVSwt")
Would it be too simplistic to calculate the fold change, p-value and FDR on my own?
a) Fold change: take the average of each gene per group, then compute log2(B) - log2(A)
b) p-value: R's t.test function
c) FDR: p.adjust(pvalue, method="fdr")
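To make steps a) to c) concrete, here is a minimal base-R sketch on simulated data. The matrix `expr`, the group sizes and the gene names are hypothetical. It also assumes the values are already on the log2 scale, so the log fold change is taken as the difference of the group means rather than log2 of the means; that is one possible reading of step a), not necessarily the right one.

```r
set.seed(1)  # reproducible simulated data

# Hypothetical log2(rsem + 1) matrix: 5 genes x 20 samples (not real TCGA data).
n_mut <- 8
n_wt  <- 12
expr <- matrix(rnorm(5 * (n_mut + n_wt), mean = 8, sd = 1), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
group <- factor(c(rep("Mut", n_mut), rep("WT", n_wt)))

# Shift gene1 upward in the mutated group so there is one real difference.
expr["gene1", group == "Mut"] <- expr["gene1", group == "Mut"] + 3

# a) Log2 fold change: difference of group means on the log2 scale (assumption).
log_fc <- rowMeans(expr[, group == "Mut"]) - rowMeans(expr[, group == "WT"])

# b) p-value: one Welch t-test per gene (the default of t.test).
p_value <- apply(expr, 1, function(x) {
  t.test(x[group == "Mut"], x[group == "WT"])$p.value
})

# c) FDR: Benjamini-Hochberg adjustment ("fdr" is an alias for "BH").
fdr <- p.adjust(p_value, method = "fdr")

result <- data.frame(log_fc, p_value, fdr)
result <- result[order(result$p_value), ]
print(result)
```

Unlike limma, this gene-by-gene approach estimates each variance separately and does not borrow information across genes.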
Many many thanks,