Question

Unreliable results of R while retrieving DEG

0

Entering edit mode

7.8 years ago

786 ▴ 50

Hi I was working on GSE55650 and received different number of differentially expressed genes just by changing the group format. Although the control and diseased were same in both types of groups but still get different number of genes. I just don't understand why is it so?Can anyone help me in understanding this?I'll be very thankful.

First situation:
groups = pData(phenoData(gse55650dat[[1]]))$Disease_states
groups=as.character(groups)
groups=sort(groups)
groups[c(1:6)]="control"
groups[c(7:12)]="diabetic"
groups[c(13:17)]="control"
groups[c(18:23)]="diabetic"

It gives 21 number of differentially expressed genes BH-adjusted p < 0.05 as the cut-off.

Second situation:
groups = pData(phenoData(gse55650dat[[1]]))$source_name_ch1 
groups=as.character(groups) 
groups[groups=="Control myoblasts"]="non.diabetic" 
groups[groups=="Diabetes myoblasts"]="diabetic"
groups[groups=="Control myotubes"]="non.diabetic" 
groups[groups=="Diabetes myotubes"]="diabetic"

It gives 14 number of differentially expressed genes at same p value criteria. Why this difference appears?Please help me understanding this it'll be highly appreciated.

R affy limma GEOquery • 1.5k views

ADD COMMENT • link 7.8 years ago by 786 ▴ 50

0

Entering edit mode

can you please post the groups vector before and after the renaming? i.e. for both cases you tried, I would like to see the groups vector after the as.character(groups) thing and then after the whole renaming step :)

ADD REPLY • link 7.8 years ago by Selenocysteine ▴ 620

0

Entering edit mode

groups = pData(phenoData(gse55650dat[[1]]))$source_name_ch1 
groups=as.character(groups) 
groups[groups=="Control myoblasts"]="non.diabetic" 
groups[groups=="Diabetes myoblasts"]="diabetic"
groups[groups=="Control myotubes"]="non.diabetic" 
groups[groups=="Diabetes myotubes"]="diabetic"
f = factor(groups, levels=c("non.diabetic","diabetic"))
design_gse55650 = model.matrix(~0+f)
colnames(design_gse55650) = levels(f)
cont.matrix = makeContrasts(diabetic-non.diabetic, levels=design_gse55650)

fit = lmFit(gse55650eset, design_gse55650)
fit2= contrasts.fit(fit, cont.matrix)
fit2 = eBayes(fit2)

fit2$genes$Symbol = getSYMBOL(fit2$genes$ID, "hgu133plus2") 
fit2$genes$GeneName = unlist(mget(fit$genes$ID, hgu133plus2GENENAME))  

results = topTable(fit2, adjust ="BH", number = nrow(gse55650eset))