Hello everyone,
I'm fairly new to analyzing micro-arrays, but that being said, GSE103974 in particular is giving me some problems. I've tried various analysis pipelines of my own in R and have constantly been running into problems I don't run into with other GSE sets, in one form or another. I'm trying to do a basic gene-level differential expression analysis on the data using the autogenerated R script from GEO2R. However, I am getting an error that reads:
ftp://ftp-ext.ncbi.nlm.nih.gov/geo/series/GSE103nnn/GSE103974/matrix/
Error in function (type, msg, asError = TRUE) :
Server denied you to change to the given directory
Calls: getGEO ... getURL -> curlPerform -> .Call -> -> fun
Execution halted
Best I can tell is that there is something up with the matrix file that was uploaded, but I don't know how to troubleshoot farther than that. Any insight would be appreciated.
The script it is running is:
# Version info: R 3.2.3, Biobase 2.30.0, GEOquery 2.40.0, limma 3.26.8
# R scripts generated Mon Sep 25 15:42:52 EDT 2017
################################################################
# Differential expression analysis with limma
library(Biobase)
library(GEOquery)
library(limma)
# load series and platform data from GEO
gset <- getGEO("GSE103974", GSEMatrix =TRUE, AnnotGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL17586", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
# make proper column names to match toptable
fvarLabels(gset) <- make.names(fvarLabels(gset))
# group names for all samples
gsms <- "0000110111010"
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
# log2 transform
ex <- exprs(gset)
qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
LogC <- (qx[5] > 100) ||
(qx[6]-qx[1] > 50 && qx[2] > 0) ||
(qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
if (LogC) { ex[which(ex <= 0)] <- NaN
exprs(gset) <- log2(ex) }
# set up the data and proceed with analysis
sml <- paste("G", sml, sep="") # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)
tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","SPOT_ID"))
write.table(tT, file=stdout(), row.names=F, sep="\t")
################################################################
# Boxplot for selected GEO samples
library(Biobase)
library(GEOquery)
# load series and platform data from GEO
gset <- getGEO("GSE103974", GSEMatrix =TRUE, getGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL17586", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
# group names for all samples in a series
gsms <- "0000110111010"
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
sml <- paste("G", sml, sep="") set group names
# order samples by group
ex <- exprs(gset)[ , order(sml)]
sml <- sml[order(sml)]
fl <- as.factor(sml)
labels <- c("MG","HD")
# set parameters and draw the plot
palette(c("#dfeaf4","#f4dfdf", "#AABBCC"))
dev.new(width=4+dim(gset)[[2]]/5, height=6)
par(mar=c(2+round(max(nchar(sampleNames(gset)))/2),4,2,1))
title <- paste ("GSE103974", '/', annotation(gset), " selected samples", sep ='')
boxplot(ex, boxwex=0.6, notch=T, main=title, outline=FALSE, las=2, col=fl)
legend("topleft", labels, fill=palette(), bty="n")
Looks like a rights issue at the server side...? - "Server denied you to change to the given directory". You may have to contact the GEO, in that case.
Do you have the option of downloading the raw data CEL files? In that case, you could process the data yourself using the oligo, affy, or another R package designed for microarray analyses.
I do have access to the .CEL files. I've been trying to follow this https://support.bioconductor.org/p/89308/
My code:
the code gets through the normalization, but I'm having trouble with the annotation. I'm getting this error:
Wow, thank you so much Kevin! Truly saved me some frustration, that script worked beautifully. My plan was to work through the limma vignette and rework the GEO2R script to fit the expression sets I got from your code, but if you have a better resource for limma I certainly wouldn't turn it down