Question: Error in function (type, msg, asError = TRUE); unable to analyze HTA2.0 GSE set
0
gravatar for collinmlanger
2.9 years ago by
University of Chicago
collinmlanger0 wrote:

Hello everyone,

I'm fairly new to analyzing micro-arrays, but that being said, GSE103974 in particular is giving me some problems. I've tried various analysis pipelines of my own in R and have constantly been running into problems I don't run into with other GSE sets, in one form or another. I'm trying to do a basic gene-level differential expression analysis on the data using the autogenerated R script from GEO2R. However, I am getting an error that reads:

ftp://ftp-ext.ncbi.nlm.nih.gov/geo/series/GSE103nnn/GSE103974/matrix/
Error in function (type, msg, asError = TRUE)  : 
Server denied you to change to the given directory
Calls: getGEO ... getURL -> curlPerform -> .Call ->  -> fun
Execution halted

Best I can tell is that there is something up with the matrix file that was uploaded, but I don't know how to troubleshoot farther than that. Any insight would be appreciated.

The script it is running is:

# Version info: R 3.2.3, Biobase 2.30.0, GEOquery 2.40.0, limma 3.26.8
# R scripts generated  Mon Sep 25 15:42:52 EDT 2017

################################################################
#   Differential expression analysis with limma
library(Biobase)
library(GEOquery)
library(limma)

# load series and platform data from GEO

gset <- getGEO("GSE103974", GSEMatrix =TRUE, AnnotGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL17586", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# make proper column names to match toptable 
fvarLabels(gset) <- make.names(fvarLabels(gset))

# group names for all samples
gsms <- "0000110111010"
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }

# log2 transform
ex <- exprs(gset)
qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
LogC <- (qx[5] > 100) ||
          (qx[6]-qx[1] > 50 && qx[2] > 0) ||
          (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
if (LogC) { ex[which(ex <= 0)] <- NaN
  exprs(gset) <- log2(ex) }

# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)

tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","SPOT_ID"))
write.table(tT, file=stdout(), row.names=F, sep="\t")


################################################################
#   Boxplot for selected GEO samples
library(Biobase)
library(GEOquery)

# load series and platform data from GEO

gset <- getGEO("GSE103974", GSEMatrix =TRUE, getGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL17586", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# group names for all samples in a series
gsms <- "0000110111010"
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
sml <- paste("G", sml, sep="")  set group names

# order samples by group
ex <- exprs(gset)[ , order(sml)]
sml <- sml[order(sml)]
fl <- as.factor(sml)
labels <- c("MG","HD")

# set parameters and draw the plot
palette(c("#dfeaf4","#f4dfdf", "#AABBCC"))
dev.new(width=4+dim(gset)[[2]]/5, height=6)
par(mar=c(2+round(max(nchar(sampleNames(gset)))/2),4,2,1))
title <- paste ("GSE103974", '/', annotation(gset), " selected samples", sep ='')
boxplot(ex, boxwex=0.6, notch=T, main=title, outline=FALSE, las=2, col=fl)
legend("topleft", labels, fill=palette(), bty="n")
gene microarray R software error • 2.8k views
ADD COMMENTlink modified 2.5 years ago by Biostar ♦♦ 20 • written 2.9 years ago by collinmlanger0

Looks like a rights issue at the server side...? - "Server denied you to change to the given directory". You may have to contact the GEO, in that case.

Do you have the option of downloading the raw data CEL files? In that case, you could process the data yourself using the oligo, affy, or another R package designed for microarray analyses.

ADD REPLYlink written 2.9 years ago by Kevin Blighe63k

I do have access to the .CEL files. I've been trying to follow this https://support.bioconductor.org/p/89308/

My code:

library(oligo)

path = 'C:/Users/. . . '
setwd(path)
dat <- read.celfiles(list.celfiles(path, listGzipped=TRUE))

eset <- rma(dat)

##get rid of background probes and annotate using functions in my affycoretools package

library(affycoretools)

library(hta20transcriptcluster.db)

eset.main <- getMainProbes(eset, pd.hta.2.0)

eset.main <- annotateEset(eset.main, hta20stranscriptcluster.db)

the code gets through the normalization, but I'm having trouble with the annotation. I'm getting this error:

> eset.main <- getMainProbes(eset, pd.hta.2.0)
Error in switch(level, core = dbGetQuery(con, paste("select distinct core_mps.transcript_cluster_id, type from featureSet inner join core_mps",  : 
  EXPR must be a length 1 vector
ADD REPLYlink written 2.9 years ago by collinmlanger0
1

ADD REPLYlink modified 13 months ago • written 2.9 years ago by Kevin Blighe63k

Wow, thank you so much Kevin! Truly saved me some frustration, that script worked beautifully. My plan was to work through the limma vignette and rework the GEO2R script to fit the expression sets I got from your code, but if you have a better resource for limma I certainly wouldn't turn it down

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by collinmlanger0

ADD REPLYlink modified 13 months ago • written 2.9 years ago by Kevin Blighe63k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1375 users visited in the last hour