Question

Error in function (type, msg, asError = TRUE); unable to analyze HTA2.0 GSE set

0

Entering edit mode

6.6 years ago

collinmlanger • 0

Hello everyone,

I'm fairly new to analyzing micro-arrays, but that being said, GSE103974 in particular is giving me some problems. I've tried various analysis pipelines of my own in R and have constantly been running into problems I don't run into with other GSE sets, in one form or another. I'm trying to do a basic gene-level differential expression analysis on the data using the autogenerated R script from GEO2R. However, I am getting an error that reads:

ftp://ftp-ext.ncbi.nlm.nih.gov/geo/series/GSE103nnn/GSE103974/matrix/
Error in function (type, msg, asError = TRUE)  : 
Server denied you to change to the given directory
Calls: getGEO ... getURL -> curlPerform -> .Call ->  -> fun
Execution halted

Best I can tell is that there is something up with the matrix file that was uploaded, but I don't know how to troubleshoot farther than that. Any insight would be appreciated.

The script it is running is:

# Version info: R 3.2.3, Biobase 2.30.0, GEOquery 2.40.0, limma 3.26.8
# R scripts generated  Mon Sep 25 15:42:52 EDT 2017

################################################################
#   Differential expression analysis with limma
library(Biobase)
library(GEOquery)
library(limma)

# load series and platform data from GEO

gset <- getGEO("GSE103974", GSEMatrix =TRUE, AnnotGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL17586", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# make proper column names to match toptable 
fvarLabels(gset) <- make.names(fvarLabels(gset))

# group names for all samples
gsms <- "0000110111010"
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }

# log2 transform
ex <- exprs(gset)
qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
LogC <- (qx[5] > 100) ||
          (qx[6]-qx[1] > 50 && qx[2] > 0) ||
          (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
if (LogC) { ex[which(ex <= 0)] <- NaN
  exprs(gset) <- log2(ex) }

# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)

tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","SPOT_ID"))
write.table(tT, file=stdout(), row.names=F, sep="\t")


################################################################
#   Boxplot for selected GEO samples
library(Biobase)
library(GEOquery)

# load series and platform data from GEO

gset <- getGEO("GSE103974", GSEMatrix =TRUE, getGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL17586", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# group names for all samples in a series
gsms <- "0000110111010"
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
sml <- paste("G", sml, sep="")  set group names

# order samples by group
ex <- exprs(gset)[ , order(sml)]
sml <- sml[order(sml)]
fl <- as.factor(sml)
labels <- c("MG","HD")

# set parameters and draw the plot
palette(c("#dfeaf4","#f4dfdf", "#AABBCC"))
dev.new(width=4+dim(gset)[[2]]/5, height=6)
par(mar=c(2+round(max(nchar(sampleNames(gset)))/2),4,2,1))
title <- paste ("GSE103974", '/', annotation(gset), " selected samples", sep ='')
boxplot(ex, boxwex=0.6, notch=T, main=title, outline=FALSE, las=2, col=fl)
legend("topleft", labels, fill=palette(), bty="n")

software error R Microarray gene • 5.5k views

ADD COMMENT • link updated 6.2 years ago by Biostar 20 • written 6.6 years ago by collinmlanger • 0

0

Entering edit mode

Looks like a rights issue at the server side...? - "Server denied you to change to the given directory". You may have to contact the GEO, in that case.

Do you have the option of downloading the raw data CEL files? In that case, you could process the data yourself using the oligo, affy, or another R package designed for microarray analyses.

ADD REPLY • link 6.6 years ago by Kevin Blighe 87k

0

Entering edit mode

I do have access to the .CEL files. I've been trying to follow this https://support.bioconductor.org/p/89308/

My code:

library(oligo)

path = 'C:/Users/. . . '
setwd(path)
dat <- read.celfiles(list.celfiles(path, listGzipped=TRUE))

eset <- rma(dat)

##get rid of background probes and annotate using functions in my affycoretools package

library(affycoretools)

library(hta20transcriptcluster.db)

eset.main <- getMainProbes(eset, pd.hta.2.0)

eset.main <- annotateEset(eset.main, hta20stranscriptcluster.db)

the code gets through the normalization, but I'm having trouble with the annotation. I'm getting this error:

> eset.main <- getMainProbes(eset, pd.hta.2.0)
Error in switch(level, core = dbGetQuery(con, paste("select distinct core_mps.transcript_cluster_id, type from featureSet inner join core_mps",  : 
  EXPR must be a length 1 vector

ADD REPLY • link 6.6 years ago by collinmlanger • 0

1

Entering edit mode

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k

0

Entering edit mode

Wow, thank you so much Kevin! Truly saved me some frustration, that script worked beautifully. My plan was to work through the limma vignette and rework the GEO2R script to fit the expression sets I got from your code, but if you have a better resource for limma I certainly wouldn't turn it down

ADD REPLY • link 6.6 years ago by collinmlanger • 0

0

Entering edit mode

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k