Question: Getting error when using lmfit (LIMMA) for Microarray data analysis
0
gravatar for DataStoryTeller
21 days ago by
DataStoryTeller10 wrote:

This is a code i am running:

fit <- lmFit(eset,design)

This is the error I am getting

Error in rowMeans(y$exprs, na.rm = TRUE) : 'x' must be an array of at least two dimensions

This is some relevant information:

> dim(eset)

Features Samples

54675 161

> dim(design)

[1] 161 12

> typeof(eset)

[1] "S4"

> typeof(design)

[1] "double"

Thanks!

limma microarray R • 188 views
ADD COMMENTlink modified 14 days ago • written 21 days ago by DataStoryTeller10

What is the output of:

str(eset)
eset
dim(exprs(eset))
ADD REPLYlink written 20 days ago by Kevin Blighe65k
    > str(eset)
Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
  ..@ experimentData   :Formal class 'MIAME' [package "Biobase"] with 13 slots
  .. .. ..@ name             : chr ""
  .. .. ..@ lab              : chr ""
  .. .. ..@ contact          : chr ""
  .. .. ..@ title            : chr ""
  .. .. ..@ abstract         : chr ""
  .. .. ..@ url              : chr ""
  .. .. ..@ pubMedIds        : chr ""
  .. .. ..@ samples          : list()
  .. .. ..@ hybridizations   : list()
  .. .. ..@ normControls     : list()
  .. .. ..@ preprocessing    : list()
  .. .. ..@ other            : list()
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 2
  .. .. .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ assayData        :<environment: 0x7fadf5527510> 
  ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 1 obs. of  2 variables:
  .. .. .. ..$ labelDescription: chr "Index"
  .. .. .. ..$ channel         : Factor w/ 2 levels "exprs","_ALL_": 2
  .. .. ..@ data             :'data.frame': 161 obs. of  1 variable:
  .. .. .. ..$ index: int [1:161] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. ..@ dimLabels        : chr [1:2] "rowNames" "columnNames"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 1 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr NA
  .. .. ..@ data             :'data.frame': 54675 obs. of  1 variable:
  .. .. .. ..$ Symbol: chr [1:54675] NA "RFC2" "HSPA6" "PAX8" ...
  .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ annotation       : chr "pd.hg.u133.plus.2"
  ..@ protocolData     :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 2 obs. of  2 variables:
  .. .. .. ..$ labelDescription: chr [1:2] "Names of files used in 'exprs'" "Run dates for files used in 'exprs'"
  .. .. .. ..$ channel         : Factor w/ 2 levels "exprs","_ALL_": 2 2
  .. .. ..@ data             :'data.frame': 161 obs. of  2 variables:
  .. .. .. ..$ exprs: chr [1:161] "/Users/ashaypatel/Desktop/Research/GSE5281/GSM119615.CEL" "/Users/ashaypatel/Desktop/Research/GSE5281/GSM119616.CEL" "/Users/ashaypatel/Desktop/Research/GSE5281/GSM119617.CEL" "/Users/ashaypatel/Desktop/Research/GSE5281/GSM119618.CEL" ...
  .. .. .. ..$ dates: chr [1:161] "01/25/06 15:55:07" "07/28/04 11:26:10" "07/28/04 12:53:35" "07/28/04 12:16:03" ...
  .. .. ..@ dimLabels        : chr [1:2] "rowNames" "columnNames"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. ..@ .Data:List of 4
  .. .. .. ..$ : int [1:3] 4 0 0
  .. .. .. ..$ : int [1:3] 2 48 0
  .. .. .. ..$ : int [1:3] 1 3 0
  .. .. .. ..$ : int [1:3] 1 0 0

    > eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 161 samples 
  element names: exprs 
protocolData
  rowNames: GSM119615.CEL GSM119616.CEL ... GSM238963.CEL (161 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: GSM119615.CEL GSM119616.CEL ... GSM238963.CEL (161 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData
  featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total)
  fvarLabels: Symbol
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: pd.hg.u133.plus.2 

    > dim(exprs(eset))
[1] 54675   161
ADD REPLYlink modified 20 days ago by Kevin Blighe65k • written 20 days ago by DataStoryTeller10

Thank you. So, it is an ExpressionSet. How was it produced (show all code, please)? How was design produced?

ADD REPLYlink written 20 days ago by Kevin Blighe65k
CELFiles <- list.files(celpath,full.names=TRUE, pattern = "CEL")

RawData <- read.celfiles(CELFiles)

sns <- sampleNames(RawData)

PreProcessedData<- oligo::rma(RawData)

exprs(PreProcessedData)[1:10,1:3]

# Move the normalized expression data
PreProcessedData_exprs <- exprs(PreProcessedData)

# Make the normalized expression data a data frame
PreProcessedData_dataframe <- data.frame(as.ffdf(PreProcessedData_exprs))
write.table(PreProcessedData_dataframe, file="RMA_Preprocessed_Expression_Table.txt", sep="\t")




condition <- factor(samples$Condition, levels=c("Control","AD"))

brainregion <- factor(samples$BrainRegion, levels=c("EC","HC","MTG","PC","SFG","VCX"))

CS <- paste(samples$Condition, samples$BrainRegion, sep=".")
CS
CS <- factor(CS, levels=c("Control.EC","AD.EC",
                          "Control.HC","AD.HC",
                          "Control.MTG","AD.MTG",
                          "Control.PC","AD.PC",
                          "Control.SFG","AD.SFG",
                          "Control.VCX","AD.VCX"))

design <- model.matrix(~CS+0)

colnames(design) <- levels(CS)

contr.matrix <- makeContrasts(
  EC_ADvControl=AD.EC-Control.EC,
  HC_ADvControl=AD.HC-Control.HC,
  MTG_ADvControl=AD.MTG-Control.MTG,
  PC_ADvControl=AD.PC-Control.PC,
  SFG_ADvControl=AD.SFG-Control.SFG,
  VCX_ADvControl=AD.VCX-Control.VCX,
  levels=design)


contr.matrix
# Load the package for probe to transcript or gene annotation
library(annotate)


# Rename your rma normalized processed expression data as eset for differential expression analysis
eset <- PreProcessedData
# Extract the probe IDs from eset, which has all the normalized expression data
ID <- featureNames(eset)
# Apply the probe to gene symbol ID for each probe in eset
SYMBOL <- getSYMBOL(ID,"hgu133plus2.db")
NO_NA_PreProcessedData_dataframe<-PreProcessedData_dataframe[!is.na(SYMBOL),]
# Add the gene symbols back into your eset data fram
fData(eset) <- data.frame(Symbol=SYMBOL)
# fit the linear model
fit <- lmFit(eset,design)
# apply your contrasts
ADD REPLYlink modified 20 days ago by Kevin Blighe65k • written 20 days ago by DataStoryTeller10

Thanks, I think that it relates to the annotation part. Can you confirm that this works:?

eset <- PreProcessedData
fit <- lmFit(eset, design)

?

This particular command is likely messing up your ExpressionSet object:

fData(eset) <- data.frame(Symbol=SYMBOL)
ADD REPLYlink written 20 days ago by Kevin Blighe65k

This works:

eset <- PreProcessedData

This does not work:

fit <- lmFit(eset,design)

I tried getting rid of:

fData(eset) <- data.frame(Symbol=SYMBOL)

does not seem to change anything

ADD REPLYlink written 20 days ago by DataStoryTeller10

Are you sure? If you literally do this, it produces the same error?

eset <- oligo::rma(RawData)
limma::lmFit(eset, design)

If so, then the error is with your design object. Please double check it.

Also output sessionInfo()

ADD REPLYlink modified 20 days ago • written 20 days ago by Kevin Blighe65k

I get the same exact error even with the code you just posted.

    > sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] annotate_1.66.0          XML_3.99-0.5             pd.hg.u133.plus.2_3.12.0
 [4] DBI_1.1.0                RSQLite_2.2.0            hgu133plus2.db_3.2.3    
 [7] org.Hs.eg.db_3.11.4      AnnotationDbi_1.50.3     limma_3.44.3            
[10] oligo_1.52.1             Biostrings_2.56.0        XVector_0.28.0          
[13] IRanges_2.22.2           S4Vectors_0.26.1         ff_4.0.2                
[16] bit_4.0.4                oligoClasses_1.50.4      Biobase_2.48.0          
[19] BiocGenerics_0.34.0     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                  compiler_4.0.2              BiocManager_1.30.10        
 [4] GenomeInfoDb_1.24.2         bitops_1.0-6                iterators_1.0.12           
 [7] tools_4.0.2                 zlibbioc_1.34.0             digest_0.6.25              
[10] memoise_1.1.0               preprocessCore_1.50.0       lattice_0.20-41            
[13] pkgconfig_2.0.3             rlang_0.4.7                 Matrix_1.2-18              
[16] foreach_1.5.0               DelayedArray_0.14.1         rstudioapi_0.11            
[19] GenomeInfoDbData_1.2.3      affxparser_1.60.0           vctrs_0.3.2                
[22] bit64_4.0.2                 grid_4.0.2                  blob_1.2.1                 
[25] codetools_0.2-16            matrixStats_0.56.0          GenomicRanges_1.40.0       
[28] splines_4.0.2               SummarizedExperiment_1.18.2 xtable_1.8-4               
[31] RCurl_1.98-1.2              crayon_1.3.4                affyio_1.58.0

What do you think is wrong my design object, the dimensions seem to be fine and my contrast matrix correctly identifies comparisons?

ADD REPLYlink modified 20 days ago by Kevin Blighe65k • written 20 days ago by DataStoryTeller10

Okay, how about:

eset <- oligo::rma(RawData)
exprs(eset)[1:5,1:5]
head(apply(exprs(eset), 1, mean))
ADD REPLYlink written 20 days ago by Kevin Blighe65k
    > eset <- oligo::rma(RawData)
Background correcting... OK
Normalizing... OK
Summarizing... OK

    > exprs(eset)[1:5,1:5]
opening ff /Users/ashaypatel/rma-29241e485da3.ff
          GSM119615.CEL GSM119616.CEL GSM119617.CEL GSM119618.CEL GSM119619.CEL
1007_s_at      8.243370      8.311776      8.841415      8.878884      8.481599
1053_at        3.246700      3.127588      3.070634      3.008858      3.196024
117_at         3.471962      3.412241      3.495739      3.988356      3.222250
121_at         5.869241      6.357483      6.971408      6.888627      6.015305
1255_g_at      3.088038      3.837203      3.167099      2.796989      5.112739

    > head(apply(exprs(eset), 1, mean))
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'head': invalid first argument, must be an array
ADD REPLYlink modified 20 days ago by Kevin Blighe65k • written 20 days ago by DataStoryTeller10

The head command should really have worked. I am getting the feeling that this is an issue with Mac OS.

Does this not even work:

fit <- lmFit(t(exprs(eset)),design)
ADD REPLYlink written 20 days ago by Kevin Blighe65k

Sorry, because my account is new, it woudn't let me post more. No this does not work. I did recently update my mac os to catalina 10.15.6

ADD REPLYlink written 19 days ago by DataStoryTeller10

At this point, you will have to provide some test data so that I can test it here. This can simply mean pasting some data here and then showing how the error is reproduced upon using this test data

ADD REPLYlink written 19 days ago by Kevin Blighe65k

So I fixed that error. It seems to be one of data type which is strange. If I use PreProcessedDataframe instead of PreProcessedData is works. However now I need help with annotation. It won't let me use fdata to annotate

here is my code: ```

 #Get Raw Data from cell files
setwd("/Users/ashaypatel/Desktop/Research/GSE5281") #Set Working directory
celpath<-"~/Desktop/Research/GSE5281" #Set path to cel files
phenodata<-read.table("~/Desktop/Research/GSE5281/phenodata.txt", header=TRUE) #returns phenodata as dataframe
CELFiles <- list.files(celpath,full.names=TRUE, pattern = "CEL")
RawData <- read.celfiles(CELFiles)

#Pre-process data with RMA (Robust Multichip Average Algorithm) from the Oligo Package
PreProcessedData<- oligo::rma(RawData)
annotation(PreProcessedData)<-"hgu133plus2.db"
PreProcessedData_exprs <- exprs(PreProcessedData) #Extract the expression data, ff datatype
PreProcessedData_dataframe <- data.frame(as.ffdf(PreProcessedData_exprs)) #convert from ff -> dataframe

#Contrast Matrix

condition <- factor(phenodata$Condition, levels=c("Control","AD"))
brainregion <- factor(phenodata$BrainRegion, levels=c("EC","HC","MTG","PC","SFG","VCX"))
CS <- paste(phenodata$Condition, phenodata$BrainRegion, sep=".")

CS <- factor(CS, levels=c("Control.EC","AD.EC",
                          "Control.HC","AD.HC",
                          "Control.MTG","AD.MTG",
                          "Control.PC","AD.PC",
                          "Control.SFG","AD.SFG",
                          "Control.VCX","AD.VCX"))
design <- model.matrix(~0+CS)
design
colnames(design) <- levels(CS)
contr.matrix <- makeContrasts(
  EC_ADvControl=AD.EC-Control.EC,
  HC_ADvControl=AD.HC-Control.HC,
  MTG_ADvControl=AD.MTG-Control.MTG,
  PC_ADvControl=AD.PC-Control.PC,
  SFG_ADvControl=AD.SFG-Control.SFG,
  VCX_ADvControl=AD.VCX-Control.VCX,
  levels=design)
contr.matrix


#
eset <- PreProcessedData_dataframe
fit <- limma::lmFit(eset, design) #apply lmfit
cfit <- contrasts.fit(fit, contrasts=contr.matrix) #apply contrasts
efit <- limma::eBayes(cfit) #apply ebayes
summary(decideTests(efit))

tfit <- limma::treat(efit,lfc=log2(1.15))
dt <- decideTests(tfit)
summary(dt)

tfit<-data.frame(tfit)
write.csv(tfit,"/Users/ashaypatel/Desktop/Research/GSE5281/DEG_tfit.csv",row.names=TRUE)

```

Sorry I don't know how to insert block code!

Also what I mean is that I have the data in terms of affymetrix probes and not genes, how do I get genes. I know my .db package but the method I used to use using fdata does not work, it relies on PreProcessedData (an expression set) rather than PreProcessedData_dataframe.

Also the documentation says that lmfit should be able to take an expression set so its strange.

Thank you.

ADD REPLYlink modified 14 days ago • written 14 days ago by DataStoryTeller10
1

This is the first time that you indicate that this is public GEO data. In that case, please see the solution by my colleague;

ADD REPLYlink written 14 days ago by Kevin Blighe65k
1
gravatar for DataStoryTeller
14 days ago by
DataStoryTeller10 wrote:

This is solved thank you for your help. I'm sorry I took up so much of both your time. It was as simple as reinstalling the limma package.

Thank you so much for your help, its been essential!

ADD COMMENTlink written 14 days ago by DataStoryTeller10
1

Don't worry, I was already suspecting something like this. Happens to all of us sometimes :)

ADD REPLYlink written 13 days ago by ATpoint38k
1
gravatar for ATpoint
14 days ago by
ATpoint38k
Germany
ATpoint38k wrote:

Not sure what is happening on your end. I downloaded the data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5281 (at the bottom, the entire GSE5281_RAW.tar), unpacked it and then ran this code (conditions are just random because I have no clue what this experiment is about). I am not sure what some of your code lines were doing so I suggest you simply stick with this below, which is also almost 100% what the limma guide suggests:

library(oligo)
library(limma)

CELFiles <- list.files("~/Downloads/GSE5281_RAW/",full.names=TRUE, pattern = "CEL.gz")

RawData <- oligo::read.celfiles(CELFiles)

PreProcessedData<- oligo::rma(RawData)

# just some dummy condition:
condition <- data.frame(Group=c(rep("control", 100), rep("treatment", 61)))

# model.matrix:
design <- model.matrix(~Group, data = condition)

# fit the linear model
fit <- lmFit(PreProcessedData,design)

# empirical Bayes
eB <- eBayes(fit)

# results
topTable(eB, coef = 1)
ADD COMMENTlink written 14 days ago by ATpoint38k

Do you use Mac OS, because even when i run this lmfit is not working. Could this be a Mac issue. Im going to try to download the data again.

EDIT: I still get the same error

setwd("/Users/ashaypatel/Desktop/Research/GSE5281") #Set Working directory
celpath<-"~/Desktop/Research/GSE5281" #Set path to cel files
CELFiles <- list.files(celpath,full.names=TRUE, pattern = "CEL")
RawData <- oligo::read.celfiles(CELFiles)
PreProcessedData<- oligo::rma(RawData)


condition <- data.frame(Group=c(rep("control", 100), rep("treatment", 61)))
design <- model.matrix(~Group, data = condition)
fit <- lmFit(PreProcessedData,design)

I get the error:

> fit <- lmFit(PreProcessedData,design)

Error in rowMeans(y$exprs, na.rm = TRUE) : 'x' must be an array of at least two dimensions

ADD REPLYlink modified 14 days ago by ATpoint38k • written 14 days ago by DataStoryTeller10

Same error!?- that is bizarre. You have tried this in a fresh R session? Also avoid RStudio if you can - I never use it for analyses.

I presume that you can obtain the data like this:

library(Biobase)
library(GEOquery)

# load series and platform data from GEO
gset <- getGEO("GSE5281", GSEMatrix =TRUE, getGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
ADD REPLYlink modified 14 days ago • written 14 days ago by Kevin Blighe65k

What do you use instead of RStudio?, I did try a fresh r ression. I'll try restarting my computer

ADD REPLYlink written 14 days ago by DataStoryTeller10

RStudio is totally fine, so is Mac. There might be a few situations where RStudio might be undesirable, e.g. in the context of running forked processes, but this is not relevant here.

Yes, the example I showed was done on a Mac (10.14, R 4.0.2), there is something odd going on on your end. I suggest you reinstall the relevant packages, redownload the data and try again.

ADD REPLYlink modified 14 days ago • written 14 days ago by ATpoint38k

I use the standard R environment, on Ubuntu 16.04.

ADD REPLYlink modified 14 days ago • written 14 days ago by Kevin Blighe65k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1158 users visited in the last hour