Getting error when using lmfit (LIMMA) for Microarray data analysis
2
0
Entering edit mode
4.3 years ago
biotrekker ▴ 110

This is a code i am running:

fit <- lmFit(eset,design)

This is the error I am getting

Error in rowMeans(y$exprs, na.rm = TRUE) : 'x' must be an array of at least two dimensions

This is some relevant information:

> dim(eset)

Features Samples

54675 161

> dim(design)

[1] 161 12

> typeof(eset)

[1] "S4"

> typeof(design)

[1] "double"

Thanks!

R Microarray LIMMA • 3.4k views
ADD COMMENT
0
Entering edit mode

What is the output of:

str(eset)
eset
dim(exprs(eset))
ADD REPLY
0
Entering edit mode
    > str(eset)
Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
  ..@ experimentData   :Formal class 'MIAME' [package "Biobase"] with 13 slots
  .. .. ..@ name             : chr ""
  .. .. ..@ lab              : chr ""
  .. .. ..@ contact          : chr ""
  .. .. ..@ title            : chr ""
  .. .. ..@ abstract         : chr ""
  .. .. ..@ url              : chr ""
  .. .. ..@ pubMedIds        : chr ""
  .. .. ..@ samples          : list()
  .. .. ..@ hybridizations   : list()
  .. .. ..@ normControls     : list()
  .. .. ..@ preprocessing    : list()
  .. .. ..@ other            : list()
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 2
  .. .. .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ assayData        :<environment: 0x7fadf5527510> 
  ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 1 obs. of  2 variables:
  .. .. .. ..$ labelDescription: chr "Index"
  .. .. .. ..$ channel         : Factor w/ 2 levels "exprs","_ALL_": 2
  .. .. ..@ data             :'data.frame': 161 obs. of  1 variable:
  .. .. .. ..$ index: int [1:161] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. ..@ dimLabels        : chr [1:2] "rowNames" "columnNames"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 1 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr NA
  .. .. ..@ data             :'data.frame': 54675 obs. of  1 variable:
  .. .. .. ..$ Symbol: chr [1:54675] NA "RFC2" "HSPA6" "PAX8" ...
  .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ annotation       : chr "pd.hg.u133.plus.2"
  ..@ protocolData     :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 2 obs. of  2 variables:
  .. .. .. ..$ labelDescription: chr [1:2] "Names of files used in 'exprs'" "Run dates for files used in 'exprs'"
  .. .. .. ..$ channel         : Factor w/ 2 levels "exprs","_ALL_": 2 2
  .. .. ..@ data             :'data.frame': 161 obs. of  2 variables:
  .. .. .. ..$ exprs: chr [1:161] "/Users/ashaypatel/Desktop/Research/GSE5281/GSM119615.CEL" "/Users/ashaypatel/Desktop/Research/GSE5281/GSM119616.CEL" "/Users/ashaypatel/Desktop/Research/GSE5281/GSM119617.CEL" "/Users/ashaypatel/Desktop/Research/GSE5281/GSM119618.CEL" ...
  .. .. .. ..$ dates: chr [1:161] "01/25/06 15:55:07" "07/28/04 11:26:10" "07/28/04 12:53:35" "07/28/04 12:16:03" ...
  .. .. ..@ dimLabels        : chr [1:2] "rowNames" "columnNames"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. ..@ .Data:List of 4
  .. .. .. ..$ : int [1:3] 4 0 0
  .. .. .. ..$ : int [1:3] 2 48 0
  .. .. .. ..$ : int [1:3] 1 3 0
  .. .. .. ..$ : int [1:3] 1 0 0

    > eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 161 samples 
  element names: exprs 
protocolData
  rowNames: GSM119615.CEL GSM119616.CEL ... GSM238963.CEL (161 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: GSM119615.CEL GSM119616.CEL ... GSM238963.CEL (161 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData
  featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total)
  fvarLabels: Symbol
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: pd.hg.u133.plus.2 

    > dim(exprs(eset))
[1] 54675   161
ADD REPLY
0
Entering edit mode

Thank you. So, it is an ExpressionSet. How was it produced (show all code, please)? How was design produced?

ADD REPLY
0
Entering edit mode
CELFiles <- list.files(celpath,full.names=TRUE, pattern = "CEL")

RawData <- read.celfiles(CELFiles)

sns <- sampleNames(RawData)

PreProcessedData<- oligo::rma(RawData)

exprs(PreProcessedData)[1:10,1:3]

# Move the normalized expression data
PreProcessedData_exprs <- exprs(PreProcessedData)

# Make the normalized expression data a data frame
PreProcessedData_dataframe <- data.frame(as.ffdf(PreProcessedData_exprs))
write.table(PreProcessedData_dataframe, file="RMA_Preprocessed_Expression_Table.txt", sep="\t")




condition <- factor(samples$Condition, levels=c("Control","AD"))

brainregion <- factor(samples$BrainRegion, levels=c("EC","HC","MTG","PC","SFG","VCX"))

CS <- paste(samples$Condition, samples$BrainRegion, sep=".")
CS
CS <- factor(CS, levels=c("Control.EC","AD.EC",
                          "Control.HC","AD.HC",
                          "Control.MTG","AD.MTG",
                          "Control.PC","AD.PC",
                          "Control.SFG","AD.SFG",
                          "Control.VCX","AD.VCX"))

design <- model.matrix(~CS+0)

colnames(design) <- levels(CS)

contr.matrix <- makeContrasts(
  EC_ADvControl=AD.EC-Control.EC,
  HC_ADvControl=AD.HC-Control.HC,
  MTG_ADvControl=AD.MTG-Control.MTG,
  PC_ADvControl=AD.PC-Control.PC,
  SFG_ADvControl=AD.SFG-Control.SFG,
  VCX_ADvControl=AD.VCX-Control.VCX,
  levels=design)


contr.matrix
# Load the package for probe to transcript or gene annotation
library(annotate)


# Rename your rma normalized processed expression data as eset for differential expression analysis
eset <- PreProcessedData
# Extract the probe IDs from eset, which has all the normalized expression data
ID <- featureNames(eset)
# Apply the probe to gene symbol ID for each probe in eset
SYMBOL <- getSYMBOL(ID,"hgu133plus2.db")
NO_NA_PreProcessedData_dataframe<-PreProcessedData_dataframe[!is.na(SYMBOL),]
# Add the gene symbols back into your eset data fram
fData(eset) <- data.frame(Symbol=SYMBOL)
# fit the linear model
fit <- lmFit(eset,design)
# apply your contrasts
ADD REPLY
0
Entering edit mode

Thanks, I think that it relates to the annotation part. Can you confirm that this works:?

eset <- PreProcessedData
fit <- lmFit(eset, design)

?

This particular command is likely messing up your ExpressionSet object:

fData(eset) <- data.frame(Symbol=SYMBOL)
ADD REPLY
0
Entering edit mode

This works:

eset <- PreProcessedData

This does not work:

fit <- lmFit(eset,design)

I tried getting rid of:

fData(eset) <- data.frame(Symbol=SYMBOL)

does not seem to change anything

ADD REPLY
0
Entering edit mode

Are you sure? If you literally do this, it produces the same error?

eset <- oligo::rma(RawData)
limma::lmFit(eset, design)

If so, then the error is with your design object. Please double check it.

Also output sessionInfo()

ADD REPLY
0
Entering edit mode

I get the same exact error even with the code you just posted.

    > sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] annotate_1.66.0          XML_3.99-0.5             pd.hg.u133.plus.2_3.12.0
 [4] DBI_1.1.0                RSQLite_2.2.0            hgu133plus2.db_3.2.3    
 [7] org.Hs.eg.db_3.11.4      AnnotationDbi_1.50.3     limma_3.44.3            
[10] oligo_1.52.1             Biostrings_2.56.0        XVector_0.28.0          
[13] IRanges_2.22.2           S4Vectors_0.26.1         ff_4.0.2                
[16] bit_4.0.4                oligoClasses_1.50.4      Biobase_2.48.0          
[19] BiocGenerics_0.34.0     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                  compiler_4.0.2              BiocManager_1.30.10        
 [4] GenomeInfoDb_1.24.2         bitops_1.0-6                iterators_1.0.12           
 [7] tools_4.0.2                 zlibbioc_1.34.0             digest_0.6.25              
[10] memoise_1.1.0               preprocessCore_1.50.0       lattice_0.20-41            
[13] pkgconfig_2.0.3             rlang_0.4.7                 Matrix_1.2-18              
[16] foreach_1.5.0               DelayedArray_0.14.1         rstudioapi_0.11            
[19] GenomeInfoDbData_1.2.3      affxparser_1.60.0           vctrs_0.3.2                
[22] bit64_4.0.2                 grid_4.0.2                  blob_1.2.1                 
[25] codetools_0.2-16            matrixStats_0.56.0          GenomicRanges_1.40.0       
[28] splines_4.0.2               SummarizedExperiment_1.18.2 xtable_1.8-4               
[31] RCurl_1.98-1.2              crayon_1.3.4                affyio_1.58.0

What do you think is wrong my design object, the dimensions seem to be fine and my contrast matrix correctly identifies comparisons?

ADD REPLY
0
Entering edit mode

Okay, how about:

eset <- oligo::rma(RawData)
exprs(eset)[1:5,1:5]
head(apply(exprs(eset), 1, mean))
ADD REPLY
0
Entering edit mode
    > eset <- oligo::rma(RawData)
Background correcting... OK
Normalizing... OK
Summarizing... OK

    > exprs(eset)[1:5,1:5]
opening ff /Users/ashaypatel/rma-29241e485da3.ff
          GSM119615.CEL GSM119616.CEL GSM119617.CEL GSM119618.CEL GSM119619.CEL
1007_s_at      8.243370      8.311776      8.841415      8.878884      8.481599
1053_at        3.246700      3.127588      3.070634      3.008858      3.196024
117_at         3.471962      3.412241      3.495739      3.988356      3.222250
121_at         5.869241      6.357483      6.971408      6.888627      6.015305
1255_g_at      3.088038      3.837203      3.167099      2.796989      5.112739

    > head(apply(exprs(eset), 1, mean))
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'head': invalid first argument, must be an array
ADD REPLY
0
Entering edit mode

The head command should really have worked. I am getting the feeling that this is an issue with Mac OS.

Does this not even work:

fit <- lmFit(t(exprs(eset)),design)
ADD REPLY
0
Entering edit mode

Sorry, because my account is new, it woudn't let me post more. No this does not work. I did recently update my mac os to catalina 10.15.6

ADD REPLY
0
Entering edit mode

At this point, you will have to provide some test data so that I can test it here. This can simply mean pasting some data here and then showing how the error is reproduced upon using this test data

ADD REPLY
0
Entering edit mode

So I fixed that error. It seems to be one of data type which is strange. If I use PreProcessedDataframe instead of PreProcessedData is works. However now I need help with annotation. It won't let me use fdata to annotate

here is my code: ```

 #Get Raw Data from cell files
setwd("/Users/ashaypatel/Desktop/Research/GSE5281") #Set Working directory
celpath<-"~/Desktop/Research/GSE5281" #Set path to cel files
phenodata<-read.table("~/Desktop/Research/GSE5281/phenodata.txt", header=TRUE) #returns phenodata as dataframe
CELFiles <- list.files(celpath,full.names=TRUE, pattern = "CEL")
RawData <- read.celfiles(CELFiles)

#Pre-process data with RMA (Robust Multichip Average Algorithm) from the Oligo Package
PreProcessedData<- oligo::rma(RawData)
annotation(PreProcessedData)<-"hgu133plus2.db"
PreProcessedData_exprs <- exprs(PreProcessedData) #Extract the expression data, ff datatype
PreProcessedData_dataframe <- data.frame(as.ffdf(PreProcessedData_exprs)) #convert from ff -> dataframe

#Contrast Matrix

condition <- factor(phenodata$Condition, levels=c("Control","AD"))
brainregion <- factor(phenodata$BrainRegion, levels=c("EC","HC","MTG","PC","SFG","VCX"))
CS <- paste(phenodata$Condition, phenodata$BrainRegion, sep=".")

CS <- factor(CS, levels=c("Control.EC","AD.EC",
                          "Control.HC","AD.HC",
                          "Control.MTG","AD.MTG",
                          "Control.PC","AD.PC",
                          "Control.SFG","AD.SFG",
                          "Control.VCX","AD.VCX"))
design <- model.matrix(~0+CS)
design
colnames(design) <- levels(CS)
contr.matrix <- makeContrasts(
  EC_ADvControl=AD.EC-Control.EC,
  HC_ADvControl=AD.HC-Control.HC,
  MTG_ADvControl=AD.MTG-Control.MTG,
  PC_ADvControl=AD.PC-Control.PC,
  SFG_ADvControl=AD.SFG-Control.SFG,
  VCX_ADvControl=AD.VCX-Control.VCX,
  levels=design)
contr.matrix


#
eset <- PreProcessedData_dataframe
fit <- limma::lmFit(eset, design) #apply lmfit
cfit <- contrasts.fit(fit, contrasts=contr.matrix) #apply contrasts
efit <- limma::eBayes(cfit) #apply ebayes
summary(decideTests(efit))

tfit <- limma::treat(efit,lfc=log2(1.15))
dt <- decideTests(tfit)
summary(dt)

tfit<-data.frame(tfit)
write.csv(tfit,"/Users/ashaypatel/Desktop/Research/GSE5281/DEG_tfit.csv",row.names=TRUE)

```

Sorry I don't know how to insert block code!

Also what I mean is that I have the data in terms of affymetrix probes and not genes, how do I get genes. I know my .db package but the method I used to use using fdata does not work, it relies on PreProcessedData (an expression set) rather than PreProcessedData_dataframe.

Also the documentation says that lmfit should be able to take an expression set so its strange.

Thank you.

ADD REPLY
1
Entering edit mode

This is the first time that you indicate that this is public GEO data. In that case, please see the solution by my colleague;

ADD REPLY
1
Entering edit mode
4.3 years ago
biotrekker ▴ 110

This is solved thank you for your help. I'm sorry I took up so much of both your time. It was as simple as reinstalling the limma package.

Thank you so much for your help, its been essential!

ADD COMMENT
1
Entering edit mode

Don't worry, I was already suspecting something like this. Happens to all of us sometimes :)

ADD REPLY
1
Entering edit mode
4.3 years ago
ATpoint 86k

Not sure what is happening on your end. I downloaded the data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5281 (at the bottom, the entire GSE5281_RAW.tar), unpacked it and then ran this code (conditions are just random because I have no clue what this experiment is about). I am not sure what some of your code lines were doing so I suggest you simply stick with this below, which is also almost 100% what the limma guide suggests:

library(oligo)
library(limma)

CELFiles <- list.files("~/Downloads/GSE5281_RAW/",full.names=TRUE, pattern = "CEL.gz")

RawData <- oligo::read.celfiles(CELFiles)

PreProcessedData<- oligo::rma(RawData)

# just some dummy condition:
condition <- data.frame(Group=c(rep("control", 100), rep("treatment", 61)))

# model.matrix:
design <- model.matrix(~Group, data = condition)

# fit the linear model
fit <- lmFit(PreProcessedData,design)

# empirical Bayes
eB <- eBayes(fit)

# results
topTable(eB, coef = 1)
ADD COMMENT
0
Entering edit mode

Do you use Mac OS, because even when i run this lmfit is not working. Could this be a Mac issue. Im going to try to download the data again.

EDIT: I still get the same error

setwd("/Users/ashaypatel/Desktop/Research/GSE5281") #Set Working directory
celpath<-"~/Desktop/Research/GSE5281" #Set path to cel files
CELFiles <- list.files(celpath,full.names=TRUE, pattern = "CEL")
RawData <- oligo::read.celfiles(CELFiles)
PreProcessedData<- oligo::rma(RawData)


condition <- data.frame(Group=c(rep("control", 100), rep("treatment", 61)))
design <- model.matrix(~Group, data = condition)
fit <- lmFit(PreProcessedData,design)

I get the error:

> fit <- lmFit(PreProcessedData,design)

Error in rowMeans(y$exprs, na.rm = TRUE) : 'x' must be an array of at least two dimensions

ADD REPLY
0
Entering edit mode

Same error!?- that is bizarre. You have tried this in a fresh R session? Also avoid RStudio if you can - I never use it for analyses.

I presume that you can obtain the data like this:

library(Biobase)
library(GEOquery)

# load series and platform data from GEO
gset <- getGEO("GSE5281", GSEMatrix =TRUE, getGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
ADD REPLY
0
Entering edit mode

What do you use instead of RStudio?, I did try a fresh r ression. I'll try restarting my computer

ADD REPLY
0
Entering edit mode

RStudio is totally fine, so is Mac. There might be a few situations where RStudio might be undesirable, e.g. in the context of running forked processes, but this is not relevant here.

Yes, the example I showed was done on a Mac (10.14, R 4.0.2), there is something odd going on on your end. I suggest you reinstall the relevant packages, redownload the data and try again.

ADD REPLY
0
Entering edit mode

I use the standard R environment, on Ubuntu 16.04.

ADD REPLY

Login before adding your answer.

Traffic: 2131 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6