Error in goodSamplesGenes function
4.3 years ago
sh ▴ 10

Hi all,

I'm using the WGCNA package to build a co-regulation network from my microarray data. I'm following the WGCNA tutorial, but I get an error message when I run goodSamplesGenes. Could anyone help me?

gsg = goodSamplesGenes((datExpr), verbose = 3);

Flagging genes and samples with too many missing values...
  ..step 1

Error in goodGenes(datExpr, weights, goodSamples, goodGenes, minFraction = minFraction,  : 
  datExpr must contain numeric data.
WGCNA R • 5.0k views

The error is clear: "datExpr must contain numeric data".
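To find out which columns trip the check, look at the class of every column. Here is a minimal sketch on a toy data frame; the column names `SampleID`, `GeneA` and `GeneB` are made up for illustration, so substitute your own datExpr. It relies only on base R: a single character column is enough to make as.matrix() return a character matrix, which is not numeric.

```r
# Toy stand-in for datExpr: samples in rows, genes in columns,
# plus a sample-name column that is not numeric (hypothetical names).
datExpr <- data.frame(
  SampleID = c("s1", "s2", "s3"),
  GeneA    = c(7.1, 6.8, 7.4),
  GeneB    = c(5.2, NA, 5.0),
  stringsAsFactors = FALSE
)

# First class of each column; anything other than "numeric" or
# "integer" is suspect.
col_classes <- sapply(datExpr, function(col) class(col)[1])
print(col_classes)

# Names of the columns that are not numeric.
non_numeric <- names(datExpr)[!sapply(datExpr, is.numeric)]
print(non_numeric)

# One text column turns the whole matrix into character.
whole_matrix_numeric <- is.numeric(as.matrix(datExpr))
print(whole_matrix_numeric)
```

If `non_numeric` is empty and the error still appears, the problem is probably somewhere other than the column types.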


Hello,

Did you solve it?

I have the same problem: goodSamplesGenes does not accept the data as numeric, but when I continue to the sample clustering it returns results!

I do not really know what mistake I am making. I checked with the str() function and there is numeric data.

My commands are:

options(stringsAsFactors = FALSE)

# Read in the expression data set, already transposed
# (genes in columns, samples in rows)
ExpData = read.csv("Expresion_Mackay_final_.csv")

dim(ExpData)  # This returns 15257 genes (columns) and 221 samples (rows)

datExp0 = as.data.frame(ExpData)
fix(datExp0)

# Check the data for excessive missing values and identify outlier microarray samples
gsg = goodSamplesGenes(datExp0, verbose = 1)

gsg$allOK

Error in goodGenes(datExpr, weights, goodSamples, goodGenes, minFraction = minFraction,  :
  datExpr must contain numeric data.

Thanks for your help.


This should be a comment on the post, not an "answer", as you're not really answering sh.o.94's question. I'm moving it to a comment now, but please be more careful in the future.

By the way, did you look at zx8754's pointer? It should help you get to the solution.
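For the read.csv case above, one common cause is that the first column of the file holds sample names, so the data frame ends up with a text column. That is only a guess, since we can't see the file. A sketch with a small temporary file standing in for the real CSV (the file contents and the `Sample`, `Gene1`, `Gene2` names are invented for the example):

```r
# Write a tiny example file; the real data would come from your own CSV.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Sample,Gene1,Gene2",
             "s1,7.1,5.2",
             "s2,6.8,5.9",
             "s3,7.4,5.0"), tmp)

# Read as in the post: the Sample column stays in the data as text,
# so the matrix built from it is character, not numeric.
raw <- read.csv(tmp, stringsAsFactors = FALSE)
raw_is_numeric <- is.numeric(as.matrix(raw))
print(raw_is_numeric)

# Move the sample names into the row names instead.
datExpr0 <- read.csv(tmp, row.names = 1, stringsAsFactors = FALSE)
clean_is_numeric <- is.numeric(as.matrix(datExpr0))
print(clean_is_numeric)
```

If the identifiers sit in a different column, drop that column (after saving it as row names) rather than using `row.names = 1`.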

4.2 years ago
AndiN ▴ 30

Hi, I had the same problem recently.

You can solve it by converting your datExpr to a data matrix, which will be processed correctly:

datExpr = data.matrix(datExpr)

gsg = goodSamplesGenes((datExpr), verbose = 3)

It should run then.


That won't always work. The data needs to be numeric, and data.matrix is not strict enough to guarantee that. In all probability, your datExpr is a data.frame with factor or logical columns, which data.matrix converts to numeric, and it does not really have a non-convertible column. zx8754's answer, which points the OP to ensure all data is numeric, is the right way to go.

See sample code that shows why data.matrix won't always work:

x <- c("A","B","C")
x_fac <- factor(x, levels = c("B","A","C"), ordered = TRUE)

################

df_fac <- data.frame(col1=c(1,2,3), col2=x_fac, col3=c(TRUE,FALSE,TRUE), stringsAsFactors = FALSE) #col2 is a factor here
df_fac

  col1 col2  col3
1    1    A  TRUE
2    2    B FALSE
3    3    C  TRUE

data.matrix(df_fac)

     col1 col2 col3
[1,]    1    2    1
[2,]    2    1    0
[3,]    3    3    1

################

df_nonfac <- data.frame(col1 = c(1,2,3), col2 = x, col3 = c(TRUE,FALSE,TRUE), stringsAsFactors = FALSE) #col2 is not a factor here
df_nonfac

  col1 col2  col3
1    1    A  TRUE
2    2    B FALSE
3    3    C  TRUE

data.matrix(df_nonfac)

     col1 col2 col3
[1,]    1   NA    1
[2,]    2   NA    0
[3,]    3   NA    1
Warning message:
In data.matrix(data.frame(col1 = c(1, 2, 3), col2 = x, col3 = c(TRUE,  :
  NAs introduced by coercion

See how it works perfectly when columns are numeric, logical or factor, but not otherwise? The trick is to handle non-numeric columns, not to use data.matrix blindly.
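One way to avoid the silent recoding is to refuse the conversion whenever a column is not already numeric. The helper below, `to_numeric_matrix`, is a hypothetical name and not part of WGCNA or base R; treat it as a sketch. As far as I know, since R 4.0.0 data.matrix converts character columns to integer codes via factors instead of returning NA, so it may not even warn any more, which makes an explicit check more useful.

```r
# Hypothetical helper: convert a data frame to a numeric matrix, but stop
# with an error if any column is non-numeric instead of recoding it.
to_numeric_matrix <- function(df) {
  bad_cols <- names(df)[!vapply(df, is.numeric, logical(1))]
  if (length(bad_cols) > 0) {
    stop("Non-numeric columns: ", paste(bad_cols, collapse = ", "))
  }
  as.matrix(df)
}

df_ok  <- data.frame(g1 = c(1, 2, 3), g2 = c(4.5, NA, 6))
df_bad <- data.frame(g1 = c(1, 2, 3), id = c("A", "B", "C"),
                     stringsAsFactors = FALSE)

# All columns numeric: conversion succeeds (NA values are allowed).
m <- to_numeric_matrix(df_ok)
print(is.numeric(m))

# A text column: the helper reports it instead of converting it.
res <- tryCatch(to_numeric_matrix(df_bad),
                error = function(e) conditionMessage(e))
print(res)
```

The error message names the offending columns, so you can decide per column whether to drop it, move it to the row names, or convert it deliberately.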


Did you ever use the WGCNA package? The problem just arose recently with the newest version of R. I suppose something changed in the way R handles expression matrices. These are virtually always numeric data, with row (gene) identifiers and samples as columns. So there are no factor, logical or similar columns. I totally agree with you that one should be careful when using data.matrix, but the error the OP mentioned has nothing to do with the data not being numeric; it comes from the combination of R and the function (goodGenes).


"I suppose something changed in the way R handles expression matrices"

What exactly changed? Unless we're able to define what broke, a higher level description than "it needs all numeric data" cannot be made. "Use data.matrix" might just be a temporary workaround for all we know. I'm not saying it doesn't fix the problem, I'm saying we don't know what the problem is and how a data.matrix fixes the problem.

"the error the OP mentioned has nothing to do with the data not being numeric"

The error message says "datExpr must contain numeric data", so I think the error message disagrees with your interpretation of itself.


Alright, so my bad, I just checked...

The following happened to me, maybe that will help the OP:

After normalisation, I wrote my data matrix into an Excel sheet with openxlsx (for visualisation with other programs).

Importing the same file into R again (also via openxlsx) and running 'goodGenes' or 'goodSamplesGenes' throws the mentioned error.

The stupid thing is that exporting a data.frame from R into Excel leads to Excel not recognising the numeric parts as numbers.

Now, if you convert the numeric part back to numbers in Excel and save the file, everything runs fine...

It is strange, because in R the imported data.frame looked perfectly fine: numbers appeared to be numbers, not characters or the like.

So sorry, if I caused confusion there. But again, maybe a similar thing happened to the OP.
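A data frame whose values were imported as text prints exactly like a numeric one, which may explain why it "looked perfectly fine". The sketch below uses a made-up data frame rather than a real Excel import, so it only illustrates the symptom and one possible repair with base R's type.convert:

```r
# Toy data frame where the expression values arrived as text
# (a stand-in for a spreadsheet import, not an actual openxlsx call).
imported <- data.frame(
  Gene1 = c("7.1", "6.8", "7.4"),
  Gene2 = c("5.2", "5.9", "5.0"),
  stringsAsFactors = FALSE
)

print(imported)  # prints like numbers
str(imported)    # but str() reports the columns as chr

# Re-detect the type of each column. as.is = TRUE keeps genuine text
# as character instead of turning it into a factor.
fixed <- imported
fixed[] <- lapply(imported, type.convert, as.is = TRUE)

all_numeric <- all(sapply(fixed, is.numeric))
print(all_numeric)
```

Columns that contain anything that is not a number stay character after this step, so checking `all_numeric` afterwards still tells you whether the data is clean.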


"Excel"

There's your problem. Write to and read from CSV/TSV files. Save a copy of the CSV/TSV as an Excel file manually if required. Plain text format is your best friend.
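A plain-text round trip can be sketched like this, using a toy matrix and a temporary file (the sample and gene names are invented). The point is only that the values come back as numbers with write.csv and read.csv:

```r
# Toy normalised matrix: samples in rows, genes in columns.
expr <- matrix(c(7.1, 6.8, 7.4, 5.2, 5.9, 5.0), nrow = 3,
               dimnames = list(c("s1", "s2", "s3"), c("Gene1", "Gene2")))

# Write to CSV; the row names go into the first column of the file.
out <- tempfile(fileext = ".csv")
write.csv(expr, out)

# Read back, restoring the first column as row names.
back <- as.matrix(read.csv(out, row.names = 1, check.names = FALSE))

is_num <- is.numeric(back)
same   <- isTRUE(all.equal(expr, back))
print(c(is_num = is_num, same = same))
```

`check.names = FALSE` keeps column names exactly as written, which matters when gene or probe identifiers contain characters that R would otherwise alter.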
