Question: Error in goodSamplesGenes function
10 weeks ago, sh.o.94 wrote:

Hi Dear

I'm using the WGCNA package to build a co-regulation network from my microarray data. I'm following the WGCNA tutorial, but I get an error message when I try to run goodSamplesGenes. Could anyone help me?

gsg = goodSamplesGenes((datExpr), verbose = 3);

Flagging genes and samples with too many missing values...
  ..step 1

Error in goodGenes(datExpr, weights, goodSamples, goodGenes, minFraction = minFraction,  : 
  datExpr must contain numeric data.
R wgcna • 204 views
modified 5 weeks ago by AndiN0 • written 10 weeks ago by sh.o.940

Error is clear: "datExpr must contain numeric data"
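A quick way to see which columns are the culprits is to check the class of every column. This is only a sketch: the small datExpr below is a made-up stand-in (samples in rows, genes in columns, plus an ID column read in as text), not the OP's data.

```r
# Hypothetical stand-in for datExpr; the values and column names are invented.
datExpr <- data.frame(
  sample = c("s1", "s2", "s3"),
  gene1  = c(1.2, 3.4, 5.6),
  gene2  = c(7.8, 9.0, 1.1),
  stringsAsFactors = FALSE
)

# TRUE for every column that is already numeric
is_num <- vapply(datExpr, is.numeric, logical(1))

# Names of the columns that are not numeric
names(datExpr)[!is_num]

# Overview of the column classes present in the data
table(vapply(datExpr, function(col) class(col)[1], character(1)))
```

If the second command returns anything at all, those columns need to be dropped, moved to row names, or converted before calling goodSamplesGenes.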

written 10 weeks ago by zx87549.1k

Hello,

Did you solve it?

I have the same problem: goodSamplesGenes does not read the numbers, but when I continue to the sample clustering it returns results!

I do not really know what mistake I am making. I checked with the str() function and there is numeric data.

My commands are:

> options(stringsAsFactors = FALSE);

# Read in the expression data set already transposed (Genes in columns,
# samples in rows)

> ExpData = read.csv("Expresion_Mackay_final_.csv")

> dim(ExpData); #This returns 15257 genes (columns) and 221 samples (rows)

> datExp0 = as.data.frame(ExpData);
> fix(datExp0)

# Checking data for excessive missing values and identification of outlier microarray samples

> gsg = goodSamplesGenes(datExp0, verbose =1);

> gsg$allOK

Error in goodGenes(datExpr, weights, goodSamples, goodGenes,
minFraction = minFraction,  :    datExpr must contain numeric data.

Thanks for your help.

modified 7 weeks ago by RamRS26k • written 7 weeks ago by annape930

This should be a comment on the post, not an "answer", as you're not really answering sh.o.94's question. I'm moving it to a comment now, but please be more careful in the future.

By the way, did you look at zx8754's pointer? It should help you get to the solution.
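For the read.csv() commands above, one possible cause (an assumption, since the file itself is not shown) is that the first column of the CSV holds sample names and is read in as a data column. A minimal sketch with a made-up two-sample file:

```r
# Build a small stand-in CSV; the assumption is that the first column holds sample IDs.
tmp <- tempfile(fileext = ".csv")
writeLines(c("sample,gene1,gene2", "s1,1.5,2.5", "s2,3.5,4.5"), tmp)

# Plain read.csv: the ID column ends up among the data columns
ExpData <- read.csv(tmp)
sapply(ExpData, class)

# row.names = 1 turns the first column into row names instead
datExp0 <- read.csv(tmp, row.names = 1)
all(vapply(datExp0, is.numeric, logical(1)))
```

If your file is laid out differently, the same class check will still show which columns are not numeric.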

modified 7 weeks ago • written 7 weeks ago by RamRS26k
5 weeks ago, AndiN wrote:

Hi, had the same problem recently.

You can solve it by converting your datExpr to a data matrix, which will then be processed correctly:

datExpr = data.matrix(datExpr)

gsg = goodSamplesGenes((datExpr), verbose = 3)

should run then

modified 5 weeks ago by genomax80k • written 5 weeks ago by AndiN0

That won't always work. The data needs to be numeric, and data.matrix is not strict enough to guarantee that. In all probability, your datExpr is a data.frame with factor or logical columns, which data.matrix converts to numeric type, and no column that is truly non-convertible. zx8754's comment, which points the OP to ensure all data is numeric, is the right way to go.

See sample code that shows why data.matrix won't work:

x <- c("A","B","C")
x_fac <- factor(x, levels = c("B","A","C"), ordered = TRUE)

################

df_fac <- data.frame(col1 = c(1,2,3), col2 = x_fac, col3 = c(TRUE,FALSE,TRUE), stringsAsFactors = FALSE) #col2 is a factor here
df_fac

  col1 col2  col3
1    1    A  TRUE
2    2    B FALSE
3    3    C  TRUE

data.matrix(df_fac)

     col1 col2 col3
[1,]    1    2    1
[2,]    2    1    0
[3,]    3    3    1

################

df_nonfac <- data.frame(col1 = c(1,2,3), col2 = x, col3 = c(TRUE,FALSE,TRUE), stringsAsFactors = FALSE) #col2 is not a factor here
df_nonfac

  col1 col2  col3
1    1    A  TRUE
2    2    B FALSE
3    3    C  TRUE

data.matrix(df_nonfac)

     col1 col2 col3
[1,]    1   NA    1
[2,]    2   NA    0
[3,]    3   NA    1
Warning message:
In data.matrix(data.frame(col1 = c(1, 2, 3), col2 = x, col3 = c(TRUE,  :
  NAs introduced by coercion

See how it works perfectly when columns are numeric, logical or factor but not otherwise? The trick is to handle non-numeric columns, not to use data.matrix blindly.
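One way to handle non-numeric columns explicitly is a stricter conversion that refuses anything that is not a number in disguise. This is a rough sketch, not part of WGCNA: to_numeric_matrix is a hypothetical helper and the two small data frames are made up. It deliberately avoids data.matrix, whose treatment of character columns has, as far as I know, differed between R versions.

```r
# Hypothetical helper (not part of WGCNA): convert a data.frame to a numeric
# matrix, stopping on columns whose values are not genuinely numeric.
to_numeric_matrix <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.numeric(col)) next
    # Use factor labels, not the underlying integer codes
    if (is.factor(col)) col <- as.character(col)
    converted <- suppressWarnings(as.numeric(col))
    # A newly introduced NA means the value could not be parsed as a number
    if (any(is.na(converted) & !is.na(col))) {
      stop("column '", nm, "' contains non-numeric values")
    }
    df[[nm]] <- converted
  }
  as.matrix(df)
}

ok  <- data.frame(g1 = c("1.5", "2.5"), g2 = c(3, 4), stringsAsFactors = FALSE)
bad <- data.frame(id = c("A", "B"), g2 = c(3, 4), stringsAsFactors = FALSE)

m <- to_numeric_matrix(ok)                         # text holding numbers is converted
is.numeric(m)

res <- try(to_numeric_matrix(bad), silent = TRUE)  # letters are rejected
inherits(res, "try-error")
```

The point of stopping with an error is that a stray ID or annotation column gets noticed and dealt with, rather than being silently turned into integer codes or NAs.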

modified 5 weeks ago • written 5 weeks ago by RamRS26k

Did you ever use the WGCNA package? The problem arose only recently, with the newest version of R. I suppose something changed in the way R handles expression matrices. These are virtually always numeric data, with row (gene) identifiers and samples as columns. So there are no factor, logical or similar columns. I totally agree with you that one should be careful when using data.matrix, but the error the OP mentioned has nothing to do with the data not being numeric; it comes from the combination of R and the function (goodGenes).

modified 5 weeks ago • written 5 weeks ago by AndiN0

"I suppose something changed in the way R handles expression matrices"

What exactly changed? Unless we're able to define what broke, a higher level description than "it needs all numeric data" cannot be made. "Use data.matrix" might just be a temporary workaround for all we know. I'm not saying it doesn't fix the problem, I'm saying we don't know what the problem is and how a data.matrix fixes the problem.

"the error the OP mentioned has nothing to do with the data not being numeric"

The error message says "datExpr must contain numeric data", so I think the error message disagrees with your interpretation of itself.

written 5 weeks ago by RamRS26k

Alright, so my bad, I just checked...

The following happened to me, maybe that will help the OP:

After normalisation, I wrote my data matrix into an excel sheet with openxlsx (for visualisation with other programs).

Importing the same file again into R (also via openxlsx) and running 'goodGenes' or 'goodSamplesGenes' will throw the mentioned error.

Stupid thing is, exporting a data.frame from R into Excel leads to Excel not recognising the numeric bits as numbers.

Now, if you convert in Excel the numeric part to numbers again, save it, everything runs fine...

It is strange, because in R the imported data.frame looked perfectly fine, numbers were numbers and not characters and such.

So sorry, if I caused confusion there. But again, maybe a similar thing happened to the OP.

written 5 weeks ago by AndiN0

Excel

There's your problem. Write to and read from CSV/TSV files. Save a copy of the CSV/TSV as an Excel file manually if required. Plain text format is your best friend.
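A minimal sketch of such a plain-text round trip, using a made-up 2 x 2 matrix in place of the real normalised data:

```r
# Stand-in for a normalised expression matrix (values are invented).
expr <- matrix(c(1.5, 2.5, 3.5, 4.5), nrow = 2,
               dimnames = list(c("s1", "s2"), c("gene1", "gene2")))

tmp <- tempfile(fileext = ".csv")
write.csv(expr, tmp)                    # row names are written as the first column
back <- read.csv(tmp, row.names = 1)    # and read back in as row names

# Every column should still be numeric after the round trip
all(vapply(back, is.numeric, logical(1)))
all.equal(as.matrix(back), expr)
```

Both checks should come back TRUE; if the first one does not, something other than numbers has crept into the file.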

modified 5 weeks ago • written 5 weeks ago by RamRS26k