Question

WGCNA negative values

3.3 years ago

reara ▴ 30

I am getting a "list" error when I try to input my values into WGCNA. My expression matrix holds FPKM values, some of which are negative (with a minus sign in front). I am unsure how to resolve this. Any help is appreciated.

R rna-seq genome sequencing next-gen • 1.6k views

ADD COMMENT • link updated 3.3 years ago by Kevin Blighe 87k • written 3.3 years ago by reara ▴ 30

Kevin Blighe · Answer 1 · 2021-01-19

3.3 years ago

Kevin Blighe 87k

A negative FPKM value makes no sense, so, there is something wrong with your data processing steps prior to WGCNA. Perhaps you have [erroneously] tried to run ComBat on your FPKM data to correct for one or more batch effects that you perceive exist(s) in your data?

Kevin

ADD COMMENT • link 3.3 years ago by Kevin Blighe 87k

Sorry, just to clarify: these are post-normalization values. Do I take it that WGCNA cannot handle negative values?

ADD REPLY • link 3.3 years ago by reara ▴ 30

It can handle negative values, but, what I was implying was that there is perhaps something else 'wrong' with your data based on the fact that a negative FPKM expression value makes no sense, unless these are logged FPKM expression levels.
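To illustrate the point about logged values, here is a minimal sketch with made-up FPKM numbers (the vector `fpkm` is hypothetical and not taken from the poster's data). Any FPKM below 1 becomes negative under a plain log2 transform, whereas the common log2(x + 1) variant never drops below zero:

```r
# Hypothetical FPKM values, chosen so the logs are easy to check by hand
fpkm <- c(0.25, 0.5, 1, 8)

# Plain log2: values below 1 map to negative numbers
log2_plain <- log2(fpkm)
print(log2_plain)   # -2 -1  0  3

# log2 with a pseudocount of 1: never negative for non-negative input
log2_pseudo <- log2(fpkm + 1)
print(log2_pseudo)
```

So negative numbers in a "normalized" matrix are plausible if a log transform (or centering/scaling) was applied, but not for raw FPKM.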

The error itself also alludes to a data structure issue. Can you show your code that produces that error, and also the output of the str() function run on your input expression matrix?

ADD REPLY • link 3.3 years ago by Kevin Blighe 87k

Yes, I see what you mean. Here is what str() gave:

'data.frame':    2612 obs. of  3228 variables:
 $ V1   : Factor w/ 2445 levels " 0.000302164",..: 2445 2257 2068 2183 2204 1494 148 1261 290 2166 ...
  ..- attr(*, "names")= chr  "X" "X10010J" "X10025W" "X10052Z" ...
 $ V2   : Factor w/ 1765 levels " 0.002960904",..: 1765 750 275 1605 1298 1081 1241 455 620 1241 ...

(However, even the default WGCNA data has MMT00000044, so I'm not sure what is happening with my data.)

Here is a photo of what my expression matrix actually looks like:

https://ibb.co/v3KdP2K

ADD REPLY • link updated 3.3 years ago by Kevin Blighe 87k • written 3.3 years ago by reara ▴ 30

Well, that may be the problem. Your data frame is encoded categorically, i.e., as factors. You need either to convert it to a numeric data matrix or to keep it as a data frame with every column encoded numerically.
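A minimal sketch of one way to do that conversion, using a toy data frame (`df` and its values are made up for illustration). The point to watch is that calling as.numeric() directly on a factor returns the internal level codes rather than the printed numbers, so the values should pass through as.character() first:

```r
# Toy stand-in for an expression table that was read in as factors (assumption)
df <- data.frame(
  V1 = factor(c("0.5", "-0.482", "1.25")),
  V2 = factor(c("2.0", "-0.188", "0.75"))
)

# Pitfall: this yields integer level codes (1, 2, 3 in some order), not the values
codes <- as.numeric(df$V1)

# Safer: factor -> character -> numeric, column by column
df_num <- as.data.frame(lapply(df, function(x) as.numeric(as.character(x))))

# A numeric matrix can then be built from the cleaned data frame
mat <- as.matrix(df_num)
str(mat)
```

If any column contains text that is not a number, as.numeric() will return NA for those entries with a warning, which is a useful signal that something non-numeric is still in the table.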

ADD REPLY • link 3.3 years ago by Kevin Blighe 87k

So I tried to convert it to a numeric data frame, but this is what I got:

datExpr0 = as.data.frame(t(femData))
datExpr0 = data.matrix(datExpr0)
There were 50 or more warnings (use warnings() to see the first 50)
str(datExpr0)
 num [1:2612, 1:3228] NA -0.482 -0.188 -0.371 -0.401 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2612] "X" "X10010J" "X10025W" "X10052Z" ...
  ..$ : chr [1:3228] "V1" "V2" "V3" "V4" ...

Also, how do I get rid of the V1, V2 that R automatically seems to insert when making a data frame?

ADD REPLY • link 3.2 years ago by reara ▴ 30

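Evidently, your object, femData, contains non-numerical data: the warnings and the NA in the first position (row "X") are consistent with values that could not be read as numbers. You need to remove these entries.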
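One way to find the offending entries, sketched on a made-up character vector (`x` is hypothetical; with real data the same test would be applied to each column of femData): anything that as.numeric() cannot parse comes back as NA, so those positions can be listed directly.

```r
# Hypothetical column content: a stray header-like value and a text entry
x <- c("X", "-0.482", "-0.188", "abc")

# Entries that fail numeric coercion become NA (warnings silenced on purpose)
bad <- is.na(suppressWarnings(as.numeric(x)))

# Show which values are not numbers
print(x[bad])   # "X" "abc"
```

Note that genuinely missing values would also be flagged by this check, so the listed entries still need a quick look by eye.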

Can you please confirm that you have first completed the WGCNA tutorial? Which part of the tutorial is this?

ADD REPLY • link 3.2 years ago by Kevin Blighe 87k

Yes, I have completed the tutorial. This is the part I'm having trouble with:

datExpr0 = as.data.frame(t(femData[, -c(1:8)]));

names(datExpr0) = femData$substanceBXH;

rownames(datExpr0) = names(femData)[-c(1:8)];

ADD REPLY • link 3.2 years ago by reara ▴ 30

I see, but, if you look at the tutorial code, columns 1 to 8 are being removed via -c(1:8). These are likely non-numerical columns.
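When the annotation columns are not neatly the first eight, a variation on the tutorial step is to pick the numeric columns by type instead of by position. This is only a sketch on a toy table: `femToy`, `gene_symbol`, and the sample names are made up, and only `substanceBXH` is borrowed from the tutorial code.

```r
# Toy table: two annotation columns followed by two sample columns (hypothetical)
femToy <- data.frame(
  substanceBXH = c("MMT00000044", "MMT00000046"),
  gene_symbol  = c("geneA", "geneB"),
  F2_2 = c(-0.482, 0.12),
  F2_3 = c(-0.188, 0.40),
  stringsAsFactors = FALSE
)

# Flag columns that hold numbers; everything else is annotation
is_num <- vapply(femToy, is.numeric, logical(1))
print(names(femToy)[!is_num])   # columns that would be dropped

# Same shape as the tutorial: samples in rows, genes in columns
datExpr0 <- as.data.frame(t(femToy[, is_num]))
names(datExpr0) <- femToy$substanceBXH
str(datExpr0)
```

Transposing only the numeric columns keeps the matrix numeric. Transposing the whole table, annotation included, turns every value into text.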

ADD REPLY • link 3.2 years ago by Kevin Blighe 87k

Yes, the problem appears to be the headers (V1, V2, ...) that get added when you make a data frame: they push the gene IDs into the data frame itself as a non-numeric component. I was just trying out different ways to do this, but it appears the tutorial's approach is the only/best way to get around this issue.
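For what it is worth, one common source of V1, V2, ... names is reading a file without treating its first line as a header, so the header and ID text end up inside the data. A minimal sketch, assuming a CSV with gene IDs in the first column and sample names in the first line (the inline `csv_text` is made up for illustration):

```r
# Hypothetical file content: genes in rows, samples in columns
csv_text <- "gene,S1,S2,S3
MMT00000044,-0.482,0.12,1.5
MMT00000046,-0.188,0.40,-0.7"

# header = TRUE takes column names from the first line;
# row.names = 1 moves the gene IDs out of the data and into the row names
expr <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# Transpose so samples are rows and genes are columns, as in the tutorial
datExpr0 <- as.data.frame(t(expr))
str(datExpr0)
```

With the IDs held as row and column names, every remaining cell is numeric and no conversion step is needed afterwards.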

ADD REPLY • link 3.2 years ago by reara ▴ 30

I have a pheno/trait file with only the fields I need, but when I build datTraits I am still getting NA values in my table. Could this be a similar issue to the one above?

traitData = read.csv("pheno_tmm_lc_cbc_subset_freeze3_reqd_fields.csv");
dim(traitData)
names(traitData)

# Remove columns that hold information we do not need.
# (No-op here: the file already contains only the required fields.)
allTraits = traitData;
allTraits = allTraits[,];
dim(allTraits)
names(allTraits)

# Form a data frame analogous to expression data that will hold the clinical traits.
# match() returns NA for any sample name not found in allTraits$sid.
femaleSamples = rownames(datExpr0);
traitRows = match(femaleSamples, allTraits$sid);
datTraits = allTraits[traitRows, -1];
rownames(datTraits) = allTraits[traitRows, 1];

ADD REPLY • link updated 3.2 years ago by Kevin Blighe 87k • written 3.2 years ago by reara ▴ 30

Without seeing the input and output for each step, I am limited in what I can do. All that I can say is to make sure that your input data has the same format as that used by the tutorial, i.e., to avoid issues elsewhere throughout the tutorial itself.
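One format check that can be run without sharing the data is whether the sample names in the expression object actually occur in the trait table, since match() returns NA for every name it cannot find and those NAs become all-NA rows. The sketch below uses made-up IDs and a hypothetical `weight` column. The "X" prefix is only a guess based on the row names in the earlier str() output, as R prepends "X" to column names that start with a digit when a file is read with default settings.

```r
# Hypothetical sample names as they appear in the expression data
samples <- c("X10010J", "X10025W", "X10052Z")

# Hypothetical trait table whose IDs lack the "X" prefix
allTraits <- data.frame(
  sid    = c("10010J", "10025W", "99999A"),
  weight = c(21.5, 30.1, 25.0),
  stringsAsFactors = FALSE
)

# Nothing matches, so every index is NA and datTraits would be all NA
traitRows <- match(samples, allTraits$sid)
print(traitRows)

# List the sample names that have no counterpart in the trait table
print(setdiff(samples, allTraits$sid))

# Assumed fix for this toy case: strip the leading "X" before matching
fixed <- match(sub("^X", "", samples), allTraits$sid)
print(fixed)   # 1 2 NA
```

Any NA left after harmonizing the names points to a sample that is truly absent from the trait file.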

ADD REPLY • link 3.2 years ago by Kevin Blighe 87k