Question

High correlation and classification

Asked 8.9 years ago by marco

Hi everyone,

I have a data matrix taken from GSE5281.

I would like to perform a classification, in my case with an ANN (I know it is not a good algorithm for gene expression data).

My problem is that almost everything in the data is highly correlated:

sum(correlation>0.7);sum(correlation<0.7)
[1] 25599
[1] 322

sum(correlation>0.8);sum(correlation<0.8)
[1] 25379
[1] 542
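A side note on these counts: 25599 + 322 = 25921 = 161², which suggests `correlation` is a 161 × 161 matrix (for example, one row and column per array). Summing over the whole matrix counts every pair twice and also counts the diagonal, where every value is 1, and entries exactly equal to the cut-off fall into neither count. A minimal sketch of counting each pair once, using a small simulated matrix as a hypothetical stand-in for the real data:

```r
# Hypothetical stand-in: 100 observations of 5 independent variables
set.seed(42)
x <- matrix(rnorm(100 * 5), ncol = 5)
correlation <- cor(x)

# Naive count over the full matrix: the 5 diagonal entries (all 1) always pass
naive_high <- sum(correlation > 0.7)

# Count each unordered pair once, excluding the diagonal
pair_vals <- correlation[upper.tri(correlation)]
n_pairs <- length(pair_vals)        # choose(5, 2) pairs
n_high <- sum(pair_vals > 0.7)
cat("pairs:", n_pairs, "| pairs with r > 0.7:", n_high,
    "| naive count:", naive_high, "\n")
```

Applied to a 161 × 161 matrix, the same `upper.tri()` indexing gives 161 × 160 / 2 = 12880 distinct pairs.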

With correlations this high, the ANN does not work. Which algorithm can I use to train an ANN on this data? Is there a code example of an ANN applied to a GSE dataset?

Thank you :)

Tag: R


What is "everything"? Have you normalized or log-transformed the data? Scaled in any way? What measure of correlation are you using? It is worth thinking about these details and how they impact the correlation that you are seeing.

(Comment by Sean Davis, 8.9 years ago)
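One reason these questions matter: in expression data, genes differ far more in their baseline level than samples differ from one another, so correlations computed between samples tend to be very high regardless of the biology. A minimal sketch on simulated log-scale data (all sizes and parameters are assumptions) showing that centering each gene removes this shared baseline:

```r
set.seed(1)
n_genes <- 500
n_samples <- 20

# Each gene gets its own baseline level; samples add small independent noise
baseline <- rnorm(n_genes, mean = 8, sd = 2)
expr <- matrix(baseline, nrow = n_genes, ncol = n_samples) +
  matrix(rnorm(n_genes * n_samples, sd = 0.5), nrow = n_genes)

# Mean correlation over distinct pairs of samples (columns)
mean_pair_cor <- function(m) {
  r <- cor(m)
  mean(r[upper.tri(r)])
}

# High: the shared gene baselines dominate the sample-to-sample correlation
raw_cor <- mean_pair_cor(expr)

# Near zero once each gene (row) is centered on its own mean
centered_cor <- mean_pair_cor(t(scale(t(expr), scale = FALSE)))
cat("raw:", round(raw_cor, 3), "| gene-centered:", round(centered_cor, 3), "\n")
```

For a rank-based measure, `cor(m, method = "spearman")` can be swapped in for the default Pearson correlation.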

Ram · Answer 1 · 2015-06-03

If you are trying to create a classifier for this data set, please take a look at PAM (Prediction Analysis for Microarrays), http://statweb.stanford.edu/~tibs/PAM/, which was designed for microarray data and performs well on it. However, the questions posed by @Sean Davis are very important and must be addressed and understood before you pass your data to any classification algorithm.
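In R, PAM is distributed as the `pamr` package, and caret also exposes it as `method = "pam"`. To show the idea behind it, here is a minimal hand-written sketch of a nearest shrunken centroid classifier. It is not the `pamr` code; the simulated data are hypothetical, and `delta = 1.5` is an arbitrary threshold that would normally be tuned by cross-validation. Each class centroid is shrunk toward the overall centroid by soft thresholding, so genes with small class differences drop out of the classifier entirely, which is one way the method copes with thousands of genes and few samples.

```r
# Nearest shrunken centroid sketch; x is genes x samples, y holds class labels
nsc_train <- function(x, y, delta) {
  y <- factor(y)
  n <- ncol(x)
  classes <- levels(y)
  centroids <- sapply(classes, function(k) rowMeans(x[, y == k, drop = FALSE]))
  overall <- rowMeans(x)
  # Pooled within-class standard deviation per gene, plus its median as offset s0
  resid <- x - centroids[, as.integer(y)]
  s <- sqrt(rowSums(resid^2) / (n - length(classes)))
  s0 <- median(s)
  n_k <- as.vector(table(y))
  m_k <- sqrt(1 / n_k - 1 / n)
  # Standardized distance of each class centroid from the overall centroid
  se <- outer(s + s0, m_k)
  d <- (centroids - overall) / se
  # Soft thresholding: differences smaller than delta become exactly zero
  d_shrunk <- sign(d) * pmax(abs(d) - delta, 0)
  list(shrunk = overall + se * d_shrunk, scale = s + s0,
       prior = n_k / n, classes = classes,
       genes_used = which(rowSums(d_shrunk != 0) > 0))
}

nsc_predict <- function(fit, xnew) {
  # Standardized squared distance to each shrunken centroid, minus 2 * log(prior)
  scores <- sapply(seq_along(fit$classes), function(k) {
    colSums(((xnew - fit$shrunk[, k]) / fit$scale)^2) - 2 * log(fit$prior[k])
  })
  scores <- matrix(scores, ncol = length(fit$classes))
  fit$classes[apply(scores, 1, which.min)]
}

# Hypothetical data: 200 genes, only the first 10 differ between classes
set.seed(123)
make_data <- function(n_per_class, n_genes = 200) {
  y <- factor(rep(c("AD", "control"), each = n_per_class))
  x <- matrix(rnorm(n_genes * 2 * n_per_class), nrow = n_genes)
  x[1:10, y == "AD"] <- x[1:10, y == "AD"] + 2
  list(x = x, y = y)
}
train_set <- make_data(20)
test_set <- make_data(20)

fit <- nsc_train(train_set$x, train_set$y, delta = 1.5)
pred <- nsc_predict(fit, test_set$x)
accuracy <- mean(pred == test_set$y)
cat("genes kept:", length(fit$genes_used), "| test accuracy:", accuracy, "\n")
```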

Ram · Answer 2 · 2015-06-03

Posted 8.9 years ago by marco

I read some papers following your suggestions, and I transformed my data by filtering on high CV (coefficient of variation) and applying a log transform. Thanks so much for that! :)
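This preprocessing can be sketched as follows. It is an illustration only: it assumes "filtering on high CV" means keeping the genes with the largest coefficient of variation, computes the CV on the linear (unlogged) scale, and keeps an arbitrary 200 genes of a simulated matrix. The final transpose matters because caret's `train()` expects one row per sample and one column per predictor.

```r
# Hypothetical raw intensities: 1000 genes (rows) x 30 samples (columns)
set.seed(7)
raw <- matrix(2^rnorm(1000 * 30, mean = 8, sd = 1), nrow = 1000)

# Coefficient of variation per gene on the linear scale
cv <- apply(raw, 1, sd) / rowMeans(raw)

# Keep the most variable genes (200 is an arbitrary cut-off)
keep <- order(cv, decreasing = TRUE)[1:200]

# Log-transform after filtering; all raw values are positive here
filtered <- log2(raw[keep, ])

# Samples in rows, genes in columns, as caret's train() expects
train_df <- as.data.frame(t(filtered))
dim(train_df)  # 30 200
```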

Now, when I launch an ANN (using caret), it returns an error message:

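## Train a neural network (nnet) through caret
library(caret)  # provides train(); fitControl comes from trainControl()
library(nnet)   # package behind method = "nnet"

nnFit <- train(target ~ ., data = trainT,
               method = "nnet",
               trControl = fitControl,
               # trControl = ctrl, metric = "ROC",
               verbose = TRUE
               # tuneGrid = nnGrid
)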
Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :9     NA's   :9
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.

Can you help me with that? :)

Thanks so much

Edit: I'm using the caret package.


marco, please add your follow-up questions and responses as comments, or as replies to comments, and not as an answer to your own question. Otherwise it is not clear whom you are thanking or whom you are asking for follow-up advice, because the post is not associated with a previous answer or comment. With regard to your problem, my first thought is that something is wrong with your input data. Go back and read the instructions for the tool you are using, and check your input data to ensure there are no missing values. Zeros may also be an issue, but that depends on the calculations the program performs, so make sure you are following the tool's instructions. You could even try its example data to make sure you are using it correctly before moving on to your own data.

(Reply by alolex, 8.9 years ago)
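A quick way to act on this advice before calling `train()` is to count missing and non-finite values and constant columns. Note that a zero in raw intensities becomes `-Inf` after a log transform, which `is.na()` does not catch. A minimal sketch; the helper `check_inputs()` and the toy data frame are hypothetical, and caret's own `nearZeroVar()` offers a more thorough check for near-constant predictors:

```r
# Hypothetical helper: summarize problems in a training data frame
check_inputs <- function(df, outcome = "target") {
  predictors <- df[, setdiff(names(df), outcome), drop = FALSE]
  values <- as.matrix(predictors)
  list(
    n_missing = sum(is.na(values)),
    n_nonfinite = sum(!is.finite(values)),  # NA, NaN, Inf and -Inf
    n_constant = sum(vapply(predictors,
                            function(v) length(unique(v[!is.na(v)])) <= 1,
                            logical(1))),
    class_counts = table(df[[outcome]])
  )
}

# Toy example: one raw zero logged to -Inf, one NA, one constant gene
set.seed(3)
demo <- data.frame(target = factor(rep(c("AD", "control"), each = 5)),
                   g1 = log2(c(0, runif(9, 1, 100))),
                   g2 = c(NA, rnorm(9)),
                   g3 = rep(5, 10))
res <- check_inputs(demo)
str(res)
```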


Oh, I'm so sorry! I was using this like a forum, but it isn't one :(

I tried using a small part of my data and it worked, but if I use all of my data (or a large part of it), I get the same error.

I think my problem is related to the weights, but I don't know how to resolve it.

If I stop nnet manually, it returns this:

nnFit <- train(target ~ ., data = trainT,
               method = "nnet",
               trControl = fitControl
               # trControl = ctrl, metric = "ROC",
               # verbose = TRUE,
               # tuneGrid = nnGrid
)

Warning messages:
1: In eval(expr, envir, enclos) :
  model fit failed for Fold1.Rep1: size=1, decay=0e+00 Error in nnet.default(x, y, w, entropy = TRUE, ...) :
  too many (3011) weights

2: In eval(expr, envir, enclos) :
  model fit failed for Fold1.Rep1: size=3, decay=0e+00 Error in nnet.default(x, y, w, entropy = TRUE, ...) :
  too many (9031) weights

3: In eval(expr, envir, enclos) :
  model fit failed for Fold1.Rep1: size=5, decay=0e+00 Error in nnet.default(x, y, w, entropy = TRUE, ...) :
  too many (15051) weights

4: In eval(expr, envir, enclos) :
  model fit failed for Fold1.Rep1: size=1, decay=1e-01 Error in nnet.default(x, y, w, entropy = TRUE, ...) :
  too many (3011) weights

5: In eval(expr, envir, enclos) :
  model fit failed for Fold1.Rep1: size=3, decay=1e-01 Error in nnet.default(x, y, w, entropy = TRUE, ...) :
  too many (9031) weights

Do I need to increase the allowed number of weights, or is there a problem in the dataset?

(Reply by marco, 8.9 years ago)
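The numbers in these warnings can be reproduced by hand. In the two-class case (the fit uses `entropy = TRUE`, as in the messages), `nnet` builds one hidden layer with h units and a single output unit. Each hidden unit has p input weights plus a bias, and the output unit has h weights plus a bias, so the total is W = (p + 1) * h + (h + 1). Solving 3011 = (p + 1) + 2 gives p = 3008 predictors, and the same p reproduces 9031 and 15051. `nnet()` refuses to fit when W exceeds its `MaxNWts` argument (default 1000), so every tuning candidate fails, every resample has missing accuracy, and caret stops. This is probably also why a small part of the data worked: fewer predictor columns mean fewer weights. A small check of the arithmetic (the helper names are hypothetical):

```r
# Number of weights in a single-hidden-layer nnet with one output unit
n_weights <- function(p, h) (p + 1) * h + (h + 1)

# Invert the count reported for size = 1 (W = p + 3) to recover the predictors
p <- 3011 - 3
n_weights(p, c(1, 3, 5))   # 3011 9031 15051, matching the warnings

# Largest number of predictors that stays within a given MaxNWts
max_inputs <- function(h, max_wts = 1000) floor((max_wts - 1 - h) / h) - 1
max_inputs(c(1, 3, 5))     # 997 331 197
```

So with the default limit, even `size = 1` can take at most 997 predictors.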


See if the suggestions on StackOverflow help; I found them by typing "R nnet too many weights" into Google. I don't use nnet, so if it is a parameter adjustment issue I won't be able to help. If you can't find an answer already posted on StackOverflow, try posting this question there, as it sounds like it may be a parameter issue. Hope you can figure it out :)

(Reply by alolex, 8.9 years ago)
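For reference, the "too many weights" error comes from `nnet()`'s `MaxNWts` argument, whose default is 1000. caret's `train()` passes extra arguments on to the underlying fitting function, so adding, for example, `MaxNWts = 20000` to the `train()` call above should lift the limit for sizes up to 5 with 3008 predictors. Lifting the limit is not the whole story, though: a network with thousands of input weights and far fewer samples is very prone to overfitting, so reducing the number of genes and using weight decay are worth considering. A minimal sketch that reproduces the error and the fix with `nnet()` alone, on simulated data with hypothetical sizes:

```r
library(nnet)

# Simulated two-class data with 400 predictors (hypothetical sizes)
set.seed(11)
n <- 60
p <- 400
x <- matrix(rnorm(n * p), nrow = n)
dat <- data.frame(x)
dat$group <- factor(ifelse(x[, 1] + x[, 2] > 0, "case", "control"))

# size = 3 needs (400 + 1) * 3 + (3 + 1) = 1207 weights, above the default of 1000
fit_default <- tryCatch(nnet(group ~ ., data = dat, size = 3, trace = FALSE),
                        error = function(e) e)
conditionMessage(fit_default)   # "too many (1207) weights" in an English locale

# Raising MaxNWts lets the same model fit; decay adds weight decay
fit <- nnet(group ~ ., data = dat, size = 3, decay = 0.1,
            MaxNWts = 2000, maxit = 200, trace = FALSE)
length(fit$wts)                 # 1207
```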