Question

cv.glmnet() gives very close cvm value

5.0 years ago

MatthewP ★ 1.4k

Hello, I want to identify survival marker genes. My original data is a gene expression matrix (DESeq2 vst values) with 400+ samples from GDC. I first narrowed this down to the few hundred genes I need, ran univariate Cox regression on each, and kept the genes with p.adj < 0.05, which gave me a list of 12 genes.

Next I want to select the important genes with lasso regression. I first use cv.glmnet from the R package glmnet to find a suitable lambda value. But my cvm (mean cross-validated error) values are very close to each other, which does not seem normal. I would like to find out what might be causing this.

My data

> head(x, n = 2)
            1        2        3        4        5        6        7        8
[1,] 10.88470 8.885131 4.867323 14.52633 10.08623 11.18796 14.55457 11.65622
[2,] 10.09428 9.636136 5.686320 14.67810 12.02149 11.20055 14.33664 11.72159
            9       10       11       12
[1,] 7.880675 12.98104 10.91308 11.93209
[2,] 6.199323 13.46840 12.44164 11.19995
> head(y, n =2)
      time status
[1,] 11.96      2
[2,] 94.81      1

My code

> fitK <- cv.glmnet(x, y, family = "cox")
> fitK$lambda.min
[1] 0.0770058
> fitK$lambda.1se
[1] 0.2580929
# CVM value very close
> fitK$cvm
 [1] 12.07336 12.07297 12.07083 12.06708 12.06198 12.05678 12.05164 12.04613
 [9] 12.04088 12.03668 12.03342 12.03090 12.02914 12.02889 12.02920 12.02963
[17] 12.03019 12.03106 12.03233 12.03391 12.03593 12.03804 12.04021 12.04218
[25] 12.04406 12.04596 12.04777 12.04947 12.05133 12.05316 12.05486 12.05626
[33] 12.05751 12.05866 12.05969 12.06061 12.06147 12.06225 12.06296 12.06363
[41] 12.06428 12.06486 12.06548 12.06596 12.06638 12.06685 12.06717 12.06756
[49] 12.06770 12.06774

Correlation between genes; it does not seem very high.

> cor(x) %>% head()
            1           2           3          4            5            6
1  1.00000000  0.05648318  0.30065224  0.2710753  0.119531476 -0.180099805
2  0.05648318  1.00000000  0.13410949  0.3248191  0.199305569 -0.223570929
3  0.30065224  0.13410949  1.00000000  0.2575741  0.038419909 -0.459850656
4  0.27107531  0.32481905  0.25757411  1.0000000  0.170796057 -0.388097210
5  0.11953148  0.19930557  0.03841991  0.1707961  1.000000000 -0.007447084
6 -0.18009980 -0.22357093 -0.45985066 -0.3880972 -0.007447084  1.000000000
           7           8            9          10         11          12
1  0.2700544  0.11141581  0.338836839  0.29913321 -0.2236379  0.07780468
2  0.3650802  0.30449755  0.141082364  0.31036554 -0.2884198  0.24941497
3  0.1788922  0.20880124  0.703237267  0.03778170 -0.4594414  0.35212838
4  0.4799563  0.08034701  0.242888783  0.41879070 -0.4048819  0.23810640
5  0.1208836 -0.08956863 -0.008660917  0.12318861 -0.1364541 -0.06475891
6 -0.2315397 -0.14637268 -0.414746954 -0.07319706  0.4812620 -0.25343412

glmnet coef result

> coef(fitK, s = fitK$lambda.min)
12 x 1 sparse Matrix of class "dgCMatrix"
            1
1  .         
2  0.14001292
3  .         
4  0.23618474
5  0.11402239
6  .         
7  .         
8  0.18135430
9  .         
10 .         
11 .         
12 0.06338452
> coef(fitK, s = fitK$lambda.1se)
12 x 1 sparse Matrix of class "dgCMatrix"
   1
1  .
2  .
3  .
4  .
5  .
6  .
7  .
8  .
9  .
10 .
11 .
12 .

Thanks

lasso glmnet survival • 1.7k views

ADD COMMENT • link updated 5.0 years ago by Biostar 20 • written 5.0 years ago by MatthewP ★ 1.4k

Cross-validation explores a range of values for the regularization coefficient lambda. A small range of CV errors suggests that the model is relatively insensitive to the choice of lambda. There may be nothing untoward here, but the behaviour could also be caused, for example, by outliers in the data. On the other hand, you seem to have already selected genes, so there may not be much of a model left to learn.
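To put a number on "a small range of CV errors", here is a minimal sketch that uses only the cvm values pasted in the question. It shows that the whole curve spans about 0.044, well under 1% of the minimum. The standard errors are not shown in the post, so the last comments only indicate how one might compare this spread against `cvsd` on the real `fitK` object; whether the spread is smaller than one standard error is an assumption to check, not something the post confirms.

```r
# cvm values copied from the question (50 lambda values)
cvm <- c(12.07336, 12.07297, 12.07083, 12.06708, 12.06198, 12.05678, 12.05164, 12.04613,
         12.04088, 12.03668, 12.03342, 12.03090, 12.02914, 12.02889, 12.02920, 12.02963,
         12.03019, 12.03106, 12.03233, 12.03391, 12.03593, 12.03804, 12.04021, 12.04218,
         12.04406, 12.04596, 12.04777, 12.04947, 12.05133, 12.05316, 12.05486, 12.05626,
         12.05751, 12.05866, 12.05969, 12.06061, 12.06147, 12.06225, 12.06296, 12.06363,
         12.06428, 12.06486, 12.06548, 12.06596, 12.06638, 12.06685, 12.06717, 12.06756,
         12.06770, 12.06774)

best       <- which.min(cvm)        # position of the smallest CV error on the lambda path
spread     <- diff(range(cvm))      # absolute difference between worst and best CV error
rel_spread <- spread / min(cvm)     # same difference relative to the best CV error

cat("index of minimum:", best, "\n")
cat("absolute spread: ", round(spread, 5), "\n")
cat("relative spread: ", signif(rel_spread, 3), "\n")

# On the real cv.glmnet object (not run here), the spread can be set against the
# cross-validation standard error and the number of nonzero coefficients:
#   fitK$cvsd[which.min(fitK$cvm)]
#   cbind(lambda = fitK$lambda, cvm = fitK$cvm, cvsd = fitK$cvsd, nzero = fitK$nzero)
```

If the spread turns out to be smaller than the standard error at the minimum, that would be consistent with lambda.1se selecting a model with no nonzero coefficients, as in the output posted in the question.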

Start with model <- glmnet(...). To get an idea of which variables (genes) are more important for the model, plot the evolution of the coefficients as a function of regularization, i.e. plot(model). A small L1 norm means strong regularization. As regularization is relaxed (the L1 norm increases), more variables enter the model. Variables that enter the model early can therefore be considered more important.
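The suggestion above could look roughly like the following sketch. The data here are simulated stand-ins (the matrix `x`, the survival times and the effect sizes are all made up for illustration, with status coded as 1 = event and 0 = censored), so only the `glmnet()` and `plot()` calls carry over to the real expression matrix. The `entry_order` helper is my own addition for listing when each variable first gets a nonzero coefficient; it is not part of glmnet.

```r
library(glmnet)

set.seed(1)
n <- 200
p <- 12

# Simulated stand-in for the 12-gene expression matrix (hypothetical data)
x <- matrix(rnorm(n * p, mean = 10, sd = 2), nrow = n, ncol = p,
            dimnames = list(NULL, as.character(seq_len(p))))

# In this toy example only columns 1 and 2 influence survival
lp     <- 0.3 * x[, 1] - 0.3 * x[, 2]
time   <- rexp(n, rate = exp(lp - mean(lp)) / 50)
status <- rbinom(n, 1, 0.7)                    # 1 = event, 0 = censored
y      <- cbind(time = time, status = status)

# Lasso-penalised Cox model fitted over the whole lambda path (no cross-validation)
model <- glmnet(x, y, family = "cox")

# Coefficient paths against the L1 norm; labels identify the variables
plot(model, xvar = "norm", label = TRUE)

# Index along the lambda path at which each variable first becomes nonzero
# (smaller index = enters under stronger regularization)
first_nonzero <- apply(as.matrix(model$beta) != 0, 1,
                       function(z) if (any(z)) unname(which(z)[1]) else NA_integer_)
entry_order <- sort(first_nonzero)
print(entry_order)
```

Variables at the front of `entry_order` are the ones that enter the model first as the penalty is relaxed.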

ADD REPLY • link 5.0 years ago by Jean-Karim Heriche 27k

Yes, I first filtered genes according to the results of univariate Cox regression. Thanks.

ADD REPLY • link 5.0 years ago by MatthewP ★ 1.4k

Here is plot(fitK)
image link

ADD REPLY • link 5.0 years ago by MatthewP ★ 1.4k