Question

Error when looping over multiple columns in a data frame in R

9 months ago

Mohamed Samir ▴ 30

I am trying to obtain a cutoff value for each of several variables (columns 3:69) of a data frame named Data. This is what the data look like: [image: The data]. I want to loop over columns 3:69 and, for each column, obtain the optimal cutoff value that discriminates between the patient statuses (the Label column) using the Youden method. This is the code I am using:

for (i in 1:ncol(Data)) {
  optimal.cutpoints(X = i, status = 'Label',
                    tag.health = 'Control',
                    method = 'Youden', data = Data)
}

I am not sure whether i in 1:ncol(Data) is the right thing to put here. When I run the code, the following error appears:

Error: Not all needed variables are supplied in 'data'.

I understand that R did not find the variables within the data frame I supplied (Data). But how can I write the for loop so that R applies the cutoff function to all columns at once, given that the function needs an X value, which I assume should be the loop variable i?

Thanks

Statistics R • 861 views

ADD COMMENT • link updated 9 months ago by Jeremy ▴ 930 • written 9 months ago by Mohamed Samir ▴ 30

score 0 · Answer 1 · 2024-05-22

9 months ago

Jeremy ▴ 930

First of all, the argument names are "tag.healthy" and "methods", not "tag.health" and "method". Also, optimal.cutpoints() wants X to be either a character string (the name of the variable) or a formula. Right now, you're feeding it i, which is a number, so it cannot find a matching variable in Data. Try the following code:

First, initialize a list to store the results.

cutpoint_results <- list()

Then loop through each column and apply optimal.cutpoints().

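# Iterate over the names of columns 3 to the last column.
for (col_name in colnames(Data)[3:ncol(Data)]) {
  # Store each result under the name of the column it came from.
  cutpoint_results[[col_name]] <- optimal.cutpoints(
    X = col_name,
    status = "Label",
    tag.healthy = "Control",
    methods = "Youden",
    data = Data
  )
}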
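
To see why the loop iterates over column names rather than positions, here is a minimal base R sketch. The data frame toy and the stand-in function summarise_column() are hypothetical; they only imitate a function that expects a column name, so the example runs without the OptimalCutpoints package.

```r
# Hypothetical toy data: two leading columns followed by two marker columns.
toy <- data.frame(
  ID      = 1:4,
  Label   = c("Control", "Control", "Case", "Case"),
  markerA = c(1, 2, 3, 4),
  markerB = c(10, 20, 30, 40)
)

# Hypothetical stand-in for optimal.cutpoints(): X must be a column *name*.
summarise_column <- function(X, data) {
  stopifnot(is.character(X), X %in% colnames(data))
  mean(data[[X]])
}

results <- list()
for (col_name in colnames(toy)[3:ncol(toy)]) {
  # [[col_name]] files each result under the name of its column.
  results[[col_name]] <- summarise_column(X = col_name, data = toy)
}

names(results)
```

Passing a number such as 3 to summarise_column() would fail its check, which is roughly the situation the original loop ran into with X = i.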

Actually accessing the cutoff values takes a little digging:

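cut.list <- list()

for (k in seq_along(cutpoint_results)) {
  # [[k]] (rather than [k]) also works if more than one cutoff is returned.
  cut.list[[k]] <- cutpoint_results[[k]][["Youden"]][["Global"]][["optimal.cutoff"]][["cutoff"]]
}

# Carry the column names over to the extracted cutoffs.
names(cut.list) <- names(cutpoint_results)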
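
If a named numeric vector is more convenient than a list, something like the following might work. The nested list mock_results is hypothetical: it only imitates the element names used above and is not real OptimalCutpoints output. Taking the first cutoff when several are returned is an assumption about how ties should be handled.

```r
# Hypothetical nested list mimicking the element names used above.
mock_results <- list(
  markerA = list(Youden = list(Global = list(
    optimal.cutoff = list(cutoff = 2.5)))),
  markerB = list(Youden = list(Global = list(
    optimal.cutoff = list(cutoff = c(20, 30)))))  # two tied cutoffs
)

# vapply() returns a numeric vector and keeps the list names.
cut_vec <- vapply(
  mock_results,
  function(res) {
    # Assumption: keep only the first cutoff if there are several.
    res[["Youden"]][["Global"]][["optimal.cutoff"]][["cutoff"]][1]
  },
  numeric(1)
)

cut_vec
```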

ADD COMMENT • link 9 months ago by Jeremy ▴ 930

Dear Jeremy, thanks. What I could not understand is why you wrote cutpoint_results[[col_name]]. Is it because the results are stored in a list? Also, now that I have a list of those cutoffs (cut.list), how can I export it as an Excel table?

Thanks

ADD REPLY • link 9 months ago by Mohamed Samir ▴ 30

cutpoint_results[[col_name]] stores each optimal.cutpoints() result in the list named "cutpoint_results" under the name of the column it came from, so the original column names from the Data data frame are kept. To write a file that can be opened in Excel, you can use as.data.frame(), followed by write.csv(). I always like to set row.names = FALSE for the latter.
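
A rough sketch of that export step, using only base R. The cutoff values and the output file name are made up for illustration, and the one-row-per-variable layout is just one possible choice (as.data.frame(cut.list) would instead give one column per variable).

```r
# Hypothetical cutoffs standing in for the real cut.list.
cut.list <- list(markerA = 2.5, markerB = 20)

# One row per variable; unlist() assumes a single cutoff per variable.
cut_df <- data.frame(
  variable = names(cut.list),
  cutoff   = unlist(cut.list),
  stringsAsFactors = FALSE
)

# Write to a temporary location; replace with a path of your choice.
out_file <- file.path(tempdir(), "cutoffs.csv")
write.csv(cut_df, out_file, row.names = FALSE)
```

The resulting CSV file can be opened directly in Excel.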

ADD REPLY • link 9 months ago by Jeremy ▴ 930