Question

Patient data analysis (numerical and categorical values) in R

0

4.6 years ago

jansha.1997 • 0

Hey everyone. I have a dataset of patient data and biochemical test results. It contains numerical variables and categorical variables such as Yes/No. I want to run statistical tests such as ANOVA on this dataset. I have encoded the Yes/No variables as booleans, but I also have categorical variables with more than two levels (for example: mother, father, both, and none), and I don't know how to handle those or how to run the tests on them. Please help me out.

R data analysis biostatistics • 987 views

ADD COMMENT • link 4.6 years ago by jansha.1997 • 0

0

What question are you trying to answer? Running some sort of regression on the data using all the predictor variables might be a better approach.

ADD REPLY • link 4.6 years ago by rpolicastro 13k

0

The tests are basically meant to find differences between two cohorts. I have made two subsets from the whole data, one for each cohort, but I don't know how to recode categorical variables with three or more levels as numbers. For example, I could recode normal and abnormal as 0 and 1, but for values like mother, father, and both, I don't know how to recode them numerically to run the test.

ADD REPLY • link 4.6 years ago by jansha.1997 • 0

1

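If you only have two cohorts you could try a logistic regression that includes all of your variables. In R you can leave categorical values as characters or convert them to factors; modelling functions such as glm() dummy-code them automatically, so you don't need to recode them to numbers by hand. Use an ordered factor only if the levels have a natural order, which mother/father/both/none do not.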
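To illustrate how R handles a multi-level categorical variable, here is a minimal sketch on made-up data. The data frame `patients` and the column names `carrier` and `glucose` are hypothetical and only stand in for your own variables. It shows the 0/1 indicator columns R builds internally from a factor.

```r
# Hypothetical toy data; the column names are illustrative assumptions.
patients <- data.frame(
  carrier = c("none", "mother", "father", "both", "mother", "none"),
  glucose = c(5.1, 6.3, 5.8, 7.2, 6.0, 4.9),
  stringsAsFactors = FALSE
)

# Unordered factor; the first level ("none") becomes the reference category.
patients$carrier <- factor(patients$carrier,
                           levels = c("none", "mother", "father", "both"))

# model.matrix() exposes the indicator (dummy) columns that lm()/glm()
# create automatically for a factor under the default treatment contrasts.
design <- model.matrix(~ carrier + glucose, data = patients)
print(colnames(design))
```

Each non-reference level gets its own 0/1 column, so no single numeric code such as 1, 2, 3 is imposed on categories that have no order.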

Having a unified regression model is generally better anyway, because it tests each variable's effect while holding the other variables constant.
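A rough sketch of what such a model could look like, using simulated data. Every column name here (`cohort`, `carrier`, `abnormal`, `glucose`) is an assumption for illustration, not taken from your dataset, and the random values carry no real signal.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the patient table; all names are hypothetical.
patients <- data.frame(
  cohort   = factor(rep(c("A", "B"), each = n / 2)),
  carrier  = factor(sample(c("none", "mother", "father", "both"), n, replace = TRUE),
                    levels = c("none", "mother", "father", "both")),
  abnormal = sample(c(FALSE, TRUE), n, replace = TRUE),
  glucose  = rnorm(n, mean = 5.5, sd = 1)
)

# Logistic regression: cohort membership as the outcome, all variables as predictors.
# With a two-level factor outcome, glm() models the probability of the second level ("B").
fit <- glm(cohort ~ carrier + abnormal + glucose,
           data = patients, family = binomial)

# Per-coefficient Wald tests; each factor level is compared with the reference "none".
print(summary(fit))

# Likelihood-ratio test for each predictor as a whole (one test for all carrier levels).
print(drop1(fit, test = "Chisq"))
```

The `drop1()` table is usually the easier one to read for a multi-level factor, since it gives one p-value per variable rather than one per level.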

ADD REPLY • link 4.6 years ago by rpolicastro 13k