Control-disease pair study design
4.7 years ago
Gene_MMP8 ▴ 240

I have a clinical dataset for a particular disease, and the feature set includes a mortality variable. I am building machine learning models to predict mortality from the other clinical features (38 in total; sample size 276). The features are mostly categorical. Three disease stages are listed: control (47 samples), disease (52 samples), and non-disease (176 samples). For the control set, every clinical feature is "n/a", meaning no information was collected for the control cases. Is it therefore wise to treat the "control" cases as "non-disease" and also to consider them as living (mortality -yes)? Doing so would increase the sample size available for model building. The missing values would be imputed using some technique.
Is this a sound approach? Would I be introducing bias into the model by imputing so many categorical features for the control cases?
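To make the concern concrete, here is a minimal Python sketch on toy data (not my real dataset). The feature names `smoker` and `diabetes`, the values, and the choice of mode imputation are all assumptions made for illustration; the post only says "some technique". It reports the fraction of missing cells per group and then checks how many distinct feature profiles the control rows have after imputation.

```python
from collections import Counter

MISSING = None  # stands in for the "n/a" entries

# Toy rows: (group, {feature: value}). Names and values are hypothetical.
rows = [
    ("disease",     {"smoker": "yes", "diabetes": "yes"}),
    ("disease",     {"smoker": "yes", "diabetes": "no"}),
    ("non-disease", {"smoker": "no",  "diabetes": "no"}),
    ("non-disease", {"smoker": "no",  "diabetes": "yes"}),
    ("non-disease", {"smoker": "no",  "diabetes": "no"}),
    ("control",     {"smoker": MISSING, "diabetes": MISSING}),
    ("control",     {"smoker": MISSING, "diabetes": MISSING}),
]


def missing_fraction_by_group(rows):
    """Fraction of feature cells that are missing, per group."""
    missing, total = Counter(), Counter()
    for group, feats in rows:
        for value in feats.values():
            total[group] += 1
            missing[group] += value is MISSING
    return {g: missing[g] / total[g] for g in total}


def mode_impute(rows):
    """Fill each missing cell with the most common observed value of that feature."""
    modes = {}
    for f in rows[0][1]:
        observed = [feats[f] for _, feats in rows if feats[f] is not MISSING]
        modes[f] = Counter(observed).most_common(1)[0][0]
    return [
        (g, {f: (v if v is not MISSING else modes[f]) for f, v in feats.items()})
        for g, feats in rows
    ]


fractions = missing_fraction_by_group(rows)
imputed = mode_impute(rows)

# Distinct feature profiles among the control rows after imputation.
control_profiles = {
    tuple(sorted(feats.items())) for g, feats in imputed if g == "control"
}

print(fractions)
print(len(control_profiles))
```

In this toy case the control group is 100% missing, and after mode imputation every control row carries the same feature profile, so those rows would add labels to the training set without adding any independently observed feature values.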

R • 565 views