Question: Control-disease pair study design
14 months ago by
banerjeeshayantan170 wrote:

I have a clinical dataset for a particular disease, and the mortality variable is part of my feature set. I am building machine learning models to predict mortality from the other clinical features (38 in total, sample size 276). The features are mostly categorical. Three disease stages are listed: control (47 samples), disease (52 samples), and non-disease (176 samples). The values of the clinical features for the control set are all "n/a", meaning no information was collected for the control cases. Is it therefore wise to treat the "control" cases as "non-disease" and also consider them as living (mortality -yes)? Doing so would increase the sample size available for model building. The missing values would be imputed using some technique.
Is this a sound approach? Am I introducing bias into the model by imputing so many categorical features for the control cases?
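
To make the concern concrete, here is a minimal sketch of what a simple imputation would do to rows that are entirely "n/a". The data are a made-up toy example, the column names (`smoker`, `stage`) are hypothetical, and mode imputation is only one assumed choice of "some technique", not necessarily the method I would end up using.

```python
from collections import Counter

# Hypothetical toy data: two categorical features; None stands for "n/a".
rows = [
    {"group": "disease", "smoker": "yes", "stage": "II"},
    {"group": "disease", "smoker": "yes", "stage": "III"},
    {"group": "non-disease", "smoker": "no", "stage": "I"},
    {"group": "non-disease", "smoker": "no", "stage": "I"},
    {"group": "non-disease", "smoker": "yes", "stage": "I"},
    {"group": "control", "smoker": None, "stage": None},
    {"group": "control", "smoker": None, "stage": None},
]
features = ["smoker", "stage"]


def mode_impute(rows, features):
    """Fill each missing value with the most frequent observed value of that feature."""
    filled = [dict(r) for r in rows]  # copy so the input is left untouched
    for f in features:
        observed = [r[f] for r in rows if r[f] is not None]
        mode = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[f] is None:
                r[f] = mode
    return filled


imputed = mode_impute(rows, features)

# Collect the distinct feature profiles among the imputed control rows.
control_profiles = {
    tuple(r[f] for f in features) for r in imputed if r["group"] == "control"
}
print(control_profiles)
```

In this toy case every control row ends up with the same feature profile, because nothing was observed for those rows and the imputed values come entirely from the other groups. If all of those rows are then given a single mortality label, the model sees one repeated profile tied to one outcome, which is the kind of bias I am worried about.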

R • 213 views