Question: Control-disease pair study design
Asked 17 months ago by banerjeeshayantan:

I have a clinical dataset for a particular disease, and mortality is one of the variables in my feature set. I am building machine learning models to predict mortality from the other clinical features (38 features in total, sample size 276). The features are mostly categorical. Three disease stages are listed: control (47 samples), disease (52 samples) and non-disease (176 samples). For the control set, every clinical feature is "n/a", meaning no information was collected for the control cases. Is it therefore wise to treat the control cases as non-disease and also as living (i.e., give them the "alive" mortality label)? Doing so would increase the sample size available for model building. The missing values would then be imputed using some technique.
So, is this a sound approach? Am I introducing bias into the model by imputing so many categorical features for the control cases?
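A minimal sketch of what the proposed relabel-then-impute step does to the data, assuming simple mode (most-frequent-value) imputation and a tiny made-up table. The feature names `f1`/`f2`, the values and the helper functions are all hypothetical, not part of the original dataset. Because every control row is entirely missing, each one is filled with the same modal profile and given the same "alive" label, so that profile's apparent survival rate shifts even though nothing new was measured.

```python
from collections import Counter

# Hypothetical toy data: categorical features f1/f2, a disease stage and a
# mortality label. None marks a missing ("n/a") value. All values are invented.
rows = [
    {"stage": "disease",     "f1": "a", "f2": "x", "died": True},
    {"stage": "disease",     "f1": "b", "f2": "x", "died": True},
    {"stage": "non-disease", "f1": "a", "f2": "y", "died": False},
    {"stage": "non-disease", "f1": "b", "f2": "x", "died": True},
    {"stage": "non-disease", "f1": "a", "f2": "x", "died": False},
    # Control rows: no clinical features and no outcome were collected.
    {"stage": "control",     "f1": None, "f2": None, "died": None},
    {"stage": "control",     "f1": None, "f2": None, "died": None},
]
FEATURES = ["f1", "f2"]

def relabel_controls(rows):
    """Proposed step: treat controls as non-disease and as alive (copies rows)."""
    out = []
    for r in rows:
        r = dict(r)
        if r["stage"] == "control":
            r["stage"] = "non-disease"
            r["died"] = False
        out.append(r)
    return out

def mode_impute(rows, features):
    """Replace None with the most frequent observed value of each feature."""
    modes = {}
    for f in features:
        observed = [r[f] for r in rows if r[f] is not None]
        modes[f] = Counter(observed).most_common(1)[0][0]
    filled = [
        {**r, **{f: (r[f] if r[f] is not None else modes[f]) for f in features}}
        for r in rows
    ]
    return filled, modes

def alive_rate(rows, profile):
    """Fraction alive among labelled rows whose features equal `profile`."""
    matching = [
        r for r in rows
        if tuple(r[f] for f in FEATURES) == profile and r["died"] is not None
    ]
    return sum(not r["died"] for r in matching) / len(matching)

merged = relabel_controls(rows)
imputed, modes = mode_impute(merged, FEATURES)

# Every former control row collapses onto the same imputed profile.
was_control = [r["stage"] == "control" for r in rows]
control_profiles = {
    tuple(r[f] for f in FEATURES) for r, c in zip(imputed, was_control) if c
}
print(modes)             # {'f1': 'a', 'f2': 'x'}
print(control_profiles)  # {('a', 'x')}

# Apparent survival for that profile before vs. after adding the controls.
profile = ("a", "x")
print(alive_rate(rows, profile), alive_rate(imputed, profile))  # 0.5 0.75
```

In this toy case the modal profile goes from 50% to 75% alive purely because of the added rows, and with 47 controls against 229 labelled cases the shift in the real data could be much larger. Other imputation methods would not necessarily produce identical rows, but any method still fills in features that were never observed and pairs them with an outcome that was assumed rather than recorded.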
