I am working on a prediction problem that leverages sparse clinical datasets.
The missing-data rate is around 80%.
- Are there any examples of matrix completion applied to clinical (or other) datasets with such a high missing-data rate?
- I am currently exploring the glmnet, pcaMethods and SoftImpute packages. I am also looking for other R packages or SAS routines that can handle such a sparse clinical data matrix and perform matrix completion.
- I would like to assess the reliability of my filled-in values. Is there any metric or score for assessing the quality of the matrix completion?
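
To make the last point concrete, the kind of check I have in mind is a hold-out test: hide a fraction of the values that are actually observed, run the completion, and score the filled-in values against the hidden truth (for example with RMSE). Below is a minimal sketch in plain Python, only to illustrate the idea. The function names, the toy matrix, and the column-mean imputer are all hypothetical placeholders of mine, not anything from glmnet, pcaMethods or SoftImpute; a real completion method would be swapped in for `column_mean_impute`.

```python
import math
import random


def column_mean_impute(matrix):
    """Placeholder imputer (assumption): fill each None with its column's observed mean."""
    n_cols = len(matrix[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in matrix]


def holdout_rmse(matrix, impute, holdout_fraction=0.1, seed=0):
    """Hide a random subset of observed entries, impute, and return RMSE on them."""
    rng = random.Random(seed)
    observed = [(i, j)
                for i, row in enumerate(matrix)
                for j, v in enumerate(row)
                if v is not None]
    n_hidden = max(1, int(len(observed) * holdout_fraction))
    hidden = rng.sample(observed, n_hidden)

    # Work on a copy so the caller's matrix is left untouched.
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None

    completed = impute(masked)
    squared_errors = [(completed[i][j] - matrix[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy matrix with made-up values; None marks a missing measurement.
toy = [
    [1.0, None, 3.0],
    [1.2, 2.1, None],
    [None, 1.9, 3.3],
    [0.9, 2.0, 3.1],
]
print(holdout_rmse(toy, column_mean_impute, holdout_fraction=0.25))
```

Repeating this over several seeds and hold-out fractions would give a spread of errors rather than a single number. With around 80% of the matrix already missing, the hidden fraction has to stay small, so the estimate will be noisy.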
Thanks in advance!
PS. Cross-posted from here. I solved this problem using a method from pcaMethods; posting it here to get thoughts from biomedical / healthcare datasciencey folks here.