This competition is my first attempt at bioinformatics work. I am completely lost as to how to proceed, and I was hoping someone could give me some guidance: tell me whether I am even thinking about this problem the right way, or how they might approach it.
We are given: ~300 patients, all of whom have multiple myeloma. There is a (sparse) ground-truth table listing, for each patient, how long (in days) they were observed and, if they progressed after entering the study, their progression time. For each patient we are also given information on ~19,000 genes: average gene expression, the difference in expression between tumor and non-tumor measurements, and a boolean indicating whether the gene is mutated.
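In survival analysis, a patient who never progresses during follow-up is usually treated as "censored": all we know is that their progression time is later than their last observation. A minimal sketch of turning each ground-truth row into a `(time, event)` pair, assuming hypothetical field names (`PatientRecord`, `observed_days`, `progression_days`), since the real table's columns may be named differently:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PatientRecord:
    # Hypothetical fields; the real competition table may use other names.
    patient_id: str
    observed_days: float               # total follow-up time in days
    progression_days: Optional[float]  # None if no progression was observed


def to_survival_target(rec: PatientRecord) -> Tuple[float, bool]:
    """Return (time, event) for one patient.

    event=True  -> progression was observed at `time`
    event=False -> censored: no progression seen before follow-up ended at `time`
    """
    if rec.progression_days is not None:
        return rec.progression_days, True
    return rec.observed_days, False


def build_targets(records: List[PatientRecord]) -> List[Tuple[float, bool]]:
    """Convert every patient row into a (time, event) pair."""
    return [to_survival_target(r) for r in records]


if __name__ == "__main__":
    demo = [
        PatientRecord("P1", observed_days=900, progression_days=400),
        PatientRecord("P2", observed_days=650, progression_days=None),
    ]
    print(build_targets(demo))  # [(400, True), (650, False)]
```

Keeping the censored patients (rather than dropping them) matters because they still carry information: they did not progress for at least that many days.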
The goal: build a model that is trained on one set of patient data and, given another set of patients' genomic data, predicts each patient's progression time (and then rank-orders the patients).
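Because the final output is a ranking rather than exact day counts, one common way to score a ranking against censored outcomes is Harrell's concordance index (C-index): the fraction of comparable patient pairs that the model orders correctly. Whether this competition scores submissions this way is an assumption; below is a minimal plain-Python sketch under that assumption, with the function name `concordance_index` chosen for illustration:

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      predicted_times: Sequence[float]) -> float:
    """Harrell's C-index for predicted progression times.

    A pair (i, j) is comparable when patient i has the shorter observed time
    AND actually progressed at that time (events[i] is True). The pair is
    concordant when the model also predicts a shorter time for patient i.
    Tied predictions count as half. Pairs with tied observed times are
    skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    print(concordance_index([100, 200, 300], [True, True, False], [110, 250, 400]))  # 1.0
```

A value of 0.5 corresponds to a random ordering and 1.0 to a perfect one, so only the order of the predictions matters, not their scale.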
My thoughts: This sounds like a machine learning problem, where I'd use some kind of regression. But when I look up publications that model disease progression, they seem to estimate something called a survival function using a Kaplan-Meier estimator. Kaplan-Meier estimators seem to be used specifically for evaluating drug efficacy, so maybe they aren't relevant here, but the survival function seems to fit what the problem is asking me to do.
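For reference, the Kaplan-Meier estimator is a general nonparametric way to estimate a survival function from censored time-to-event data; drug trials are a common use, not the only one. It estimates one curve for a group of patients and does not use gene features by itself. A minimal plain-Python sketch, assuming each patient has already been reduced to a `(time, event)` pair (the function name `kaplan_meier` is illustrative):

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return [(t, S(t)), ...] at each distinct time where a progression occurred.

    S(t) = product over event times t_i <= t of (1 - d_i / n_i), where
    d_i = number of progressions at t_i and n_i = number of patients still
    at risk (not yet progressed or censored) just before t_i.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        progressed = 0
        removed = 0
        # Group every patient sharing this exact time; patients censored at t
        # still count as at risk at t, which is the usual convention.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                progressed += 1
            removed += 1
            i += 1
        if progressed > 0:
            survival *= 1.0 - progressed / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    print(kaplan_meier([5, 10, 10, 15, 20], [True, True, False, True, False]))
```

On that toy input the estimated progression-free fraction drops to about 0.8, 0.6 and 0.3 at days 5, 10 and 15; the patient censored at day 20 shortens nothing but is counted as at risk until then.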
- These both seem like valid approaches. Do I just choose one, or am I misunderstanding one (or both) of them?
- I have been trying to find publications that do something similar so I can read their methodology. Does anyone know of any they could share?
- What kind of problem is this? Obviously a bioinformatics one, but is there a more specific classification or term that I could look up in a textbook?
Any guidance or literature would be appreciated! Thank you guys so much.
Edit: This caught my eye, and I am reading through the series from the start. I am not sure how machine learning relates to survival analysis (for some reason I imagined the two to be mutually exclusive), but I think the series will answer that. Is this what I should be reading?
Edit: I just realized this may come across as me trying to capitalize on your experience to snag a prize. To be eligible for prizes, I'd need to have this done by the end of the week. I've never built a model for anything before, and I don't expect to finish this in a month, let alone a week. I am just trying to learn.