4 months ago by
Republic of Ireland
You should research the area of Natural Language Processing (NLP). As an example, colleagues in Boston, USA, are building predictive models from doctors' notes, with patient diagnoses as the endpoints. The inputs to these models are just key terms that are extracted from the notes. Other things are also obviously going into these models.
You are likely heavily restricting yourself with just Python and JAVA, in this case, as SAS, STATA, and R are 'better' statistical languages to use for what you want to do.
Also keep in mind that Machine Learning is, like, a big flop, as most other terms that come and go in bioinformatics and biostatistics. People in favour of these will cite isolated examples of success, though (somewhat like network analysis). From my practical experiences and those of others, standard regression models can build better predictors.