Question: What machine learning algorithms are good for mining different types of data simultaneously?
18 months ago, zhen.qin13 wrote:

I'm using the Python scikit-learn package, so any demonstration using scikit-learn functions would be really appreciated :)

I have several types of biomedical data: clinical data, DNA methylation data, miRNA expression data and RNA expression data. Each data type contains roughly 300 patient samples and about 50 normal (control) samples. I want to feed these data together into several machine learning algorithms and train a model that can predict a patient's survival from the given data. I have some important questions:
1. Since the sizes of the samples are very different, how can I group these data and feed them to the algorithm? For instance, if I do clustering, how can I align them?
2. There are many probes for methylation, miRNA and RNA, over a thousand for each. Is there a way to select the important features (probes) and train the model only on those? Or even better, after training the model on all the data, can the model tell me which features are important among so many? Are scikit-learn's preprocessing methods enough for this step?
3. Is there a way to combine several algorithms? For instance, could I use clustering to group all the features, and then feed the results into random forest/PCA algorithms together to get the model? I haven't learned machine learning systematically, so I get really confused when trying to use these methods. I think I should use unsupervised algorithms. Is that correct?
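A minimal scikit-learn sketch touching all three questions, under stated assumptions: each data type is a pandas DataFrame with one row per sample, indexed by a shared sample ID; survival has been reduced to a hypothetical binary label (for example, alive past some cutoff), which ignores censoring; and the data below are synthetic, not real patients. Because survival is an outcome you want to predict from labelled patients, this is set up as a supervised problem. The block names (`clin`, `rna`, `meth`), the helpers `combine_blocks` and `top_features`, and `k=50` are illustrative choices, not anything from the original post. Samples are aligned with an inner join on their IDs (question 1); filtering, scaling, univariate selection and a random forest are chained in one `Pipeline`, so selection is refit inside each cross-validation fold (questions 2 and 3); and the fitted forest's `feature_importances_` are mapped back to probe names (question 2).

```python
from typing import Dict

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def combine_blocks(blocks: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Join samples-by-features tables on their sample-ID index (hypothetical helper).

    join="inner" keeps only samples present in every block; feature names are
    prefixed with the block name so identically named features stay distinct.
    """
    prefixed = [df.add_prefix(f"{name}_") for name, df in blocks.items()]
    return pd.concat(prefixed, axis=1, join="inner")


def build_pipeline(k: int = 50, random_state: int = 0) -> Pipeline:
    """Filter -> scale -> univariate selection -> random forest, fitted as one unit."""
    return Pipeline([
        ("drop_constant", VarianceThreshold(threshold=0.0)),  # drop zero-variance features
        ("scale", StandardScaler()),                          # put all blocks on one scale
        ("select", SelectKBest(score_func=f_classif, k=k)),   # keep k most label-associated features
        ("forest", RandomForestClassifier(n_estimators=200, random_state=random_state)),
    ])


def top_features(pipe: Pipeline, feature_names, n: int = 10) -> pd.Series:
    """Map the forest's impurity-based importances back to original feature names."""
    names = np.asarray(feature_names)
    kept = names[pipe.named_steps["drop_constant"].get_support()]  # survivors of step 1
    kept = kept[pipe.named_steps["select"].get_support()]          # survivors of step 3
    importances = pd.Series(pipe.named_steps["forest"].feature_importances_, index=kept)
    return importances.sort_values(ascending=False).head(n)


# --- Synthetic stand-in data (NOT real patients) ---
rng = np.random.default_rng(0)
ids = [f"S{i}" for i in range(300)]
clinical = pd.DataFrame({"age": rng.integers(30, 80, size=300)}, index=ids)
rna = pd.DataFrame(rng.normal(size=(300, 500)), index=ids,
                   columns=[f"gene{j}" for j in range(500)])
# methylation measured on only 280 of the samples, to show the inner join
meth = pd.DataFrame(rng.random(size=(280, 500)), index=ids[:280],
                    columns=[f"cg{j}" for j in range(500)])

X = combine_blocks({"clin": clinical, "rna": rna, "meth": meth})
# hypothetical binary outcome, driven here by two RNA features plus noise
signal = X["rna_gene3"] + X["rna_gene7"]
y = (signal + rng.normal(scale=0.5, size=len(X)) > 0).astype(int)

pipe = build_pipeline(k=50)
scores = cross_val_score(pipe, X, y, cv=5)  # selection is refit inside each fold
print(X.shape)  # (280, 1001)
print(f"mean CV accuracy: {scores.mean():.2f}")
pipe.fit(X, y)
print(top_features(pipe, X.columns, n=5))
```

Some design notes on this sketch: the inner join drops samples missing from any block; an outer join with `sklearn.impute.SimpleImputer` as the first pipeline step would keep them. Replacing `SelectKBest` with `sklearn.decomposition.PCA` is one way to combine PCA with the forest, at the cost of probe-level interpretability. Impurity-based importances can be biased toward high-cardinality features, so `sklearn.inspection.permutation_importance` is a common cross-check. Dedicated survival models such as Cox proportional-hazards regression handle censored follow-up properly; scikit-learn itself does not include them.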

Tags: rna-seq
Modified 18 months ago by genomax • written 18 months ago by zhen.qin13

