Question: What machine learning algorithms are good for mining different types of data simultaneously?
3.0 years ago, zhen.qin13 wrote:

I'm using the Python scikit-learn package, so any demonstration using scikit-learn functions would be really appreciated :)

I have several types of biomedical data: clinical data, DNA methylation data, and miRNA and RNA expression data. Each data type contains roughly 300 patient samples and about 50 normal (control) samples. I want to feed these data together into one or more machine learning algorithms and train a model that can predict a patient's survival from the data given. I have some important questions:
1. Since the sample sizes are very different, how can I group these data and feed them to the algorithm? For instance, if I do clustering, how can I align them?
2. There are many probes for methylation, miRNA and RNA, over a thousand for each. Is there a way to pick out the important features (probes) and train the model only on those? Or, even better, after training the model on all the data, can the model tell me which features are important among the large number of features? Are scikit-learn's preprocessing methods enough for this step?
3. Is there a way to combine several algorithms? For instance, using clustering to group all the features, and then feeding the results into random forest/PCA algorithms to get the model? I haven't learned machine learning systematically, so I get really confused when trying to use these methods. I think I should use unsupervised algorithms. Is that correct?
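
To make the three questions concrete, here is a minimal sketch of how they might fit together in scikit-learn. Everything in it is an assumption for illustration: the data are random numbers, the helper `fake_block`, the sample IDs and the feature names are hypothetical, and survival is simplified to a binary label, which makes this a supervised classification task. Real censored survival times would need a dedicated survival-analysis method instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def fake_block(prefix, sample_ids, n_features):
    """Hypothetical stand-in for one data type (rows = samples, columns = probes)."""
    data = rng.normal(size=(len(sample_ids), n_features))
    cols = [f"{prefix}_{i}" for i in range(n_features)]
    return pd.DataFrame(data, index=sample_ids, columns=cols)


# Assumption: each data type covers a slightly different set of samples.
all_ids = [f"S{i:03d}" for i in range(120)]
methylation = fake_block("meth", all_ids[:110], 200)
mirna = fake_block("mirna", all_ids[5:], 150)
rna = fake_block("rna", all_ids, 300)

# Question 1: align on the sample ID. An inner join keeps only the
# samples that are present in every data type.
X = pd.concat([methylation, mirna, rna], axis=1, join="inner")

# Hypothetical binary outcome, driven by two of the features plus noise.
noise = rng.normal(scale=0.5, size=len(X))
y = ((X["meth_0"] + X["rna_0"] + noise) > 0).astype(int)

# Questions 2 and 3: chain scaling, a univariate feature filter and a
# random forest, so the filter is refit inside each cross-validation fold.
model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=50)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])

scores = cross_val_score(model, X, y, cv=5)
print("mean CV accuracy:", scores.mean())

# After fitting on all aligned samples, ask the forest which of the
# retained features it relied on most.
model.fit(X, y)
kept = X.columns[model.named_steps["select"].get_support()]
importances = pd.Series(
    model.named_steps["forest"].feature_importances_, index=kept
)
print(importances.sort_values(ascending=False).head(10))
```

Putting the feature filter inside the `Pipeline` rather than running it once on the full table is a deliberate choice: it keeps the held-out samples of each fold from influencing which features are selected.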
