Question: Differentially expressed genes machine learning classifier
devknight009 (30) wrote:

I am new to R and machine learning. I want to build a machine learning classifier that distinguishes normal from diseased samples, using differentially expressed genes (DEGs) obtained from GEO microarray datasets as input features. I obtained the DEGs with the limma package. How do I use these DEGs to train the classifier? Please help.

machine learning degs R • 367 views
modified 3 months ago by Kevin Blighe (67k) • written 3 months ago by devknight009 (30)

I am new to ... machine learning

Why do you want to use ML here? What are some existing methods, and what are some flaws in them that you're trying to solve using ML?

written 3 months ago by RamRS (30k)

Using machine learning, I want to show that these DEGs can act as biomarkers by distinguishing normal samples from diseased samples.

written 3 months ago by devknight009 (30)

A lot of people have done this kind of thing. One example I remember from a few years back was a set of classifiers for benign vs. malignant thyroid tumors based on RNA-seq. This is something you can Google.

written 3 months ago by curious (460)

Coincidentally, I have published in this area via a TCGA re-analysis, but not 'Machine Learning': Comprehensive transcriptomic analysis of papillary thyroid cancer: potential biomarkers associated with tumor progression

written 3 months ago by Kevin Blighe (67k)

Yeah, I tried to find the exact paper but couldn't. I think there is a group of companies that do this, though. The idea is roughly that the sample needed for histology is invasive to obtain, so they take a little bit of RNA and try to classify the tumor as benign that way. Machine learning is so in vogue that I think people use it to check a buzzword box, but I remember thinking this application was kind of neat and made sense.

written 3 months ago by curious (460)

What is lacking in GSEA etc. that ML can solve? What is your definition of "normal" and "diseased"? What are your DE groups?

written 3 months ago by RamRS (30k)

I have to design a project related to ML. I have taken a GEO microarray dataset that contains blood-derived microarray data for control samples and Parkinson's samples. I have found DEGs using limma, and now I want to use these DEGs for ML classification of control and Parkinson's samples.

written 3 months ago by devknight009 (30)

I have to design a project related to ML

That seems to be a sub-optimal way of approaching a problem. "Is ML useful here" should be the question. Anyway, like curious says, I'm sure there are a lot of people that have run classifiers on public datasets. Are you doing a toy project or a real one?

written 3 months ago by RamRS (30k)

It's a real one; ML classification is the first step in it.

written 3 months ago by devknight009 (30)

I think it is clear what you want to do. The thing is that Biostars is intended to answer specific technical questions rather than to guide you through a topic in which you apparently have no background. I suggest you dive into the available online resources, textbooks and courses at your institution and get a solid foundation first. You will rarely find users online who will provide an end-to-end workflow for you, especially given that you want to develop something on your own.

written 3 months ago by ATpoint (41k)

I don't want an end-to-end workflow. I'll be happy if someone can suggest a particular R package or blog to look at.

written 3 months ago by devknight009 (30)
i.sudbery (9.7k) wrote:

You can probably do this using something simple like a logistic regression classifier. Try searching for "logistic regression in R". Remember that doing good ML is about more than just picking the correct algorithm. You have to carefully design training, validation and test sets, or use k-fold cross-validation, and think carefully about what metrics you use to assess the performance of your model, particularly if you have unbalanced classes. Finally, you will ideally want a test set that comes from a different experiment; this helps confirm that your model generalizes. This means you'll need to think carefully about how to normalize the input data so that it is comparable across studies.
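
As a rough illustration of the points above (a sketch under stated assumptions, not a recommended pipeline): the snippet below fits a logistic regression by plain gradient descent, evaluates it with stratified k-fold cross-validation, and reports balanced accuracy, which is less misleading than plain accuracy when classes are unbalanced. It is written in dependency-free Python rather than R purely so that it runs anywhere. The data are synthetic (30 "control" and 10 "disease" samples, with two made-up features standing in for DEG expression values), and every function name here is hypothetical.

```python
import math
import random


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression (weights, bias) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Predict class 1 when the linear score is positive (probability > 0.5)."""
    w, b = model
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; robust to unbalanced classes."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def cross_validate(X, y, k=5):
    """Train on k-1 folds, score on the held-out fold, for each fold in turn."""
    scores = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = train_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(balanced_accuracy([y[i] for i in test_idx], preds))
    return scores


# Synthetic, unbalanced toy data (an assumption for illustration only).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]   # "control"
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]  # "disease"
y = [0] * 30 + [1] * 10

scores = cross_validate(X, y, k=5)
print("balanced accuracy per fold:", [round(s, 2) for s in scores])
print("mean balanced accuracy:", round(sum(scores) / len(scores), 2))
```

Because balanced accuracy averages the recall of each class, a model that always predicts the majority class scores 0.5 no matter how lopsided the classes are, whereas plain accuracy would reward it.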

Personally, I found Andrew Ng's Coursera course on machine learning very useful for getting to grips with the basic concepts in machine learning. It focuses on the surrounding concepts as much as the algorithms/models themselves, which I found to be very helpful.

modified 3 months ago • written 3 months ago by i.sudbery (9.7k)

If only Andrew Ng used Python or R instead of Octave! I cannot wrap my head around that bizarre language.

modified 3 months ago • written 3 months ago by RamRS (30k)
Kevin Blighe (67k) wrote:

Some thoughts:

A: What is the best way to combine machine learning algorithms for feature selectio

Kevin

modified 3 months ago • written 3 months ago by Kevin Blighe (67k)