Question: Train classifier based on unsupervised clustering of genes
11 days ago, teabonng wrote:

Hi,

I will be training a classifier based on the results of unsupervised clustering of genes. The overall goal is to determine which chromatin or epigenetic features of these genes best indicate that they are regulated by a central regulatory protein.

First, I will cluster genes based on their expression profiles under different treatment conditions from multiple RNA-Seq experiments. We are working on the assumption that genes that cluster together are co-regulated by this central protein. From the clustering results, we then identify which cluster is enriched in genes already known to be regulated by this central protein. Genes in this cluster (in addition to the validated ones) will be used as the positive examples for training a classifier. The input features include methylation and other chromatin features. From the best-performing model, we then take the features that are most important or have the highest coefficients. We can then validate the importance of these features with experiments in the lab.
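
To make the enrichment step concrete, here is a minimal sketch of how one might find the cluster that is over-represented in known targets, using a one-sided hypergeometric test. It is only an illustration under stated assumptions: the gene names, cluster labels and known-target set are made-up toy data, and the cluster assignments are assumed to come from whatever clustering method is applied to the expression profiles.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M in total, n of which are known targets."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)

def cluster_enrichment(cluster_of, known_targets):
    """Return {cluster: (overlap, cluster_size, p_value)} for every cluster."""
    genes = set(cluster_of)
    known = known_targets & genes  # ignore known targets that were not clustered
    members_of = {}
    for gene, cluster in cluster_of.items():
        members_of.setdefault(cluster, set()).add(gene)
    results = {}
    for cluster, members in members_of.items():
        overlap = len(members & known)
        p = hypergeom_sf(overlap, len(genes), len(known), len(members))
        results[cluster] = (overlap, len(members), p)
    return results

# Toy data (hypothetical): 12 genes in 3 clusters, 4 validated targets.
cluster_of = {f"g{i}": "A" for i in range(1, 5)}
cluster_of.update({f"g{i}": "B" for i in range(5, 9)})
cluster_of.update({f"g{i}": "C" for i in range(9, 13)})
known_targets = {"g1", "g2", "g3", "g9"}

results = cluster_enrichment(cluster_of, known_targets)
best = min(results, key=lambda c: results[c][2])  # cluster with the smallest p-value
for cluster, (overlap, size, p) in sorted(results.items()):
    print(f"{cluster}: {overlap}/{size} known targets, p = {p:.4f}")
```

With more than a handful of clusters, the p-values would normally be corrected for multiple testing before picking the "enriched" cluster.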

I would just like some insights from a machine learning point of view. Thank you very much.


Hello teabonng!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/4479/train-classifier-based-on-unsupervised-clustering-of-genes

This is typically not recommended as it runs the risk of annoying people in both communities.

written 11 days ago by Devon Ryan
11 days ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

If you have validated genes, you should use them for training the classifier; the clustering will most likely introduce noise. If you don't have a good training set and you don't need to make predictions, just stick to the clustering, provided it groups the genes in a sensible way.
Also, how do you define an important feature?
It seems that the question is which chromatin features are associated with groups of co-regulated genes. In that case, a good starting point would be to look at the relative enrichment of chromatin features in each cluster.
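
As a rough sketch of what looking at relative enrichment could mean in practice: assuming each chromatin feature has been reduced to a present/absent call per gene (an assumption, since the encoding of the features is not specified here), one can compare the fraction of genes carrying a feature in each cluster with the fraction among all genes. The gene names and the marks `markA` and `markB` below are placeholders, not real data.

```python
def relative_enrichment(cluster_of, marks_of):
    """Fold enrichment per cluster and feature.

    cluster_of: {gene: cluster label}
    marks_of:   {gene: set of features present at that gene}
    Returns {cluster: {feature: fraction_in_cluster / fraction_in_all_genes}}.
    """
    genes = list(cluster_of)
    features = sorted(set().union(*(marks_of[g] for g in genes)))
    # Background frequency of each feature across all clustered genes.
    background = {f: sum(f in marks_of[g] for g in genes) / len(genes) for f in features}
    result = {}
    for cluster in sorted(set(cluster_of.values())):
        members = [g for g in genes if cluster_of[g] == cluster]
        result[cluster] = {
            f: (sum(f in marks_of[g] for g in members) / len(members)) / background[f]
            for f in features
        }
    return result

# Toy data (hypothetical): 8 genes in 2 clusters, 2 binary chromatin marks.
cluster_of = {"g1": "A", "g2": "A", "g3": "A", "g4": "A",
              "g5": "B", "g6": "B", "g7": "B", "g8": "B"}
marks_of = {"g1": {"markA"}, "g2": {"markA"}, "g3": {"markA"}, "g4": {"markB"},
            "g5": {"markA"}, "g6": {"markB"}, "g7": {"markB"}, "g8": {"markB"}}

enrichment = relative_enrichment(cluster_of, marks_of)
for cluster, row in enrichment.items():
    print(cluster, {f: round(v, 2) for f, v in row.items()})
```

Values above 1 mean the feature is more frequent in the cluster than in the background; whether a given ratio is meaningful would still need a statistical test and enough genes per cluster.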


Hi Jean-Karim,

Thank you very much for the insights. The important features are what we aim to determine. So, once we have a well-performing machine learning model, we could perhaps take the top features based on their coefficients.
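
A minimal sketch of ranking features by coefficient, assuming a linear model such as logistic regression is the classifier (the thread does not fix the model type). Everything here is simulated: `feature_A`, `feature_B` and `feature_C` are hypothetical names, and only `feature_A` is constructed to carry signal. Features are standardized first, because coefficients are only comparable when the inputs are on the same scale.

```python
import math
import random

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
           for j in range(d)]
    return [[(row[j] - means[j]) / sds[j] for j in range(d)] for row in X]

def fit_logistic(X, y, lr=0.1, epochs=500, l2=0.01):
    """Batch gradient descent for L2-regularised logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted probability minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Simulated data: 200 "genes", 3 features, label driven by the first feature only.
rng = random.Random(0)
feature_names = ["feature_A", "feature_B", "feature_C"]
X_raw, y = [], []
for _ in range(200):
    row = [rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
    X_raw.append(row)
    y.append(1 if 2.0 * row[0] + rng.gauss(0, 0.5) > 0 else 0)

weights, intercept = fit_logistic(standardize(X_raw), y)
# Rank features by absolute coefficient, largest first.
ranked = sorted(zip(feature_names, weights), key=lambda t: abs(t[1]), reverse=True)
for name, w in ranked:
    print(f"{name}\t{w:+.3f}")
```

Note that when features are correlated, a linear model can split the weight between them, so a small coefficient does not by itself show that a feature is unimportant.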

written 11 days ago by teabonng