Question: Random forest for gene expression
krushnach80 wrote (9 months ago):

I'm quoting the following lines from this book chapter:

"Schematic illustration for gene expression of microarray data. Figure modified from [47]. From the computational perspective, the microarray data is described as an N × M matrix. Each row describes a sample and each column represents a gene except the last column which means the class label of each sample. g_{i,j} is a numeric value representing the gene expression level of gene j in the i-th sample. c_i is the class label of the i-th sample."

It says the last column is the class label. I understand that samples are in rows and genes are in columns, but what exactly is the class label in the last column? What kind of label is it?

The chapter

Tags: rna-seq, R

noorpratap.singh wrote (9 months ago):

The whole point of a random forest, or any other classifier, is to predict class labels (and/or to identify useful features associated with the phenotype). So if you want to know which class a particular sample belongs to, a random forest can be used for that. The last column holds the labels needed to train the classifier, so that it can learn the gene expression rules associated with each class.

For example, say you want to predict whether a sample is normal or tumor. First you train the classifier by giving it the gene expression profiles of the samples together with their labels, so that the model can learn useful information from them. Then, for a sample that was not used for training, you want to determine which class (normal or tumor) it belongs to, and the trained random forest lets you do that.
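To make the train-then-predict idea concrete, here is a minimal sketch in plain Python. It is not a full random forest: it is a toy bagged ensemble of one-split trees (stumps), each fitted on a bootstrap sample with a random subset of genes, used here as a simplified stand-in. All expression values, the four genes, and the function names `fit_stump`, `fit_forest` and `predict` are made up for illustration. For real data you would use an established implementation such as the randomForest package in R or scikit-learn's RandomForestClassifier.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Find the (gene, threshold) split among `features` with the fewest training errors."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0]
            errors = sum(l != left_lab for l in left) + sum(l != right_lab for l in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_lab, right_lab)
    if best is None:
        # No split was possible: always predict the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each seeing a random subset of genes."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    n_try = max(1, int(n_genes ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample of rows
        feats = rng.sample(range(n_genes), n_try)  # random subset of genes
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, sample):
    """Majority vote over all stumps."""
    votes = Counter(left if sample[j] <= t else right for j, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: rows are samples, columns are four genes.
X_train = [
    [2.1, 8.0, 1.2, 9.1], [1.9, 7.5, 1.0, 8.8], [2.3, 8.2, 1.4, 9.3],  # normal
    [7.8, 2.0, 6.5, 3.0], [8.1, 1.8, 6.9, 2.7], [7.5, 2.4, 6.2, 3.2],  # tumor
]
y_train = ["normal"] * 3 + ["tumor"] * 3

forest = fit_forest(X_train, y_train)

# Samples that were not used for training.
print(predict(forest, [2.0, 7.9, 1.1, 9.0]))
print(predict(forest, [7.9, 2.1, 6.6, 2.9]))
```

The labels in `y_train` are what the classifier learns from. Without them there is nothing to train against, which is why the last column of the matrix is needed.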

Hope it helps


Can you give me an example? Say I'm comparing normal vs. disease haematopoiesis. I have expression values for normal haematopoiesis, including the intermediate lineages and the mature cells, as well as expression values for disease haematopoiesis. So how do I achieve this: "first you train your classifier giving the gene expression profiles of the samples and their labels so that your model/classifier could learn some useful information"? As of now I have only expression values, so how do I build a classifier in the first place? I hope you get my question.

Reply written 9 months ago by krushnach80

I get your question. Columns are genes (say N in total) and rows are samples (say p), giving a p × N matrix whose entry [i][j] is the expression of gene j in sample i. Now add one more column: for every normal sample put the label "N" and for every tumor sample the label "T". This additional column denotes the class of each sample. Then give this matrix to the classifier for training. To see whether your classifier works, you could perform an n-fold cross-validation.
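As a rough sketch of that step in plain Python: the sample IDs, gene names and expression values below are invented, and `add_class_column` is a hypothetical helper, not a function from any library. The only real input you need besides the expression matrix is a mapping from each sample to its class, which comes from your experimental design (which samples are normal, which are disease).

```python
# Hypothetical expression matrix: one row per sample, one column per gene.
gene_names = ["geneA", "geneB", "geneC"]
expression = {
    "sample1": [2.1, 8.0, 5.0],
    "sample2": [1.9, 7.5, 5.2],
    "sample3": [7.8, 2.0, 5.1],
    "sample4": [8.1, 1.8, 4.9],
}

# Class of each sample, taken from the experimental design:
# "N" for normal, "T" for tumor.
sample_class = {"sample1": "N", "sample2": "N", "sample3": "T", "sample4": "T"}

def add_class_column(expression, sample_class):
    """Return one row per sample: expression values followed by the class label."""
    return [values + [sample_class[sample_id]]
            for sample_id, values in expression.items()]

labeled = add_class_column(expression, sample_class)

# Most classifiers want the features and the labels separately.
X = [row[:-1] for row in labeled]  # expression values only
y = [row[-1] for row in labeled]   # class label of each sample

for row in labeled:
    print(row)
```

With more than two groups (for example several lineages), the same column can simply hold more than two distinct labels.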

The link below might help: Machine Learning For Cancer Classification - Part 2 - Building A Random Forest Classifier

Reply written 9 months ago by noorpratap.singh

Now I have an idea of how to start. I will try it and get back to you.

Reply written 9 months ago by krushnach80

So I have to give a subset of the data for training, don't I? And how do I choose that subset from my data matrix, which comprises both normal and tumour samples?

Reply written 9 months ago by krushnach80

Well, for starters you can do a 10-fold cross-validation. Functions for doing it exist in most languages used for data analysis.
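To show what a 10-fold split looks like, here is a small sketch in plain Python. The `stratified_folds` function and the 20 normal / 20 tumor labels are made up for illustration. Keeping the class proportions equal in every fold (stratification) is an assumption added here so that each fold contains both normal and tumor samples; it is not something the thread prescribes.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, spreading each class evenly over the folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled samples of this class out to the folds in turn.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)

    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits

# Hypothetical data set: 20 normal ("N") and 20 tumor ("T") samples.
labels = ["N"] * 20 + ["T"] * 20
splits = stratified_folds(labels, k=10)

for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, "train size:", len(train), "test samples:", test)
```

In each round you would train the classifier on the `train` rows, predict the `test` rows, and then average the accuracy over the 10 rounds. Every sample is used for testing exactly once, so you do not have to pick a training subset by hand.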

Reply written 9 months ago by noorpratap.singh

Okay, I will search for it.

Reply written 9 months ago by krushnach80