Question: Supervised feature selection/clustering in R
bojingjia10 (United States) wrote, 3.8 years ago:

I have a couple of sets of single-level RNA-seq experiments in which I'm identifying clusters of upregulated/downregulated genes across one treatment (i.e., WT vs. KO). Each group has multiple biological replicates. The problem is that there is within-group variance, and the PCA shows that the two groups are not cleanly linearly separable. In other words, the differences I'm looking for are probably subtle.

EDIT: I have been using DESeq2 to do differential gene expression analysis. My question is as follows:

Is there a way to cluster genes knowing that samples within a group should resemble one another? That is, can I find genes that are upregulated in all of the WT samples relative to all of the KO samples? When I cluster the samples right now, the sample distances are sometimes so variable that WT and KO samples cluster together.

Forgive me if this is a really trivial question! Novice here.

Modified 3.8 years ago • written 3.8 years ago by bojingjia10
Sean Davis (25k), National Institutes of Health, Bethesda, MD, wrote 3.8 years ago:

It sounds like you want to perform differential expression analysis. DESeq2 is a fine package for such analysis.

Written 3.8 years ago by Sean Davis (25k)

To answer your question more specifically, you should try doing the PCA and clustering on normalized, variance-stabilized values. The rlog() transformation in the DESeq2 package is perfect for that. If the replicates still don't cluster together after that, you could suspect a batch effect or some other unwanted source of variation.
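DESeq2 itself is R-only, so the following is only a language-neutral sketch in Python/NumPy of the idea behind this advice: normalize for sequencing depth, move to a log scale, then run PCA on the transformed values. It assumes a genes x samples count matrix and uses DESeq2-style median-of-ratios size factors followed by a plain log2(x + 1). That log2 step is a simplified stand-in, not rlog(), which additionally shrinks the noisy low-count genes; in R you would simply call rlog() (or vst()) and then plotPCA(). The function names and the toy counts below are hypothetical.

```python
import numpy as np

def size_factors(counts):
    """DESeq2-style median-of-ratios size factors.
    counts: genes x samples array of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a nonzero count in every sample have a finite log geometric mean
    log_counts = np.log(counts[np.all(counts > 0, axis=1)])
    log_geo_means = log_counts.mean(axis=1)
    # Each sample's factor is the median ratio of its counts to the per-gene geometric means
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))

def log_normalize(counts, pseudocount=1.0):
    """Simplified stand-in for rlog(): log2 of depth-normalized counts.
    Unlike rlog(), this does no shrinkage of low-count genes."""
    counts = np.asarray(counts, dtype=float)
    return np.log2(counts / size_factors(counts) + pseudocount)

def pca_scores(log_expr, n_components=2):
    """Principal-component scores for each sample (samples are columns)."""
    X = np.asarray(log_expr, dtype=float).T   # samples x genes
    X = X - X.mean(axis=0)                     # center every gene
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Toy example (made-up counts): 3 WT then 3 KO samples, gene 0 higher in KO
counts = np.array([
    [100, 110, 105, 400, 410, 390],
    [200, 210, 190, 205, 195, 200],
    [ 50,  55,  52,  48,  51,  49],
])
print(pca_scores(log_normalize(counts))[:, 0])  # PC1 score per sample
```

If the groups still overlap on PC1/PC2 after this kind of transformation, that points to genuine within-group heterogeneity (or a batch effect) rather than a depth or scaling artifact.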

Written 3.8 years ago by Carlo Yague (4.8k)

Sorry if I didn't make myself clear. I have been using DESeq2 to do differential gene expression analysis, and the replicates don't cluster together. I'm wondering if there is a strategy for doing supervised feature selection in DESeq2.

Written 3.8 years ago by bojingjia10

DESeq2 was developed for performing differential expression analysis (what you are calling supervised feature selection) on RNA-seq data. The vignette is fantastic, so you should definitely take the time to go through it.

Written 3.8 years ago by Sean Davis (25k)

Hi Sean,

Thanks for your response. I've been using DESeq2 for some time now, and I'm not satisfied with the clustering approach in the DESeq2 workflow. To be more specific, the DESeq2 vignette clusters samples with pheatmap, which (correct me if I'm wrong) looks at the samples blindly and clusters them by sample distance. In theory, samples within a single condition should cluster together, but in my dataset they do not. I would like to develop a clustering method that finds genes if and only if they are upregulated across ALL samples within a group and not upregulated across ALL samples within the control group, even if the difference is modest.
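The "ALL samples in one group above ALL samples in the other" criterion can be written as a gap between group minima and maxima. Below is a minimal Python/NumPy sketch, assuming you already have a normalized, log-scale genes x samples matrix (for example rlog values exported from R). `strictly_separated_genes` and its `min_gap` margin are hypothetical names, not part of DESeq2, and the toy values are made up.

```python
import numpy as np

def strictly_separated_genes(log_expr, in_group, min_gap=0.0):
    """Indices of genes where every sample in the group is above (up) or
    below (down) every reference sample by more than min_gap.
    log_expr: genes x samples, normalized log-scale values.
    in_group: one boolean per sample (True = group of interest, e.g. WT)."""
    x = np.asarray(log_expr, dtype=float)
    g = np.asarray(in_group, dtype=bool)
    grp, ref = x[:, g], x[:, ~g]
    # Up: the lowest group value still exceeds the highest reference value
    up_gap = grp.min(axis=1) - ref.max(axis=1)
    # Down: the highest group value is still below the lowest reference value
    down_gap = ref.min(axis=1) - grp.max(axis=1)
    return np.flatnonzero(up_gap > min_gap), np.flatnonzero(down_gap > min_gap)

# Toy log2 values (made up): 3 WT samples, then 3 KO samples
expr = np.array([
    [5.0, 5.2, 5.1, 3.0, 3.1, 2.9],  # up in every WT sample
    [2.0, 2.1, 2.2, 4.0, 4.1, 3.9],  # down in every WT sample
    [5.0, 3.0, 4.0, 4.5, 3.5, 2.0],  # groups overlap
])
is_wt = [True, True, True, False, False, False]
up, down = strictly_separated_genes(expr, is_wt)
print(up, down)  # prints: [0] [1]
```

One caution on this design: with few replicates, perfect separation happens often by chance. Assuming exchangeable samples and no ties, a gene with no true difference in a 3 vs. 3 design has a 1 in 20 chance of having all three group values above all three control values, and the same chance for below, so roughly 10% of null genes pass in one direction or the other. The filter is therefore more useful applied on top of genes that are already significant in DESeq2 (for example, low `padj`) than on its own.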

Modified 3.8 years ago • written 3.8 years ago by bojingjia10

I suspect you realize this, but there may not be a set of genes that satisfies the condition you set out. That said, which genes (features) are you using for the clustering; i.e., how did you choose them? You'll want to use the top genes from the differential expression analysis to get as close to your ideal as possible, knowing that you may not be able to get all samples to cluster by group if the data do not support such a clustering.
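To illustrate clustering on the top differentially expressed genes rather than on all genes, here is a small Python/NumPy sketch. A per-gene Welch t-statistic stands in for DESeq2's Wald statistic (in R you would rank by `padj` or `stat` from `results()`), and a hand-rolled average-linkage hierarchical clustering stands in for what pheatmap/hclust do. All function names are hypothetical, the toy values are made up, and the input is assumed to be a normalized log-scale genes x samples matrix.

```python
import numpy as np

def welch_t(log_expr, in_group):
    """Per-gene Welch t-statistic (stand-in for DESeq2's Wald 'stat')."""
    x = np.asarray(log_expr, dtype=float)
    g = np.asarray(in_group, dtype=bool)
    a, b = x[:, g], x[:, ~g]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / np.maximum(se, 1e-12)

def top_genes(log_expr, in_group, k):
    """Indices of the k genes with the largest |t|."""
    return np.argsort(-np.abs(welch_t(log_expr, in_group)))[:k]

def average_linkage(log_expr, n_clusters=2):
    """Average-linkage agglomerative clustering of samples (columns) on
    Euclidean distance; returns one cluster label per sample."""
    X = np.asarray(log_expr, dtype=float).T                  # samples x genes
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(X.shape[0])]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest mean pairwise distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = d[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]   # j > i, so index i is unaffected
    labels = np.empty(X.shape[0], dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

# Toy log2 values (made up): rows 0-1 differ by group, rows 2-3 vary within groups
expr = np.array([
    [5.0, 5.1, 4.9, 3.0, 3.1, 2.9],
    [1.0, 1.2, 1.1, 3.0, 3.2, 3.1],
    [9.0, 1.0, 5.0, 9.0, 1.0, 5.0],
    [0.0, 8.0, 4.0, 8.0, 0.0, 4.0],
])
is_wt = [True, True, True, False, False, False]
print(average_linkage(expr))                               # all genes
print(average_linkage(expr[top_genes(expr, is_wt, k=2)]))  # top 2 genes
```

When some genes vary a lot within both groups, clustering on all genes can pair a WT sample with a KO sample, while clustering on the top-ranked genes is more likely to recover the groups. Note that selecting genes with the group labels and then clustering on them is somewhat circular: the resulting split reflects how the genes were chosen, so it is not independent evidence that the groups differ.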

Modified 3.8 years ago • written 3.8 years ago by Sean Davis (25k)