Question

Supervised feature selection/clustering in R

8.3 years ago

bojingjia ▴ 10

I have a couple of sets of single-level RNA-seq experiments in which I'm identifying clusters of upregulated/downregulated genes across one treatment (i.e., WT against KO). There are multiple biological replicates within each group. The problem is that there is in-group variance, and the PCA shows that the two groups are not exactly linearly separable. In other words, the differences I'm looking for are probably subtle.

EDIT: I have been using DESeq2 to do differential gene expression analysis. My question is as follows:

Is there a way to cluster genes knowing that samples within a group should resemble one another? I.e., to find genes that are upregulated across all of the WT samples relative to all of the KO samples? When I cluster them right now, the sample distances are sometimes so variable that the WT and KO samples cluster together.

Forgive me if this is a really trivial question! Novice here.

pheatmaps deseq2 RNA-Seq clustering • 4.0k views

ADD COMMENT • link updated 20 months ago by Ram 43k • written 8.3 years ago by bojingjia ▴ 10

Ram · Answer 1 · 2016-01-16

8.3 years ago

Sean Davis 26k

It sounds like you want to perform differential expression analysis. DESeq2 is a fine package for such analysis.

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by Sean Davis 26k

To answer your question more specifically, you should try doing the PCA and clustering on normalized, transformed values rather than on raw counts. The rlog() transformation in the DESeq2 package is well suited to that. After that, if the replicates still don't cluster together, you could suspect a batch effect or something like that.
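A minimal sketch of that workflow, in case it helps. It uses simulated counts from DESeq2's `makeExampleDESeqDataSet()` as a stand-in for a real dataset, so the `condition` column and its two levels are placeholders for WT/KO, and the sizes (1000 genes, 6 samples) are arbitrary assumptions.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a real dataset: 1000 genes, 6 samples,
# with a two-level 'condition' factor (think WT vs KO).
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Regularized-log transform. blind = FALSE lets the transformation
# take the experimental design into account when estimating dispersions.
rld <- rlog(dds, blind = FALSE)

# PCA of the samples, coloured by group.
plotPCA(rld, intgroup = "condition")

# Hierarchical clustering of the samples on Euclidean distances
# between their rlog-transformed expression profiles.
sample_dists <- dist(t(assay(rld)))
plot(hclust(sample_dists), main = "Samples clustered on rlog values")
```

With a real experiment, `dds` would instead come from something like `DESeqDataSetFromMatrix()` with your own count matrix and sample table.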

ADD REPLY • link 8.3 years ago by Carlo Yague 8.6k

Sorry if I didn't make myself clear. I have been using DESeq2 to do differential gene expression, and the replicates don't cluster together. I'm wondering if there is a strategy to do supervised feature selection in DESeq2.

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by bojingjia ▴ 10

DESeq2 was developed for performing Differential Expression (what you are calling supervised feature selection) on RNA-seq data. The vignette is fantastic, so you should definitely take the time to go through it.

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by Sean Davis 26k

Hi Sean,

Thanks for your response. I've been using DESeq2 for some time now, and I'm not satisfied with the built-in clustering algorithms. To be more specific, DESeq2 clusters using pheatmap, which looks at the samples blindly (correct me if I'm wrong) and clusters them by sample distance. In theory, samples within a single condition should cluster together, but in my dataset they do not. I would like to develop a clustering method that finds a gene if and only if it is upregulated across ALL samples within a group and not upregulated across ALL samples within the control group, even if the difference is modest and subtle.
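To make the criterion concrete, here is a rough sketch in base R of one way to read it: a gene counts as "up in all WT" when its lowest WT value is still above its highest KO value. The matrix, gene names, and values are made up for illustration; in practice the input would be transformed expression values (e.g. rlog output).

```r
# Toy matrix of transformed expression values: 4 genes x 6 samples.
# All names and numbers are hypothetical.
expr <- matrix(
  c(8.0, 8.2, 8.1, 7.0, 7.1, 6.9,   # geneA: every WT above every KO
    5.0, 6.5, 5.2, 5.1, 6.0, 5.3,   # geneB: groups overlap
    3.0, 3.1, 2.9, 3.3, 3.4, 3.2,   # geneC: every WT below every KO (subtle)
    9.0, 9.1, 9.4, 8.9, 9.2, 8.8),  # geneD: groups overlap
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]),
                  c("WT1", "WT2", "WT3", "KO1", "KO2", "KO3"))
)
group <- factor(c("WT", "WT", "WT", "KO", "KO", "KO"))

wt <- expr[, group == "WT", drop = FALSE]
ko <- expr[, group == "KO", drop = FALSE]

# Complete separation: compare the extreme values of each group per gene.
up_in_all_wt   <- apply(wt, 1, min) > apply(ko, 1, max)
down_in_all_wt <- apply(wt, 1, max) < apply(ko, 1, min)

rownames(expr)[up_in_all_wt]    # "geneA"
rownames(expr)[down_in_all_wt]  # "geneC"
```

This is a filter, not a statistical test. With three replicates per group and no real effect, a gene is completely separated in one direction or the other about 10% of the time by chance (2 of the 20 possible arrangements), so it probably makes sense to combine it with the adjusted p-values from the differential expression results rather than use it alone.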

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by bojingjia ▴ 10

I suspect you realize this, but there may not be a set of genes that satisfies the condition you set out. That said, what genes (features) are you using to do the clustering; i.e., how did you choose the genes for the clustering? You'll want to use the top genes after differential expression analysis to get as close to your ideal as possible, knowing that you may not be able to achieve the ideal of all samples clustering together if the data do not support such a clustering.
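Roughly, that could look like the sketch below. It again uses simulated data from `makeExampleDESeqDataSet()` in place of a real experiment, and the cutoff of 50 genes is an arbitrary assumption, not a recommendation.

```r
library(DESeq2)
library(pheatmap)

set.seed(1)
# Simulated stand-in; betaSD > 0 adds true differences between conditions.
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)
dds <- DESeq(dds)
res <- results(dds)

# Rank genes by adjusted p-value (NAs sort last) and keep the top 50.
top_genes <- head(rownames(res)[order(res$padj)], 50)

# Cluster on transformed values for those genes only,
# centred per gene so the heatmap shows deviation from the gene's mean.
mat <- assay(rlog(dds, blind = FALSE))[top_genes, ]
mat <- mat - rowMeans(mat)

# Column annotation so the group of each sample is visible.
annotation <- data.frame(condition = dds$condition,
                         row.names = colnames(dds))

pheatmap(mat, annotation_col = annotation, show_rownames = FALSE)
```

Keep in mind that genes chosen because they differ between the groups will tend to separate the groups when clustered, so a heatmap like this is a way to visualize the result rather than independent evidence for it.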

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by Sean Davis 26k