Question: Problem in hierarchical clustering
Calangoa wrote (6 months ago):

Hi there, I have a problem with my hierarchical clustering and would appreciate any help. Let me start from the first step. To identify differentially expressed genes in several microarray studies (each study consists of 3 individual datasets; collectively I have 15 datasets), I used the limma package from Bioconductor in R. Then I filtered for genes with an adjusted P-value below 0.05. After that, I extracted a set of genes involved in, for example, the cell cycle. Finally, this set of genes, with their expression given as log fold changes, was used for hierarchical clustering. From what I have read, Euclidean distance with complete linkage is the best choice for log-transformed data like mine. The problem is that when I clustered the 15 datasets, the datasets from the same study surprisingly ended up close together in one cluster. How can I correct this misleading picture? Would it be possible in R to use a single control for all the treatment data from the different studies? Or should I take another approach?

Many thanks in advance


Can you show the design matrix, and especially if and how you checked and/or compensated for potential batch effects?

Reply written 6 months ago by ATpoint

Here is the image of the hierarchical clustering.

I think my mistake is that I didn't consider the batch effect. I normalized each study separately and then clustered them together. How can I compensate for the batch effect? Would it be a good idea to normalize all datasets together? I don't know how that would be possible, though. Any suggestions?

Reply written 6 months ago by Calangoa, modified 6 months ago by RamRS

Please edit this post and see the changes I've made to see how to add images properly.

Images should be added using the image button, not the link button. You'll need the direct link to the image, not the link to the page hosting the image.

Reply written 6 months ago by RamRS

If you normalize each study separately, this result is entirely expected: the datasets are only scaled within their own study, not relative to each other. If you z-score, you at least have to normalize all datasets together, leaving aside the question of whether comparing values from different studies makes sense at all, given the batch effect.

Reply written 6 months ago by ATpoint

I know, but I want to normalize them to compensate for the batch effect and to find which datasets are closest to CM, regardless of which study each dataset belongs to. Is there any way?

Reply written 6 months ago by Calangoa
leaodel wrote (6 months ago):

If you have a known batch effect and plan to visualize your data you'll need to correct the log-transformed data for this batch effect. I use limma::removeBatchEffect.
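
To make the idea concrete, here is a minimal conceptual sketch in plain Python of the simplest form of such a correction: for each gene, every batch is shifted so that its mean matches that gene's overall mean. This is not limma's code (removeBatchEffect fits a linear model and can take a design matrix to protect the effects of interest); the function name `center_batches` and the toy numbers are made up for illustration, and the sketch assumes the batch effect is a constant additive shift per gene.

```python
from statistics import mean

def center_batches(expr, batches):
    """Remove additive per-gene batch offsets (illustrative helper, not limma).

    expr    : list of rows (one per gene), each a list of log-scale values
    batches : list of batch labels, one per sample (column)
    """
    corrected = []
    for row in expr:
        grand = mean(row)  # overall mean of this gene across all samples
        # mean of this gene within each batch
        batch_mean = {
            b: mean(v for v, lab in zip(row, batches) if lab == b)
            for b in set(batches)
        }
        # shift each value so that every batch ends up with the overall mean
        corrected.append(
            [v - batch_mean[lab] + grand for v, lab in zip(row, batches)]
        )
    return corrected

# Toy data (made up): 2 genes x 4 samples, studies A and B
expr = [[1.0, 1.2, 3.0, 3.2],
        [0.5, 0.7, 0.4, 0.6]]
batches = ["A", "A", "B", "B"]

for row in center_batches(expr, batches):
    print([round(v, 2) for v in row])
```

Keep in mind that if study and treatment are completely confounded (each treatment appears in only one study), a correction like this removes the biological difference together with the batch difference.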


No, I don't know that for certain; I just want to correct for it. After clustering I found that the different datasets from one study cluster close together, but that is not correct when they are compared with the CM data. How can I do a meta-analysis and normalize microarray data from different studies?

Reply written 6 months ago by Calangoa

So a batch effect is not something that you'll correct by means of normalization. You have to use a method designed to measure the variance attributed to the batch variable and then correct for it. If you have a hidden batch effect you can use sva.

The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics, 2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 bioRxiv).

Once the batch effect is removed, you can proceed to the hierarchical clustering.
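
As a rough illustration of why the order matters, the sketch below clusters four toy samples with Euclidean distance and complete linkage, once before and once after a batch correction. It is plain Python rather than R's `hclust`; the function name `complete_linkage`, the sample names (study A or B, treatment x or y) and all numbers are invented for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def complete_linkage(samples, n_clusters):
    """Agglomerative clustering, Euclidean distance, complete linkage.

    samples    : dict mapping sample name -> list of per-gene values
    n_clusters : stop merging once this many clusters remain
    """
    clusters = [[name] for name in samples]
    while len(clusters) > n_clusters:
        # complete linkage: cluster distance = largest pairwise sample distance
        def gap(pair):
            a, b = pair
            return max(dist(samples[x], samples[y]) for x in a for y in b)
        a, b = min(combinations(clusters, 2), key=gap)
        # replace the two closest clusters with their union
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters

# Toy profiles (made up): values for 2 genes per sample
raw = {"A_x": [1.0, 0.5], "A_y": [1.2, 0.7],
       "B_x": [3.0, 0.4], "B_y": [3.2, 0.6]}
# The same samples after removing an additive per-study offset
corrected = {"A_x": [2.0, 0.45], "A_y": [2.2, 0.65],
             "B_x": [2.0, 0.45], "B_y": [2.2, 0.65]}

print(complete_linkage(raw, 2))        # clusters follow the study (A vs B)
print(complete_linkage(corrected, 2))  # clusters follow the treatment (x vs y)
```

With real data the separation will rarely be this clean, but the pattern is the same: as long as the study offset dominates the distances, samples group by study.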

Reply written 6 months ago by leaodel

Calangoa, if the answer was helpful to solve your problem, please accept it as an answer.

Reply written 6 months ago by leaodel