Question

Reverse Clustering?

1

11.6 years ago

Eric Fournier ★ 1.4k

My apologies if this question's title is vague: I do not know how to label what I am trying to do, which has made my efforts at finding relevant literature frustrating and unsuccessful.

I am analyzing the results of a two-color microarray hybridization experiment using the limma package. To validate the normalization methods I've been applying to the data, I am generating hierarchical clusters of individual channels of the microarrays to see if control and treatment samples cluster together. They do not; rather the green and red channels for each array cluster together, indicating that the array effect is more important than any other. My attempts at changing the normalization algorithms have proven unfruitful in correcting this problem.

My hypothesis for the moment is that the microarray I am using (which is of custom design) contains two classes of probes: one that is informative vis-à-vis the biological factor of interest, and another that contains nothing but noise due to a probe-design defect. What I'm looking for is some kind of algorithm/program that would take my microarray data and an a priori expected tree of samples, and partition the probes into those that support such a tree and those that do not. Is there such a thing, or at least something similar to it that I could use as a starting point for more research?

microarray clustering • 3.6k views

ADD COMMENT • link updated 8.4 years ago by Biostar 20 • written 11.6 years ago by Eric Fournier ★ 1.4k

0

Did you run a dye-swap experiment to see if you can account for any dye bias?

ADD REPLY • link 11.6 years ago by Steve Lianoglou 5.2k

0

Yes, we are doing dye-swaps. All replicates are biological, and we are alternating the dyes we use for control and treatment.

ADD REPLY • link 11.6 years ago by Eric Fournier ★ 1.4k

score 3 · Answer 1 · 2012-09-21

3

11.6 years ago

Michael 54k

With two-color microarrays, channels are normally not analysed separately but as log-ratios. That is because in spotted microarrays (an outdated technique anyway) the raw intensities mainly capture array effects, so single-channel analysis is a no-go. You didn't tell us the technology platform, but I guess it might be 'home-brew' arrays or Agilent two-color? In fact, your results are not surprising.

Instead, I would calculate normalized, background-corrected log-ratios between the two channels and use those values. Taking ratios should, in theory, eliminate most of the array effects (the same effects you see in your results as the two channels of each array clustering together). That can be done with limma as well; there is a similar question here: http://www.biostars.org/post/show/9372/limma-analysis-for-two-channeled-microarray-data-fetched-using-geoquery/

Here is also a step-by-step walkthrough: http://matticklab.com/index.php?title=Two_channel_analysis_of_Agilent_microarray_data_with_Limma
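To illustrate why ratios remove the array effect, here is a minimal sketch in Python/NumPy. It is not limma code, and it skips background correction and normalization. The function name `log_ratios`, the `treatment_is_red` flag for the dye-swap layout, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def log_ratios(red, green, treatment_is_red):
    """Return log2(treatment / control) per probe (rows) and array (columns).

    treatment_is_red is a hypothetical per-array flag describing the dye-swap
    layout: True where the treatment sample was labelled with the red dye.
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    m = np.log2(red) - np.log2(green)            # log2(red / green)
    sign = np.where(np.asarray(treatment_is_red), 1.0, -1.0)
    return m * sign                              # flip sign on dye-swapped arrays

# Simulated toy data: an additive (log2-scale) array effect shared by both
# channels of an array, plus a per-probe treatment effect.
rng = np.random.default_rng(0)
n_probes, n_arrays = 5, 4
array_effect = rng.normal(0, 2, size=(1, n_arrays))
true_lfc = np.array([0.0, 1.0, -1.0, 2.0, 0.0]).reshape(-1, 1)
base = rng.normal(8, 1, size=(n_probes, 1))
treatment_is_red = np.array([True, False, True, False])

control = 2 ** (base + array_effect)
treated = 2 ** (base + array_effect + true_lfc)
red = np.where(treatment_is_red, treated, control)
green = np.where(treatment_is_red, control, treated)

m = log_ratios(red, green, treatment_is_red)
```

In this idealized simulation every column of `m` recovers `true_lfc`, because the array effect appears in both channels and cancels in the ratio. Real data also carry dye bias and intensity-dependent effects, which is what within-array normalization is for.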

ADD COMMENT • link 11.6 years ago by Michael 54k

0

Thank you for your answer, Michael.

We are using Agilent two-color microarrays. I am aware that such microarrays should be studied using log-ratios, and this is how we intend to study the biological effects of the treatment.

However, since the array is custom-designed, we have the possibility of replacing uninformative probes with better-performing ones in subsequent experiments. I am also under the impression that certain probes are more sensitive to the array effect than others. It is from this perspective that I am trying to "cluster" probes into two categories: those that yield biologically meaningful data, and those where the array effect is predominant and which should be replaced in subsequent array designs.

However, when working on log-ratios, I am "cancelling out" the array effect, and thus cannot draw conclusions on its relative importance for each probe. This is why I was working on single channels, hoping that by first clustering the samples and then finding probes whose variation did not "fit" (for example, probes whose response is always in the maximum range due to repeated elements), I could target such probes for replacement. I assumed this would work, as I have generally had success clustering the various conditions of single-channel data through by-group analysis before, but not this time around.
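One possible way to make the per-probe comparison concrete is sketched below in Python/NumPy. For each probe, it splits the single-channel variance into a part explained by the array and a part explained by the treatment. This is only an illustration, not an established method or a limma feature. It assumes a balanced design with one control and one treatment channel per array, and it ignores dye bias. The function name `variance_fractions` and the data layout are hypothetical.

```python
import numpy as np

def variance_fractions(x):
    """Per-probe share of variance explained by array and by treatment.

    x: log2 single-channel intensities with shape (probes, arrays, 2), where
       the last axis is ordered (control, treatment). This layout is an
       assumption; dye-swapped arrays must be rearranged into it first.
    Returns (frac_array, frac_treatment), each of shape (probes,), from a
    balanced two-way sum-of-squares decomposition without interaction.
    """
    x = np.asarray(x, dtype=float)
    n_arrays = x.shape[1]
    grand = x.mean(axis=(1, 2), keepdims=True)
    array_mean = x.mean(axis=2, keepdims=True)   # mean of both channels per array
    cond_mean = x.mean(axis=1, keepdims=True)    # mean over arrays per condition
    ss_total = ((x - grand) ** 2).sum(axis=(1, 2))
    ss_array = 2 * ((array_mean - grand) ** 2).sum(axis=(1, 2))
    ss_cond = n_arrays * ((cond_mean - grand) ** 2).sum(axis=(1, 2))
    ss_total = np.where(ss_total == 0, np.nan, ss_total)  # constant probes -> NaN
    return ss_array / ss_total, ss_cond / ss_total

# Toy data: probe 0 varies only between arrays, probe 1 only between conditions.
offsets = np.array([0.0, 2.0, 4.0, 6.0])
probe_array_driven = 8.0 + np.stack([offsets, offsets], axis=1)   # (4, 2)
probe_treatment_driven = np.tile([8.0, 10.0], (4, 1))             # (4, 2)
x = np.stack([probe_array_driven, probe_treatment_driven])        # (2, 4, 2)

frac_array, frac_treatment = variance_fractions(x)
```

Probes with a high `frac_array` and a low `frac_treatment` would be candidates for closer inspection. Any cutoff between the two groups is a judgment call that this sketch does not settle.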

ADD REPLY • link 11.6 years ago by Eric Fournier ★ 1.4k