Microarray Data: Intensity Cutoff
12.2 years ago
Ofelia ▴ 30

Hi All,

I have a question related to the analysis of the microarray data.

When selecting DEGs (differentially expressed genes), one would compare (for example) two groups of subjects and apply fold-change and p-value cut-offs.

Additionally, some recommend looking at the LS Means data: the mean intensity of each probe across all subjects in group 1 and in group 2 separately.

Then, they recommend filtering out the probes with low mean intensity.

Can anyone recommend how to derive the cut-off? Is it different for group 1 versus group 2? What is the reason to exclude low-intensity probes: is it because they are too noisy?

Thanks a lot!

microarray filter differential • 10k views
ADD COMMENT
5
12.2 years ago

The lower the reported expression of a probe on a microarray, the lower the signal-to-noise ratio. As you approach the amount of fluorescence generated by shining the laser at bare glass, you get a less reliable report.

Various microarray platforms have different built-in means to estimate the background signal level and call "present or absent" for each measurement. I typically use a heuristic method: I look at the expression of Y-chromosome genes on female samples and use that to set a threshold. Some mammalian chips have probes for yeast genes on the same principle; these should be negatives.

I would normally identify probesets called not present, filter out probesets that are too-frequently called absent (e.g. missing in some large percentage of all samples) and then do the differential expression analysis.
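A minimal sketch of that workflow, assuming log2 intensities are held in a plain dict of probe ID to per-sample values. The function names, the 95th-percentile rule, and the 50% absent limit are illustrative assumptions, not platform defaults; the negative values would come from probes you expect to be silent (e.g. Y-chromosome probes in female samples).

```python
import math

def background_threshold(negative_values, percentile=95):
    """Nearest-rank percentile of intensities from probes expected to be silent.

    The percentile is an assumed, tunable choice, not a standard value.
    """
    ordered = sorted(negative_values)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]

def filter_by_presence(expr, threshold, max_absent_fraction=0.5):
    """Keep probes called absent in at most max_absent_fraction of samples.

    expr maps probe ID -> list of log2 intensities, one per sample.
    A value at or below the threshold is treated as an "absent" call.
    """
    kept = {}
    for probe, values in expr.items():
        absent = sum(v <= threshold for v in values)
        if absent / len(values) <= max_absent_fraction:
            kept[probe] = values
    return kept
```

The filtered dict would then go into the differential expression analysis. Both cut-offs should be tuned to the platform and samples at hand.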

ADD COMMENT
1

There's no standard number because it will depend on the platform and the samples. However, you can infer males and females from the expression of genes on the Y chromosome (SRY, for example). Look at more than one gene to get started and hope that you don't have all male samples.

ADD REPLY
0

Thank you very much, David, for your answer.

For these samples (BTW, these are human lung biopsies) I do not have any demographic information at all (I know how bad that is).

Is there a ballpark number? Say 5?

Any publications you can refer me to would be very much appreciated :)

ADD REPLY
3
12.2 years ago

The reason to consider filtering probes is largely to reduce the number of tests for which you need to correct for multiple testing. The goal is to remove probes that CANNOT show differential expression because they carry so little information. A common method is to calculate some measure of variance for each probe across the samples and keep the top X% of probes with the highest variance. The probes with lower variance carry less information about the differences between samples, so the thought is to ignore them. Note that we are not talking about intensities. Even probes with lower intensities can show differential expression. On the other.

For a more complete explanation of some of the issues, see this manuscript:

http://www.pnas.org/content/107/21/9546.long
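The variance filter described above can be sketched as follows, assuming a dict of probe ID to per-sample log2 values. The function name and the 50% default are hypothetical; the fraction to keep is a judgment call. Note that the variance is computed across all samples without using group labels.

```python
from statistics import variance

def top_variance_probes(expr, keep_fraction=0.5):
    """Return IDs of the most variable probes, highest variance first.

    expr maps probe ID -> list of log2 intensities across ALL samples;
    group labels are deliberately not used when ranking.
    keep_fraction is an assumed, tunable choice (the "top X%").
    """
    variances = {probe: variance(values) for probe, values in expr.items()}
    n_keep = max(1, int(keep_fraction * len(variances)))
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_keep]
```

Only the returned probes would then be tested, which reduces the multiple-testing burden.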

ADD COMMENT
2
12.2 years ago

To some degree this might depend on what microarray platform and specific design you are using.

A number of arrays have negative control probesets specifically included to assess the signal expected in the absence of hybridization to a real target. This 'random' signal can be correlated with the GC content or melting temperature (Tm) of the probes. In some cases the negative controls will represent the range of GC/Tm of the rest of the probes on the array. It is thus possible to fit a linear (or non-linear) model to these negative control probes and use this fit to estimate a suitable cutoff for all of the probes on your array while taking GC bias into account.

This strategy is described with diagrams in "Pre-processing of data" and in the manuscript "ALEXA: a microarray design platform for alternative expression analysis".

The negative control probes can fall into a variety of categories, such as: complete random sequences, sequences from another species, sequences within the introns of expressed genes, etc.
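As a rough illustration of the GC-aware cutoff idea, here is a sketch that fits a straight line to negative-control intensity versus GC content and places the cutoff a few residual standard deviations above the fit. The function names and the multiplier `k` are assumptions for illustration, not the procedure from the ALEXA manuscript.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def gc_aware_cutoff(neg_gc, neg_intensity, k=2.0):
    """Build a function mapping a probe's GC content to a background cutoff.

    neg_gc / neg_intensity describe the negative-control probes only.
    k (number of residual SDs above the fitted line) is an assumed choice.
    """
    slope, intercept = fit_line(neg_gc, neg_intensity)
    residuals = [yi - (slope * xi + intercept)
                 for xi, yi in zip(neg_gc, neg_intensity)]
    spread = stdev(residuals)

    def cutoff(gc):
        return slope * gc + intercept + k * spread

    return cutoff
```

A probe with GC fraction `g` would be called above background when its intensity exceeds `cutoff(g)`. A non-linear fit could be swapped in if the controls show curvature.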

ADD COMMENT
0

Hi Malachig,

Is it also relevant for PM only arrays?

ADD REPLY
0

No. I would say the PM/MM situation is yet another way of thinking about determining cutoffs but the concepts are the same.

ADD REPLY
1
12.2 years ago
seidel 11k

A common way to see how the ratio relates to the overall intensity is to create an MA plot, where you plot the log2 ratio of expression (M) on the y-axis and the intensity of probes from both samples (A) on the x-axis, with A = log2(sqrt(group1*group2)). With various kinds of arrays and background treatments this plot can vary quite a bit, but it is pretty effective at showing whether there's a relationship between ratio and intensity.

Something else to keep in mind is that if you're comparing two groups and looking for ratios between them, a down-regulated gene may go to background in one channel (or group) but not the other, so you may want to evaluate using AND vs. OR in terms of a background requirement.
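A small sketch of both points, assuming raw (not log-transformed) mean intensities per group. The function names and the `mode` argument are hypothetical.

```python
import math

def ma_values(group1, group2):
    """Return (M, A) for one probe from raw mean intensities.

    M = log2(group1 / group2)         (log ratio, y-axis)
    A = log2(sqrt(group1 * group2))   (average log intensity, x-axis)
    """
    m = math.log2(group1 / group2)
    a = math.log2(math.sqrt(group1 * group2))
    return m, a

def passes_background(group1, group2, cutoff, mode="or"):
    """Background requirement on two groups.

    mode="and": both groups must exceed the cutoff.
    mode="or":  one group above the cutoff is enough, so a gene that
                drops to background in only one group is retained.
    """
    above1, above2 = group1 > cutoff, group2 > cutoff
    return (above1 and above2) if mode == "and" else (above1 or above2)
```

For example, a probe at 1024 in one group and 20 in the other, with a cutoff of 100, passes under "or" but fails under "and", even though it is strongly down-regulated.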

ADD COMMENT
0

Hi Seidel,

Thanks very much for your answer!

I did have an MA plot as one of the QC measures, so I will inspect it more carefully.

From your experience, what was the range of the cut-off values (just to see if I am completely off with the signal of the chips)?

I should also mention that I ran 2 batches - is the cut-off supposed to be batch-specific? I did normalize them together...

Thanks again!

ADD REPLY
