Question

Is There Any Way To Use The Log Normalized Ratios To Find Absolute Signal Intensities Of Every Gene?

10.2 years ago

abhishek.subramanian89 • 0

I need a set of highly expressed, constitutively expressed genes for an organism of interest. I found microarray data for my organism in which the authors profiled the whole transcriptome to find differential expression between two developmental stages of the same organism.

I have downloaded the microarray dataset for my organism from NCBI GEO. I have one file that contains the raw data and another that contains the log normalized (processed) data for every gene in the organism. The authors indicate in their paper that genes with log2 normalized ratios between 0.6 and 1.7 make up the constitutively expressed gene set. The genes in that range can be extracted from the processed data file.

But to find the highly expressed genes within this set of constitutively expressed genes, I need the individual intensity of each gene from the raw data, so that I can compare them and pick out the highly expressed constitutive genes.

Is there any way to use the log normalized ratios to find the absolute signal intensity of each gene? If so, how can it be done? Also, if I can find absolute intensities from the raw data, which column in the raw data file should be chosen?

microarray normalization • 5.1k views

updated 9.6 years ago by Biostar 20 • written 10.2 years ago by abhishek.subramanian89 • 0

score 1 · Answer 1 · 2014-03-13

If all you have are ratios, the answer is NO: you cannot extract intensity values from ratios, because many different pairs of intensities give the same ratio. However, you mentioned that you have the raw data that was used to construct those ratios, so you simply have to match your ratios back to the raw data in a gene-wise fashion. The raw data consists of microarray intensity values that generally reflect expression levels (a brighter signal in test relative to control indicates higher expression in test). Which specific columns to use is impossible to answer without specific information about the platform, etc. You have to explore and learn your data. However, one of the columns will reflect the values for the numerator of your ratios, while another will reflect the denominator.
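As a rough illustration of the gene-wise matching described above, here is a minimal Python sketch. Everything in it is made up for illustration: the gene IDs, the intensity values, and the helper names `constitutive` and `rank_by_intensity` are hypothetical, and the 0.6 to 1.7 range is simply the one quoted in the question. It assumes you have already worked out which raw columns are the numerator and denominator channels.

```python
import math

# Hypothetical processed table: gene ID -> log2 normalized ratio.
log_ratios = {"geneA": 1.0, "geneB": 1.0, "geneC": 3.2}

# Hypothetical raw table: gene ID -> (numerator, denominator) channel intensity.
# geneA and geneB share the same ratio but differ 100-fold in intensity,
# which is why the ratio alone cannot tell you the intensity.
raw = {
    "geneA": (200.0, 100.0),
    "geneB": (20000.0, 10000.0),
    "geneC": (9200.0, 1000.0),
}


def constitutive(log_ratios, lo=0.6, hi=1.7):
    """Return gene IDs whose log2 ratio lies in [lo, hi]."""
    return [gene for gene, m in log_ratios.items() if lo <= m <= hi]


def rank_by_intensity(genes, raw):
    """Sort genes by the mean log2 intensity of both channels, highest first."""
    def avg_log2(gene):
        num, den = raw[gene]
        return 0.5 * (math.log2(num) + math.log2(den))

    # Genes missing from the raw table are skipped rather than guessed at.
    return sorted((g for g in genes if g in raw), key=avg_log2, reverse=True)


selected = constitutive(log_ratios)
ranked = rank_by_intensity(selected, raw)
print(ranked)
```

Here geneA and geneB both pass the ratio filter, and only the raw intensities separate them, so geneB ranks first.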

edit: given your description of the platform in the comments below, they printed their own arrays and used a GenePix scanner (Molecular Devices) and software to quantify the results. The description of all the column headings can be found in the GenePix manual, which is a good read on how to quantify signal from a grid of pixels. Nonetheless, the values you are probably most interested in are F647 Median, B647 Median, F555 Median, and B555 Median. These are the median Foreground and Background measurements for each channel (i.e. the red and green light, or the test and control samples) for each microarray spot. Depending on resolution, each spot might consist of 80 pixels, so the median is used to summarize the spot intensity (but you can see from the columns that you could choose the mean instead; there are many options there, e.g. you could take the median of pixel/pixel ratios).

What you do next can be simple or complicated. The data is NOT normalized (which you can confirm by plotting it), and you can choose to subtract background or not (opinions differ on whether this is important). So you should normalize and then take ratios, but there are several methods of normalization. I think your best bet is to read the limma userguide for R (http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf), where you'll find a good discussion of the issues, and if you know any R you can explore the data rather quickly.
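To make the column arithmetic concrete, here is a small Python sketch, not a replacement for limma. The spot values are invented, the `ID` key and the helper names `spot_values` and `median_normalize` are hypothetical, and treating the 647 channel as the numerator is an assumption you would need to check against the sample annotation. Global median centring is used only because it is the simplest of the several normalization methods.

```python
import math
from statistics import median

# Hypothetical spots using the GenePix column names mentioned above.
spots = [
    {"ID": "geneA", "F647 Median": 1224, "B647 Median": 200,
     "F555 Median": 712, "B555 Median": 200},
    {"ID": "geneB", "F647 Median": 4296, "B647 Median": 200,
     "F555 Median": 1224, "B555 Median": 200},
    {"ID": "geneC", "F647 Median": 150, "B647 Median": 200,
     "F555 Median": 900, "B555 Median": 200},
    {"ID": "geneD", "F647 Median": 712, "B647 Median": 200,
     "F555 Median": 712, "B555 Median": 200},
]


def spot_values(spot, subtract_background=True):
    """Return (M, A) for one spot, or None if a channel is not positive.

    M is the log2 ratio of the 647 to the 555 channel (assumed orientation);
    A is the mean log2 intensity of the two channels.
    """
    red = spot["F647 Median"]
    green = spot["F555 Median"]
    if subtract_background:
        red -= spot["B647 Median"]
        green -= spot["B555 Median"]
    if red <= 0 or green <= 0:
        return None  # log2 is undefined here; such spots need separate handling
    return math.log2(red / green), 0.5 * (math.log2(red) + math.log2(green))


def median_normalize(m_values):
    """Shift log ratios so that their median is zero (global median centring)."""
    centre = median(m_values)
    return [m - centre for m in m_values]


# Keep only spots where both background-subtracted channels are positive.
usable = {}
for spot in spots:
    values = spot_values(spot)
    if values is not None:
        usable[spot["ID"]] = values

ids = list(usable)
normalized_m = dict(zip(ids, median_normalize([usable[i][0] for i in ids])))
average_a = {i: usable[i][1] for i in ids}

# Rank by average intensity, brightest first.
brightest_first = sorted(ids, key=lambda i: average_a[i], reverse=True)
print(normalized_m)
print(brightest_first)
```

The A value is what answers the "highly expressed" part of the question, while the normalized M values are what you would compare against the published ratios. geneC is dropped because its background exceeds its foreground in one channel, which is one reason opinions differ on background subtraction.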