Question: Need help understanding variability of probe set intensities for microarray data.
2.5 years ago, blakeoft10 (United States) wrote:

I have been looking at the relationship between log-transformed estimated RNA-seq counts and microarray (HG-U133a) gene expression. Luckily, TCGA has both kinds of data for several patients. I began by comparing the preprocessed data sets from TCGA, which are already at the gene level. While I was pleased with the results for the most part, I decided to download the microarray CEL files and process them with RMA so that I could look at the probe set level. Interestingly, some probe sets that map to the same gene have very different distributions.
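A minimal sketch of that per-gene comparison, assuming you already have matched RNA-seq counts and RMA values for the same patients (the numbers below are made up). RMA output is already on the log2 scale, so only the counts are transformed; the pseudocount of 1 is an assumption on my part, not something TCGA prescribes.

```python
import math


def log2_counts(counts, pseudocount=1.0):
    """log2(count + pseudocount); the pseudocount keeps zero counts finite."""
    return [math.log2(c + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Made-up values for one gene in the same five patients.
rnaseq_counts = [120, 450, 30, 900, 210]  # estimated counts
rma_log2 = [7.1, 8.9, 5.8, 9.8, 7.9]      # RMA output is already log2

r = pearson(log2_counts(rnaseq_counts), rma_log2)
print(f"Pearson r = {r:.3f}")
```

A rank-based (Spearman) correlation would be a reasonable alternative if the relationship looks monotonic but not linear.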

I am curious why this happens. My first thought was that it has to do with the suffixes of the probe set IDs. I found Affymetrix's explanation of the suffixes on their webpage, but I'm a little confused by it. To be more specific:

"_at = all the probes hit one known transcript.
_a = all probes in the set hit alternate transcripts from the same gene
_s = all probes in the set hit transcripts from different genes


For HG-U133, the _a designation was not used; an _s probe set on these arrays means the same as an _a on any of the HG-U133 arrays. "

This quote is from Affymetrix's webpage, and I'm assuming that the mention of HG-U133 includes HG-U133a. The last sentence is the confusing part. Is it saying that an _s probe set on HG-U133 arrays means the same as an _a probe set on the arrays that actually use the _a suffix?
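For bookkeeping, here is a small sketch that tags probe set IDs with the quoted definitions. It assumes IDs end in `_at`, with the quoted `_a`/`_s` flags appearing in full IDs as `_a_at`/`_s_at`; any other flag is reported as not covered. It deliberately does not encode the HG-U133 caveat from the last sentence, since that is exactly the part in question. The IDs in the loop are just examples of the format.

```python
# Descriptions copied from the Affymetrix text quoted above.
QUOTED_MEANINGS = {
    "_at": "all the probes hit one known transcript",
    "_a_at": "all probes in the set hit alternate transcripts from the same gene",
    "_s_at": "all probes in the set hit transcripts from different genes",
}


def probe_set_suffix(probe_set_id):
    """Return the suffix ("_at", "_s_at", ...) of an ID ending in "_at", else None."""
    parts = probe_set_id.split("_")
    if len(parts) < 2 or parts[-1] != "at":
        return None
    if len(parts) == 2:
        return "_at"  # e.g. "1053_at": no extra flag before "_at"
    return "_" + parts[-2] + "_at"  # e.g. "1007_s_at" -> "_s_at"


def describe(probe_set_id):
    """Look up the quoted meaning; flags not in the quote are reported as such."""
    return QUOTED_MEANINGS.get(
        probe_set_suffix(probe_set_id), "not covered by the quoted definitions"
    )


for pid in ["1053_at", "1007_s_at", "1255_g_at"]:
    print(pid, "->", describe(pid))
```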

I suppose my main question is: if I see very different distributions for two probe sets that map to the same gene, what could that mean? If they are measuring different isoforms/transcripts of the same gene, how can I find out which one each probe set is measuring?
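One way to make "very different distributions" concrete is to compare, for two probe sets of the same gene, their level (median), their spread (IQR), and how well they track each other across samples. A sketch with made-up RMA values; `probe_set_A` and `probe_set_B` are hypothetical names.

```python
import math
import statistics


def summarize(values):
    """Median and interquartile range of one probe set across samples."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {"median": statistics.median(values), "iqr": q3 - q1}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Made-up RMA (log2) values for two probe sets of one gene, same six samples.
expr = {
    "probe_set_A": [9.1, 9.4, 8.8, 10.2, 9.9, 9.0],
    "probe_set_B": [4.2, 4.5, 4.1, 5.0, 4.8, 4.3],
}

for name, values in expr.items():
    s = summarize(values)
    print(f"{name}: median={s['median']:.2f}, IQR={s['iqr']:.2f}")

r = pearson(expr["probe_set_A"], expr["probe_set_B"])
print(f"correlation across samples: r = {r:.3f}")
```

A large difference in median together with a high correlation is consistent with two probe sets following the same signal at different intensities (probe affinities differ, among other things), whereas a low correlation suggests they respond to different things, for example different transcripts.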

Thanks for any insight.


2.5 years ago, h.mon wrote:

First question: I would think that, indeed, differences between probe sets for the same gene may be due to their measuring different isoforms. Did you check the variability for all probe sets? Does this affect just a few of them?
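To see whether this affects just a few genes, a rough scan could group probe sets by gene and flag same-gene pairs that correlate poorly across samples. The probe sets, genes, values, and the 0.5 cutoff below are all hypothetical; in practice the probe-set-to-gene mapping would come from the array annotation.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def discordant_pairs(expr, probe_to_gene, threshold=0.5):
    """Flag same-gene probe set pairs whose correlation is below threshold.

    Returns (flagged, n_pairs): flagged is a list of (gene, ps1, ps2, r);
    n_pairs counts all same-gene pairs examined. Genes with a single probe
    set contribute no pairs. A NaN r (constant probe set) is never flagged.
    """
    by_gene = {}
    for probe_set, gene in probe_to_gene.items():
        by_gene.setdefault(gene, []).append(probe_set)
    flagged, n_pairs = [], 0
    for gene, probe_sets in sorted(by_gene.items()):
        for ps1, ps2 in itertools.combinations(sorted(probe_sets), 2):
            n_pairs += 1
            r = pearson(expr[ps1], expr[ps2])
            if r < threshold:
                flagged.append((gene, ps1, ps2, r))
    return flagged, n_pairs


# Hypothetical probe sets, genes, and values (five samples each).
expr = {
    "ps1": [1, 2, 3, 4, 5],
    "ps2": [2, 3, 4, 5, 6.5],
    "ps3": [1, 2, 3, 4, 5],
    "ps4": [5, 4, 3, 2, 1],
    "ps5": [3, 3, 4, 2, 5],
}
probe_to_gene = {"ps1": "GENE1", "ps2": "GENE1", "ps3": "GENE2",
                 "ps4": "GENE2", "ps5": "GENE3"}

flagged, n_pairs = discordant_pairs(expr, probe_to_gene)
print(f"{len(flagged)} of {n_pairs} same-gene pairs below the cutoff")
for gene, ps1, ps2, r in flagged:
    print(gene, ps1, ps2, round(r, 3))
```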

Second question: you could download the probe sequences, map them against the human genome, and check the results in IGV. Another option is described in the answer to "Hg-U133 Plus 2.0 Probe Set".
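If you go the alignment route, the probe sequences first need to be in FASTA. A sketch, assuming a tab-delimited probe table with a header row; the default column names are an assumption, so check them against the file you download and pass the right ones. The example rows and sequences are made up.

```python
import csv
import io


def probe_tab_to_fasta(handle, set_col="Probe Set Name", x_col="Probe X",
                       y_col="Probe Y", seq_col="Probe Sequence"):
    """Turn a tab-delimited probe table into FASTA, one record per probe.

    The default column names are assumptions; pass the names used in your file.
    Record names are "<probe set>:<x>:<y>" so each probe stays traceable.
    """
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        name = f"{row[set_col]}:{row[x_col]}:{row[y_col]}"
        records.append(f">{name}\n{row[seq_col].strip()}\n")
    return "".join(records)


# Made-up two-probe table (sequences are not real probes).
example = (
    "Probe Set Name\tProbe X\tProbe Y\tProbe Sequence\n"
    "PS1_at\t10\t20\tACGTACGTACGTACGTACGTACGTA\n"
    "PS1_at\t30\t40\tTTGCATTGCATTGCATTGCATTGCA\n"
)
print(probe_tab_to_fasta(io.StringIO(example)), end="")
```

The resulting FASTA can then go to whichever short-sequence aligner you prefer, with the alignments loaded into IGV.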

Edit: some further suggestions are in "Is It Possible For Two Different Affymetrix Probe Set Id To Have Common Annotations To Same Gene ?".


Thank you for the links; they are quite helpful. I'm looking at EGFR, and it seems to make sense when I look at the UCSC custom tracks and my distributions side by side. I think I'll also be able to use the exon-level data from TCGA to see whether the RNA-seq data is picking up the same isoform as the microarrays. However, I'm still not certain what the "_s" suffix means, either for HG-U133a or in general.

written 2.5 years ago by blakeoft10

I also found the verbal explanation confusing, but this page has a figure worth a thousand words.

written 2.5 years ago by h.mon

I've seen this image before, but the link I posted made me think that HG-U133a might be an exception to this rule. Just to be clear, does "_s" mean that the probe set targets a sequence that can be found in multiple genes? I can't actually find an example of a probe mapping to a gene other than the one listed in the UCSC browser. For example, the entry for this probe set only lists chr7:55086725-55275772.
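One way to check this directly once the probes are aligned: collect the genomic hits for a probe set's probes and count how many annotated genes they overlap. A sketch with hypothetical gene names and coordinates, using half-open [start, end) intervals as in BED (the UCSC browser displays 1-based coordinates, so convert consistently). Probes overlapping more than one gene would fit the quoted `_s` description; finding only one gene does not by itself settle what the suffix means on HG-U133a.

```python
def genes_hit(probe_hits, genes):
    """Return the sorted names of genes overlapped by any of a probe set's hits.

    probe_hits: (chrom, start, end) alignments of one probe set's probes.
    genes: (name, chrom, start, end) annotation intervals.
    All intervals are half-open [start, end), as in BED.
    """
    hit = set()
    for chrom, start, end in probe_hits:
        for name, g_chrom, g_start, g_end in genes:
            # Two half-open intervals overlap if each starts before the other ends.
            if chrom == g_chrom and start < g_end and g_start < end:
                hit.add(name)
    return sorted(hit)


# Hypothetical annotation and alignments.
genes = [("GENE_A", "chr1", 1000, 5000), ("GENE_B", "chr2", 200, 900)]
single = [("chr1", 1200, 1225), ("chr1", 1300, 1325)]
multi = [("chr1", 1200, 1225), ("chr2", 300, 325)]

print(genes_hit(single, genes))  # ['GENE_A']
print(genes_hit(multi, genes))   # ['GENE_A', 'GENE_B']
```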

written 2.5 years ago by blakeoft10
