Hi,
I am working with the MoGene-1_0-st-v1 gene microarray in R. This is what my AffyBatch object looks like:
AffyBatch object
size of arrays=1050x1050 features (21 kb)
cdf=MoGene-1_0-st-v1 **(34760 affyids)**
number of samples=9
**number of genes=34760**
annotation=mogene10stv1
Now when I do this:
dim(pm(rawData))
the result is
[1] 819041 9
What I cannot understand is the relation between the number of affyids and the number of PM intensity rows, i.e. 34760 vs. 819041.
Thanks
I agree that multiple probes represent each gene, but what really threw me off is that the ratio did not come out to be a whole number. That would mean the probe count differs between affy ids. Also, when I run RMA on these probe sets, they are summarized into one representative value per gene. Does this mean the summarization step calculates a mean for each gene, or does it pick the intensities based on some other logic?
The majority of the probesets have the same number of probes, but not all of them. To find out exactly, you can use the following commands in R after loading your .CEL files as an AffyBatch object with the 'affy' package:
# row names of the probe intensity matrix, one per probe
ids <- rownames(probes(YourAffyBatch))
# keep the part of each row name before the first underscore and rebuild the probeset ID
ids <- strsplit(ids, "_")
ids <- paste(sapply(ids, function(x) x[1]), "at", sep = "_")
# inner table(): probes per probeset; outer table(): how many probesets have each size
table(table(ids))
This gives a two-row table: the first row is the number of probes in a probeset, and the second row is the number of probesets carrying that many probes (e.g. there are 19 probesets that contain 8 probes).
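To make the table(table(...)) idea concrete, here is a minimal sketch on toy data. The probeset names ps_A, ps_B and ps_C are made up for illustration and are not real MoGene IDs. It also reproduces the ratio from the question, which is about 23.56 probes per probeset on average. The commented-out last line assumes affy's probeNames(), which returns the probeset name for each PM probe; treat it as a suggestion to try rather than a tested recipe for this chip.

```r
# Toy probe-to-probeset assignment (hypothetical names, not real data)
probe_ids <- c(rep("ps_A", 3), rep("ps_B", 3), rep("ps_C", 5))

# Number of probes in each probeset
per_set <- table(probe_ids)

# Number of probesets for each probeset size
size_dist <- table(per_set)
print(size_dist)

# Average probes per probeset; need not be a whole number
avg_toy <- mean(as.vector(per_set))

# Same ratio for the numbers in the question
avg_question <- 819041 / 34760
print(round(avg_question, 2))

# With a real AffyBatch (assumption: probeNames() from the affy package):
# table(table(probeNames(YourAffyBatch)))
```

A non-integer average only means the probeset sizes are not all equal, which is what the size table shows directly.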
There are several algorithms for summarizing probesets. You can find short descriptions of them in the affy vignette too.
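On the specific question about RMA: its summarization step is not a plain mean. RMA fits a robust additive model (probe effect plus array effect) to the log2 intensities of each probeset using median polish. The sketch below illustrates that idea on one hypothetical probeset with base R's medpolish(); the intensity values are invented, and the real rma() also performs background correction and quantile normalization first, which are skipped here.

```r
# Hypothetical log2 PM intensities for one probeset:
# 4 probes (rows) x 3 arrays (columns); probe4 is an outlier on array1
log_pm <- matrix(
  c( 8.1, 8.4, 9.0,
     7.9, 8.2, 8.8,
     8.3, 8.6, 9.2,
    10.5, 8.3, 8.9),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("probe", 1:4), paste0("array", 1:3))
)

# Median polish: iteratively sweep out row (probe) and column (array) medians
fit <- medpolish(log_pm, trace.iter = FALSE)

# Per-array expression estimate = overall level + array effect
expr_medpolish <- fit$overall + fit$col

# Plain per-array mean, for comparison
expr_mean <- colMeans(log_pm)

print(rbind(medpolish = expr_medpolish, mean = expr_mean))
```

In this toy case the outlying probe pulls the plain mean for array1 upward, while the median polish estimate is affected much less, which is the reason for using a robust summary.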