I am analysing an Affymetrix MoGene 2.0 ST array, and I would like to collapse all cross-hybridizing probesets into a single transcript cluster.
The first approach I thought of was to annotate every probeset with its corresponding gene_id, using the probeset -> gene mapping that can be extracted from BioMart or the Affymetrix website. Then, for each gene that is covered by more than one probeset, I would keep only one probeset, perhaps the one with the maximum expression value.
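To make this first idea concrete, here is a minimal sketch in plain Python. The function name `collapse_to_genes`, the probeset and gene identifiers, and the toy numbers are all made up for illustration. I am also assuming that "maximum expression value" means the highest mean across samples, which is only one possible reading.

```python
from statistics import mean

def collapse_to_genes(expr, probeset_to_gene):
    """Keep one probeset per gene: the one with the highest mean expression.

    expr: dict mapping probeset id -> list of expression values (one per sample)
    probeset_to_gene: dict mapping probeset id -> gene id
    Returns a dict mapping gene id -> (chosen probeset id, its expression values).
    """
    best = {}
    for probeset, values in expr.items():
        gene = probeset_to_gene.get(probeset)
        if gene is None:
            continue  # assumption: probesets without a gene annotation are dropped
        score = mean(values)
        # On ties, the probeset seen first is kept (strict comparison).
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, probeset, values)
    return {gene: (ps, vals) for gene, (_, ps, vals) in best.items()}

# Hypothetical toy data: four probesets, two samples each.
expr = {
    "ps_1": [5.0, 6.0],
    "ps_2": [7.0, 8.0],
    "ps_3": [3.0, 4.0],
    "ps_4": [9.0, 9.5],  # has no gene annotation below
}
probeset_to_gene = {"ps_1": "geneA", "ps_2": "geneA", "ps_3": "geneB"}

gene_expr = collapse_to_genes(expr, probeset_to_gene)
```

With this toy input, geneA is represented by ps_2 (the higher mean of its two probesets), geneB by ps_3, and ps_4 is discarded because it has no annotation.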
However, I have read about some CDF files that apparently already do this, and they can be downloaded at: http://nmg-r.bioinformatics.nl/NuGO_R.html
The problem is that I cannot figure out how to read such files and use them to collapse my probeset-level expression matrix into a transcript-level expression matrix.