Question

ReadAffy function of Affy R package

8.5 years ago

zaynabmousavian ▴ 10

Hi,

I want to read a microarray dataset produced on the HGU133Plus2 platform. I know this platform has about 50,000 probes, but when I used the ReadAffy function of the affy package to read the .CEL files of the dataset with the input parameter cdfname="HGU133PLUS2_HS_ENTREZG", the resulting AffyBatch object contained only about 19,000 unique probes. Does anyone know what strategy the command uses to select one probe from the multiple probes assigned to one gene?

Thanks

R • 3.5k views

updated 8.5 years ago by svlachavas ▴ 790 • written 8.5 years ago by zaynabmousavian ▴ 10

Ram · Accepted Answer · 2015-11-11

Dear Zaynabmousavian,

I would guess that you used a custom CDF from Brainarray, since you have significantly fewer probe sets after importing the raw data. The quick answer is that the group that produced these CDFs remapped all probes to new probe sets based on Entrez Gene, so no single probe is selected per gene: the probes that map to a gene are grouped into one probe set. In detail:

Procedure for generating custom CDF files

A. After probe sequences are BLASTed against the latest UniGene Build and genome sequence, a series of filtering and grouping criteria are applied for different CDF files:

2.2. CDF files for Reference sequence, Entrez Gene and Exon, ENSEMBL Gene, Transcript and Exon and VEGA Gene, Transcript and Exon:
A probe must hit only one genomic location.
Probes that can be mapped to the same target sequence in the correct direction are grouped together in the same probe set.
Each probe set must contain at least three oligonucleotide probes and probes in a set are ordered according to their location in the corresponding exon.
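
To make the three rules above concrete, here is a rough Python sketch. It is only an illustration and not the Brainarray pipeline: the record layout (probe ID, target, position, orientation flag), the toy data, and the function name build_probe_sets are all hypothetical, and "position" stands in for the probe's location in the exon.

```python
from collections import defaultdict

# Hypothetical alignment records: (probe_id, target_id, position, correct_direction).
# One record per hit, so a probe listed twice hits two locations.
hits = [
    ("p1", "geneA", 100, True),
    ("p2", "geneA", 50, True),
    ("p3", "geneA", 75, True),
    ("p4", "geneA", 120, False),  # wrong direction: excluded
    ("p5", "geneB", 10, True),
    ("p5", "geneC", 400, True),   # p5 hits two locations: excluded
    ("p6", "geneB", 20, True),
    ("p7", "geneB", 30, True),
]


def build_probe_sets(hits, min_probes=3):
    """Group probes into probe sets following the three quoted rules."""
    # Rule 1: a probe must hit only one location.
    hit_count = defaultdict(int)
    for probe_id, _target, _pos, _ok in hits:
        hit_count[probe_id] += 1

    # Rule 2: probes mapping to the same target in the correct
    # direction go into the same probe set.
    groups = defaultdict(list)
    for probe_id, target, pos, ok in hits:
        if hit_count[probe_id] == 1 and ok:
            groups[target].append((pos, probe_id))

    # Rule 3: keep sets with at least min_probes probes,
    # ordered by position within the target.
    return {
        target: [probe_id for _pos, probe_id in sorted(members)]
        for target, members in groups.items()
        if len(members) >= min_probes
    }


probe_sets = build_probe_sets(hits)
print(probe_sets)
```

In this toy data geneB ends up with only two usable probes, so it gets no probe set at all. The same kind of filtering is one reason a remapped CDF can contain far fewer probe sets than the original annotation.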

You should also check the group's page at http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/genomic_curated_CDF.asp and the corresponding papers.

Hope that helps,

Efstathios