Question

What is the right way to convert probe set names from a Brainarray custom CDF to Ensembl IDs?

3.9 years ago

Aspire ▴ 300

I am working on a custom CDF: Brainarray HGU133Plus2_Hs_ENTREZG_v17.

I want to convert the probe names to Ensembl names. I have a few specific questions; if instead anyone prefers to explain the right way of doing that, I'd also be glad.

1) Just to get it straight: the ENTREZG does not mean that the CDF itself is in any way specific to Entrez, is that correct? I prefer working in Ensembl, so can I safely use HGU133Plus2_Hs_ENSG_v17 from Brainarray's website?

When I go to Brainarray's website and download the "CDF Seq Map Desc" file (last column), I see lines like these:

Probe Set Name  Chr     Chr Strand      Chr From        Probe X Probe Y Affy Probe Set Name
ENSG00000000003_at      X       -       99884769        1019    717     209109_s_at
ENSG00000000003_at      X       -       99884536        1054    679     209108_at

2) Does "Affy Probe Set Name" (the last column) stand for the probe set names of the Brainarray custom CDF?

3) What does the "Probe Set Name" (first column) mean? Are these simply the Ensembl names that I need?

custom-cdf brainarray affymetrix microarray • 1.5k views

ADD COMMENT • link 3.9 years ago by Aspire ▴ 300

I'm unsure of the specifics of the BrainArrays, but they are essentially the same as the Affy 'chips' on which they are based. So, the probe-set names are Affy probe IDs. In your case, the underlying chip was Affy U133 Plus 2. So, you can easily obtain extra annotation like this:

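# Annotation package for the original Affymetrix HG-U133 Plus 2 chip
library(hgu133plus2.db)

# Original Affymetrix probe set IDs (last column of the mapping file)
probes <- c('209109_s_at', '209108_at')

# List the identifier types that can be used as keys
keytypes(hgu133plus2.db)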
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
[11] "GO"           "GOALL"        "IPI"          "MAP"          "OMIM"        
[16] "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"        
[21] "PROBEID"      "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"      
[26] "UNIGENE"      "UNIPROT"     


mapIds(hgu133plus2.db, keys = probes,
  column = c('ENSEMBL'), keytype = 'PROBEID')
      209109_s_at         209108_at 
"ENSG00000000003" "ENSG00000000003" 


# One row per probe set / GO term combination
select(hgu133plus2.db, keys = probes, keytype = 'PROBEID',
  columns = c('PROBEID', 'SYMBOL', 'GENENAME', 'ENSEMBL', 'GO'))
       PROBEID SYMBOL      GENENAME         ENSEMBL         GO EVIDENCE ONTOLOGY
1  209109_s_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:0005515      IPI       MF
2  209109_s_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:0005887      IBA       CC
3  209109_s_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:0039532      IMP       BP
4  209109_s_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:0043123      HMP       BP
5  209109_s_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:0070062      HDA       CC
6  209109_s_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:1901223      IDA       BP
7    209108_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:0005515      IPI       MF
8    209108_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:0005887      IBA       CC
9    209108_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:0039532      IMP       BP
10   209108_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:0043123      HMP       BP
11   209108_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:0070062      HDA       CC
12   209108_at TSPAN6 tetraspanin 6 ENSG00000000003 GO:1901223      IDA       BP

Kevin

ADD REPLY • link 3.9 years ago by Kevin Blighe 87k

"They are essentially the same as the Affy 'chips' on which they are based."

But part of the idea of a custom CDF is to remap the probes to a more current genome annotation (as well as to deal with the problem of many probes that represent the same gene).

As the Brainarray description puts it:

"Due to the significant increase in EST/cDNA/Genomic sequence information in the last couple of years, some oligonucleotide probes in these old designs can now be assigned to different genes/transcripts based on the current UniGene clustering and genome annotation"

If the annotation were identical to that of the original Affymetrix probe sets, that would seem to miss part of the reason for creating the custom CDF in the first place...

P.S. The links on the Brainarray website itself for querying probe set identities are broken.

ADD REPLY • link 3.9 years ago by Aspire ▴ 300

I suppose that it depends on whether you are content with the original annotation or not, or whether you are specifically choosing Brainarray for some reason. If you want the 'new' annotation, then just use the CDF Seq Map Desc table for mapping.
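A minimal sketch of that lookup in base R, assuming the mapping file is tab-delimited with the header shown in the question. The two rows are inlined via `text =` so the snippet runs without the download, and the variable names (`mapping_txt`, `probe_map`, `pairs`, `affy_to_brainarray`) are only illustrative:

```r
# Two rows copied from the question; a real run would pass the path of the
# downloaded mapping file to read.delim() instead of using `text =`.
mapping_txt <- paste(
  "Probe Set Name\tChr\tChr Strand\tChr From\tProbe X\tProbe Y\tAffy Probe Set Name",
  "ENSG00000000003_at\tX\t-\t99884769\t1019\t717\t209109_s_at",
  "ENSG00000000003_at\tX\t-\t99884536\t1054\t679\t209108_at",
  sep = "\n"
)

# check.names = FALSE keeps the column names with spaces intact.
probe_map <- read.delim(text = mapping_txt, check.names = FALSE,
                        stringsAsFactors = FALSE)

# The table has one row per probe, so reduce it to unique probe set pairs.
pairs <- unique(probe_map[, c("Affy Probe Set Name", "Probe Set Name")])

# For each original Affy probe set, list the Brainarray probe set(s)
# that received at least one of its probes.
affy_to_brainarray <- split(pairs[["Probe Set Name"]],
                            pairs[["Affy Probe Set Name"]])
affy_to_brainarray[["209108_at"]]
```

Because the remapping is done per probe, one Affy probe set may in principle contribute probes to more than one Brainarray probe set, which is why the lookup is kept as a list rather than a one-to-one vector.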

ADD REPLY • link 3.9 years ago by Kevin Blighe 87k

score 0 · Accepted Answer · 2020-06-11

To answer my own technical question about the Brainarray custom CDF file format:

Probe Set Name  Chr     Chr Strand      Chr From        Probe X Probe Y Affy Probe Set Name
ENSG00000000003_at      X       -       99884769        1019    717     209109_s_at
ENSG00000000003_at      X       -       99884536        1054    679     209108_at

This is the _mapping.txt file, downloaded from the CDF Seq Map Desc column of the Brainarray website. The format appears to be as follows:

The first column is Brainarray's probe set name. Conveniently, it is simply the database entry name followed by "_at", so the probe set name ENSG00000000003_at stands for ENSG00000000003. If the custom CDF were HGU133Plus2_Hs_REFSEQ (instead of HGU133Plus2_Hs_ENSG, as above), Brainarray's probe set name would be, for example, NM_000122.1_at.

The last column is the original Affymetrix probe set name, which is mapped to the Brainarray probe set name in the first column.
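The naming rule described here can be sketched in base R as follows. The vector `probeset_ids` is a made-up example of row names from an ENSG custom CDF; the second Ensembl ID and the AFFX control entry are assumptions added only to show how non-gene names could be filtered out:

```r
# Hypothetical probe set names; only the first one appears in the thread.
probeset_ids <- c("ENSG00000000003_at", "ENSG00000000005_at", "AFFX-BioB-5_at")

# Remove only a trailing "_at"; the $ anchor leaves the rest of the name untouched.
gene_ids <- sub("_at$", "", probeset_ids)

# Keep only names that look like Ensembl gene IDs (ENSG + 11 digits).
is_ensembl <- grepl("^ENSG[0-9]{11}$", gene_ids)
ensembl_ids <- gene_ids[is_ensembl]
ensembl_ids
```

Anchoring the pattern at the end of the string matters for other annotation sources, where "_at" could in principle occur inside an identifier.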