Several gene names for one probe ID in an Affymetrix annotation file
9.1 years ago

Dear all,

I am trying to map gene IDs from the S. aureus annotation file to probe IDs.

The problem is that in over 2000 rows there are more than 2 gene IDs for the corresponding probe ID.

Here is row 3544 of the annotation file as an example:

sa_i10207dr_x_at    1120534 // gi|1120534|ref|NC_002758.2|NC_002758.2(GI:57634611):629461-632324(+) Staphylococcus aureus subsp. aureus Mu50, GENE=sdrC  LOCUS=SAV0561 // ncbi_bacterial // 13 // --- /// 1120535 // gi|1120535|ref|NC_002758.2|NC_002758.2(GI:57634611):632689-636848(+) Staphylococcus aureus subsp. aureus Mu50, GENE=sdrD  LOCUS=SAV0562 // ncbi_bacterial // 116 // --- /// 1120536 // gi|1120536|ref|NC_002758.2|NC_002758.2(GI:57634611):637240-640667(+) Staphylococcus aureus subsp. aureus Mu50, GENE=sdrE  LOCUS=SAV0563 // ncbi_bacterial // 27 // --- /// 1122655 // gi|1122655|ref|NC_002758.2|NC_002758.2(GI:57634611):2782009-2784642(-) Staphylococcus aureus subsp. aureus Mu50, GENE=clfB PRODUCT=Clumping factor B LOCUS=SAV2630 // ncbi_bacterial // 14 // --- /// 1123324 // gi|1123324|ref|NC_002745.2|NC_002745.2(GI:29165615):605214-608077(+) Staphylococcus aureus subsp. aureus N315, GENE=sdrC  LOCUS=SA0519 // ncbi_bacterial // 13 // --- /// 1123325 // gi|1123325|ref|NC_002745.2|NC_002745.2(GI:29165615):608442-612601(+) Staphylococcus aureus subsp. aureus N315, GENE=sdrD  LOCUS=SA0520 // ncbi_bacterial // 116 // --- /// 1123326 // gi|1123326|ref|NC_002745.2|NC_002745.2(GI:29165615):612993-616420(+) Staphylococcus aureus subsp. aureus N315, GENE=sdrE  LOCUS=SA0521 // ncbi_bacterial // 28 // --- /// 1125352 // gi|1125352|ref|NC_002745.2|NC_002745.2(GI:29165615):2718295-2720928(-) Staphylococcus aureus subsp. aureus N315, GENE=clfB PRODUCT=Clumping factor B LOCUS=SA2423 // ncbi_bacterial // 14 // --- /// 3236072 // gi|3236072|ref|NC_002951.2|NC_002951.2(GI:57650036):635788-639935(+) Staphylococcus aureus subsp. aureus COL, GENE=sdrD PRODUCT=sdrD protein LOCUS=SACOL0609 // ncbi_bacterial // 164 // --- /// 3236073 // gi|3236073|ref|NC_002951.2|NC_002951.2(GI:57650036):640327-643829(+) Staphylococcus aureus subsp. 
aureus COL, GENE=sdrE PRODUCT=sdrE protein LOCUS=SACOL0610 // ncbi_bacterial // 34 // --- /// 3236353 // gi|3236353|ref|NC_002951.2|NC_002951.2(GI:57650036):632578-635423(+) Staphylococcus aureus subsp. aureus COL, GENE=sdrC PRODUCT=sdrC protein LOCUS=SACOL0608 // ncbi_bacterial // 13 // --- /// 3237041 // gi|3237041|ref|NC_002951.2|NC_002951.2(GI:57650036):2711036-2713777(-) Staphylococcus aureus subsp. aureus COL, GENE=clfB PRODUCT=clumping factor B LOCUS=SACOL2652 // ncbi_bacterial // 15 // ---

Is it correct to consider only the first gene ID in each row?

I would appreciate any advice.

Nazanin


@nazaninhoseinkhan didn't you ask this question before in another form? See: "corresponding gene names for probeIDs"


No, in that question I wanted to know how to summarize probe IDs (merge identical probe IDs), while in this one I want to assign gene names to each probe ID.


Different things indeed, but the custom annotation from Brainarray that I mentioned in my answer should solve both.


I checked Brainarray, but it does not seem to support bacteria.

9.1 years ago

Yes, that is a common problem with Affymetrix probesets. A probeset can often hit multiple genes, either because those genes share the sequence the probes target or because the probeset simply turned out to be less specific than intended. Affymetrix is aware of the problem and documents it. You can even see it directly from the probeset name: names ending in _x_at are supposed to have exactly the problem you describe. A description for a mouse array here explains this; the explanation is not mouse-specific, though.
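To see what taking only the first gene ID would discard, it can help to split the annotation field into its individual hits first. Below is a minimal Python sketch, assuming the field uses " /// " between hits and " // " between the parts of one hit, as in the row quoted in the question. The function names (parse_annotation, unique_genes, is_x_at) are made up for illustration, and the example string is a trimmed copy of three hits from that row, not the full field.

```python
import re


def parse_annotation(field):
    """Split one multi-hit annotation field into a list of dicts.

    Assumes ' /// ' separates hits and ' // ' separates the parts of a hit,
    with the gene ID first and the free-text description second.
    """
    hits = []
    for entry in field.split(" /// "):
        parts = [p.strip() for p in entry.split(" // ")]
        description = parts[1] if len(parts) > 1 else ""
        gene = re.search(r"GENE=(\S+)", description)
        locus = re.search(r"LOCUS=(\S+)", description)
        hits.append({
            "gene_id": parts[0],
            "gene": gene.group(1) if gene else None,
            "locus": locus.group(1) if locus else None,
        })
    return hits


def unique_genes(hits):
    """Return the distinct gene names in order of first appearance."""
    seen = []
    for hit in hits:
        if hit["gene"] and hit["gene"] not in seen:
            seen.append(hit["gene"])
    return seen


def is_x_at(probeset_id):
    """True if the probeset name carries the _x_at suffix."""
    return probeset_id.endswith("_x_at")


# Three hits copied from the row in the question (the rest is omitted here).
example = (
    "1120534 // gi|1120534|ref|NC_002758.2|NC_002758.2(GI:57634611):629461-632324(+) "
    "Staphylococcus aureus subsp. aureus Mu50, GENE=sdrC  LOCUS=SAV0561 // ncbi_bacterial // 13 // --- /// "
    "1120535 // gi|1120535|ref|NC_002758.2|NC_002758.2(GI:57634611):632689-636848(+) "
    "Staphylococcus aureus subsp. aureus Mu50, GENE=sdrD  LOCUS=SAV0562 // ncbi_bacterial // 116 // --- /// "
    "1123324 // gi|1123324|ref|NC_002745.2|NC_002745.2(GI:29165615):605214-608077(+) "
    "Staphylococcus aureus subsp. aureus N315, GENE=sdrC  LOCUS=SA0519 // ncbi_bacterial // 13 // ---"
)

hits = parse_annotation(example)
print(unique_genes(hits))           # ['sdrC', 'sdrD']
print(is_x_at("sa_i10207dr_x_at"))  # True
```

Even in this trimmed example the same gene name (sdrC) shows up under different gene IDs and loci for different strains, while a second gene (sdrD) would be lost if only the first hit were kept. How to collapse the hits (per gene name, per strain, or not at all) is an analysis decision that the sketch does not make for you.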

Third-party solutions to that problem exist. Typically, probes are realigned to the latest annotated genomes to produce updated probesets, which no longer contain a fixed number of probes. Such custom probe annotations are available from Brainarray, for instance.

If you use our microarray quality control, normalisation and analysis pipelines at arrayanalysis.org, you can choose to use these custom annotations (custom CDFs).
