Question

Affymetrix Genechip Probeset Id Question In General And Using R

2

14.1 years ago

Sandra ▴ 20

Hello everybody!

I just started working with data from the Affymetrix GeneChip Mouse Gene 1.0 ST Array and I have two questions about that and hope that somebody can help me.

1. The first is a more general and probably very easy question: which IDs should I use to map Affymetrix probeset_ids or transcript_cluster_ids to genes? I found a lot of questions and very good answers here about several probeset_ids/transcript_cluster_ids matching the same gene, etc. I don't have a problem using different R packages (biomaRt, xmapcore, mogene10sttranscriptcluster.db etc.) to match the Affymetrix IDs to Gene Symbol, Ensembl, Entrez Gene, UniGene IDs etc. My question is which of these IDs I should use to determine that two (or more) transcript_cluster_ids match the same gene. In other words, what ID "type" is the standard for saying that two probesets are assigned to the same gene?

I guess for most genes it shouldn't make a difference which ID type I use, but for some probesets the annotation differs between ID types, and the set of probesets with missing annotations also differs from type to type. I saw that others used gene symbols (which is what I would have used), but I also saw UniGene IDs being used...

2. I am using R and used ReadAffy to get an AffyBatch object from the CEL files that I got from GEO. On GEO the platform is described as [MoGene-1_0-st] Affymetrix Mouse Gene 1.0 ST Array [transcript (gene) version].

From Affymetrix I downloaded two annotation files: MoGene-1_0-st-v1.na30.mm9.transcript.csv and MoGene-1_0-st-v1.na30.mm9.probeset.csv

I now want to match the data in my AffyBatch object to the probeset_ids in the probeset annotation file. When I try probeNames(AffyBatch), I get the transcript_cluster_ids for each row in the intensity matrix (these are equal to the probeset_ids in the transcript annotation file), but not the probeset_ids from the probeset annotation file.

Is the information about the probeset_ids from the probeset annotation file missing from my AffyBatch object because the CEL files are from a "transcript (gene) version", or am I doing something wrong?

Thank you very much for your help!

Sandra

annotation microarray probeset affymetrix • 11k views

ADD COMMENT • link updated 3.9 years ago by Ram 45k • written 14.1 years ago by Sandra ▴ 20

0

Hi,

I'm new to microarray analysis and I have similar trouble with HuGene-1_0-st-v1 .CEL files, which I want to use for gene and pathway analysis. My file has the following probeset_id and a similar transcript_cluster:

and I always see something like this:

1007_s_at
1053_at
117_at

Is it because the HuGene arrays have this kind of probeset_id? Or is there some way to get the second kind of probeset_id, with suffixes? I have a big problem understanding the IDs: probeset, transcript, and how they relate to genes or exons.

Thanks in advance!!

ADD REPLY • link updated 3.9 years ago by Ram 45k • written 11.1 years ago by 64bga • 0

score 3 · Answer 1 · 2011-09-16

Hey Sandra,

The Gene 1.0 ST Array is very similar to the Exon 1.0 ST Array. On both arrays a probe set corresponds more or less to an exon. This means that if you analyze your arrays at the probe set level, you get a signal for each exon. A transcript cluster contains all probe sets of a gene and can therefore be used to measure gene expression.

1) The probe set annotation contains only the annotation of the exon the probe set belongs to. This means that if a probe set matches an exon that is not in, e.g., the RefSeq database, the probe set has no RefSeq annotation, but it may still have a UniGene ID. A transcript cluster annotation contains all annotations of the gene it belongs to.

2) MoGene-1_0-st-v1.na30.mm9.transcript.csv only contains transcript cluster IDs and should be used for gene-level analysis (as you obviously did). There is a second column labeled "probeset_id" in the transcript.csv, but that is actually the transcript cluster ID again. MoGene-1_0-st-v1.na30.mm9.probeset.csv contains annotations for each probe set (exon). Since you analyse your arrays at the gene level, you cannot map these directly to your results. However, this file also lists the transcript cluster for each probe set, so you can see which probe set belongs to which cluster.
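To illustrate how the probe set file can be used to look up which probe sets belong to each transcript cluster, here is a minimal base R sketch. It uses a small inline table instead of the real annotation file. The column names (probeset_id, transcript_cluster_id), all IDs, and the expression values are assumptions made up for illustration, so check them against the header of your own probeset.csv before adapting this.

```r
# Toy stand-in for the probe set annotation file.
# Column names and IDs are illustrative assumptions, not copied from the real file.
probeset_csv <- "
# the real file is assumed to start with comment lines like this one
probeset_id,transcript_cluster_id
10344615,10344614
10344617,10344614
10344625,10344624
10344627,10344624
10344629,10344624
"

# Skip the '#' comment lines and read every column as character,
# so that IDs are never treated as numbers.
probeset_annot <- read.csv(text = probeset_csv, comment.char = "#",
                           colClasses = "character")

# Hypothetical gene-level results, one row per transcript cluster.
gene_results <- data.frame(
  transcript_cluster_id = c("10344614", "10344624"),
  log2_expr = c(7.2, 9.5),
  stringsAsFactors = FALSE
)

# List the probe sets (exons) that make up each transcript cluster.
probesets_per_cluster <- split(probeset_annot$probeset_id,
                               probeset_annot$transcript_cluster_id)

# Attach the number of probe sets to each gene-level row.
gene_results$n_probesets <- unname(
  lengths(probesets_per_cluster)[gene_results$transcript_cluster_id]
)

print(gene_results)
print(probesets_per_cluster[["10344624"]])
```

With the real file you would replace the inline text with something like read.csv("MoGene-1_0-st-v1.na30.mm9.probeset.csv", comment.char = "#", colClasses = "character"). Whether the comment lines and column names match is something to verify in your copy of the file.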

Pascal