Affymetrix GeneChip Probeset ID Question, in General and Using R
12.7 years ago
Sandra

Hello everybody!

I have just started working with data from the Affymetrix GeneChip Mouse Gene 1.0 ST Array. I have two questions about it and hope somebody can help me.

1. The first is a more general and probably very easy question: which ID type should I use when mapping Affymetrix probeset_ids or transcript_cluster_ids to genes? I found many questions here, with very good answers, about the fact that several probeset_ids/transcript_cluster_ids can map to the same gene. I have no problem using R packages (biomaRt, xmapcore, mogene10sttranscriptcluster.db, etc.) to map the Affymetrix IDs to Gene Symbols, Ensembl, Entrez Gene or UniGene IDs. My question is which of these IDs I should use to decide that two (or more) transcript_cluster_ids map to the same gene. In other words, which ID type is the standard for saying that two probesets are assigned to the same gene?

I guess that for most genes it makes no difference which ID type I use, but for some probesets the annotations differ between ID types, and which probesets lack an annotation also depends on the ID type. I saw that others used Gene Symbols (which is what I would have used), but I have also seen UniGene IDs used...

  2. I am using R and used ReadAffy to build an AffyBatch object from the CEL files I downloaded from GEO. On GEO, the platform is described as [MoGene-1_0-st] Affymetrix Mouse Gene 1.0 ST Array [transcript (gene) version].

    From Affymetrix I downloaded two annotation files: MoGene-1_0-st-v1.na30.mm9.transcript.csv and MoGene-1_0-st-v1.na30.mm9.probeset.csv

    I then wanted to match the data in my AffyBatch object to the probeset_ids in the probeset annotation file. When I run probeNames(AffyBatch), I get the transcript_cluster_ids for the rows of the intensity matrix (these are identical to the probeset_ids in the transcript annotation file), but not the probeset_ids from the probeset annotation file.

    Is the information about the probeset_ids from the probeset annotation file missing from my AffyBatch object because the CEL files are from a "transcript (gene) version", or am I doing something wrong?
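For question 1, whichever ID type you settle on, the grouping step itself is the same: pull the chosen ID out of each transcript cluster's annotation and collect the clusters that share it. A minimal Python sketch, assuming the NetAffx `gene_assignment` layout described in the comments; the cluster IDs, accessions and gene names are made up for illustration:

```python
from collections import defaultdict

# HYPOTHETICAL rows keyed by made-up transcript cluster IDs. The values mimic
# the "gene_assignment" column of a NetAffx transcript.csv. ASSUMPTION: entries
# are separated by " /// ", fields within an entry by " // " in the order
# accession // symbol // description // cytoband // Entrez Gene ID, and "---"
# marks a missing assignment. Check this against the header of your own file.
ROWS = {
    "TC_A": "NM_0001 // GeneX // example gene X // 1 A1 // 1001",
    "TC_B": ("NM_0002 // GeneX // example gene X // 1 A1 // 1001 /// "
             "ENSMUST0001 // GeneX // example gene X // 1 A1 // 1001"),
    "TC_C": "NM_0003 // GeneY // example gene Y // 2 B // 2002",
    "TC_D": "---",
}

# Position of each ID type inside one " // "-separated entry (assumed layout).
FIELD_INDEX = {"symbol": 1, "entrez": 4}


def gene_ids(assignment, id_type):
    """Return the set of gene IDs of one type found in a gene_assignment string."""
    if assignment.strip() == "---":
        return set()
    idx = FIELD_INDEX[id_type]
    ids = set()
    for entry in assignment.split(" /// "):
        fields = [f.strip() for f in entry.split(" // ")]
        if len(fields) > idx and fields[idx] not in ("", "---"):
            ids.add(fields[idx])
    return ids


def clusters_by_gene(rows, id_type):
    """Map each gene ID of the chosen type to the transcript clusters assigned to it."""
    groups = defaultdict(set)
    for cluster, assignment in rows.items():
        for gid in gene_ids(assignment, id_type):
            groups[gid].add(cluster)
    return dict(groups)


if __name__ == "__main__":
    for id_type in ("symbol", "entrez"):
        shared = {g: sorted(c) for g, c in clusters_by_gene(ROWS, id_type).items()
                  if len(c) > 1}
        print(id_type, shared)
```

Running the same rows with `"symbol"` and `"entrez"` and comparing the groupings is one way to see for which clusters the choice of ID type actually changes the answer.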

Thank you very much for your help!

Sandra


Hi,

I'm new to microarray analysis and I have similar trouble with HuGene-1_0-st-v1 .CEL files, which I want to use for gene analysis and pathway analysis. My file has the following probeset_ids, and the transcript_cluster_ids look similar:

7896737
7896739
7896741
7896743
7896745
7896747
7896749
7896751

but elsewhere I always see IDs like these:

1007_s_at
1053_at
117_at

Is that because HuGene arrays use this kind of probeset_id, or is there some way to get the second kind of probeset_id, with suffixes? I have a hard time understanding the different IDs: probe sets, and transcripts with genes or exons.
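The two lists above follow different naming schemes. As far as I know, IDs with suffixes such as `_at` and `_s_at` come from Affymetrix 3' expression arrays (for example the HG-U133 family), while Gene 1.0 ST IDs are plain integers, so they are not two spellings of the same ID. A rough way to tell them apart, as a Python sketch; the regexes are a heuristic of my own, not an Affymetrix specification:

```python
import re

# HEURISTIC (assumption, not an official rule): 3' expression arrays such as
# HG-U133 use probe set IDs ending in "_at" (including "_s_at", "_x_at"),
# whereas Gene 1.0 ST / Exon 1.0 ST IDs are plain integers.
NUMERIC_ID = re.compile(r"^\d+$")
AT_SUFFIX_ID = re.compile(r"^\S+_at$")


def id_style(probe_id):
    """Guess which array family an Affymetrix ID string comes from."""
    if NUMERIC_ID.match(probe_id):
        return "st_numeric"
    if AT_SUFFIX_ID.match(probe_id):
        return "3prime_at"
    return "unknown"


if __name__ == "__main__":
    for pid in ("7896737", "1007_s_at", "117_at"):
        print(pid, id_style(pid))
```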

Thanks in advance!!

12.7 years ago
Pascal

Hey Sandra,

The Gene 1.0 ST Array is very similar to the Exon 1.0 ST Array. On both arrays, a probe set more or less corresponds to an exon, so if you analyze your arrays at the probe set level you get a signal for each exon. A transcript cluster contains all probe sets of a gene and can therefore be used to measure gene expression.

1) The probe set annotation contains only the annotation of the exon the probe set belongs to. This means that if a probe set matches an exon that is not, e.g., in the RefSeq database, the probe set has no RefSeq annotation but may have a UniGene ID. A transcript cluster annotation contains all annotations of the gene it belongs to.

2) MoGene-1_0-st-v1.na30.mm9.transcript.csv contains only transcript cluster IDs and should be used for gene-level analysis (as you did). The transcript.csv has a second column labeled "probeset_id", but it actually contains the transcript cluster ID again. MoGene-1_0-st-v1.na30.mm9.probeset.csv contains annotations for each probe set (exon). Since you analyzed your arrays at the gene level, you cannot map these to your results. The probeset file does, however, list the transcript cluster of each probe set, so you can see which probe set belongs to which cluster.
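The probe set to transcript cluster relationship described in point 2 can be rebuilt from the probeset file. A minimal Python sketch, assuming the `#` metadata lines and column names noted in the comments; the IDs are made up:

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL excerpt of a probeset.csv. ASSUMPTIONS: metadata lines start
# with "#", the first non-"#" line is a quoted header, and the columns are
# named "probeset_id" and "transcript_cluster_id". Real files have many more
# columns; the code below only reads these two.
PROBESET_CSV = '''#%create_date=hypothetical
"probeset_id","transcript_cluster_id"
"101","100"
"102","100"
"201","200"
'''


def read_netaffx_csv(handle):
    """Parse an annotation CSV, dropping '#' metadata lines before DictReader sees them."""
    return list(csv.DictReader(line for line in handle if not line.startswith("#")))


def probesets_by_cluster(rows):
    """Group probe set (exon-level) IDs under their transcript cluster (gene-level) ID."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["transcript_cluster_id"]].append(row["probeset_id"])
    return dict(clusters)


if __name__ == "__main__":
    rows = read_netaffx_csv(io.StringIO(PROBESET_CSV))
    print(probesets_by_cluster(rows))
```

For the real file, pass an open handle instead, e.g. `open("MoGene-1_0-st-v1.na30.mm9.probeset.csv", newline="")`, after checking the column names against that file's header.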

Pascal


Hi Pascal, thanks for your answer. I am still confused about the second point. Is the level at which I do the analysis already determined by the set of CEL files I downloaded from GEO (the "transcript (gene) version" of the platform in my case)? The only thing I did in R was call ReadAffy(), and I don't see where that chose the level of analysis. What would I have to do in R to analyze this particular data at the exon (probe set) level, if that is possible at all? Thanks, Sandra
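One quick check of which level an expression matrix is at is to compare its row IDs with each annotation file's own ID column: gene-level results should match transcript_cluster_id, exon-level results should match probeset_id. A Python sketch with made-up IDs, assuming the file layout noted in the comments:

```python
import csv
import io

# HYPOTHETICAL excerpts of the two annotation files (made-up IDs). ASSUMPTIONS:
# "#" lines are metadata and the ID columns are named as below.
TRANSCRIPT_CSV = '''#%create_date=hypothetical
"transcript_cluster_id","probeset_id"
"100","100"
"200","200"
'''
PROBESET_CSV = '''#%create_date=hypothetical
"probeset_id","transcript_cluster_id"
"101","100"
"102","100"
"201","200"
'''


def read_netaffx_csv(handle):
    """Parse an annotation CSV, skipping '#' metadata lines."""
    return list(csv.DictReader(line for line in handle if not line.startswith("#")))


def id_level(matrix_ids, transcript_rows, probeset_rows):
    """Fraction of expression-matrix row IDs found in each file's own ID column."""
    ids = set(matrix_ids)
    if not ids:
        return {"transcript_cluster": 0.0, "probeset": 0.0}
    tc_ids = {r["transcript_cluster_id"] for r in transcript_rows}
    ps_ids = {r["probeset_id"] for r in probeset_rows}
    return {
        "transcript_cluster": len(ids & tc_ids) / len(ids),
        "probeset": len(ids & ps_ids) / len(ids),
    }


if __name__ == "__main__":
    tc_rows = read_netaffx_csv(io.StringIO(TRANSCRIPT_CSV))
    ps_rows = read_netaffx_csv(io.StringIO(PROBESET_CSV))
    print(id_level(["100", "200"], tc_rows, ps_rows))  # gene-level row IDs
    print(id_level(["101", "201"], tc_rows, ps_rows))  # exon-level row IDs
```

This only tells you which level a matrix was summarized at; it does not change that level.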


The CEL files you downloaded are not specific to gene-level analysis; they always contain all probes. I don't know how to do this in R; so far I have only used Affymetrix Power Tools for this kind of analysis.
