Question: Affymetrix GeneChip Probe Set ID Questions, in General and Using R
Sandra wrote:

Hello everybody!

I just started working with data from the Affymetrix GeneChip Mouse Gene 1.0 ST Array and have two questions about it. I hope somebody can help me.

1) The first is a more general and probably very easy question: which IDs should I use to map Affymetrix probe set IDs or transcript cluster IDs to genes? I found many questions here, with very good answers, about the fact that several probe set/transcript cluster IDs can map to the same gene. I have no problem using different R packages (biomaRt, xmapcore, mogene10sttranscriptcluster.db, etc.) to map the Affymetrix IDs to gene symbols, Ensembl, Entrez Gene, UniGene IDs, etc. My question is which of these ID types I should use to decide that two (or more) transcript cluster IDs map to the same gene. In other words, which ID type is the standard for saying that two probe sets are assigned to the same gene?

I guess for most genes it makes no difference which ID type I use, but for some probe sets the annotations differ, and the set of probe sets with missing annotations is different for each ID type. I saw that others used gene symbols (which is what I would have used), but I also saw UniGene IDs being used...
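Once every transcript cluster has been mapped to an ID of some type, deciding whether two clusters hit the same gene is a grouping step, and running it once per ID type shows exactly where the types disagree. A minimal sketch, assuming a hypothetical table of (transcript_cluster_id, gene_symbol, entrez_id) rows already exported from one of the packages above; all IDs and gene names here are made up for illustration:

```python
from collections import defaultdict

# Hypothetical annotation rows: (transcript_cluster_id, gene_symbol, entrez_id).
# None marks a missing annotation. All values are made up for illustration.
ROWS = [
    ("tc1", "GeneA", "1001"),
    ("tc2", "GeneA", "1001"),
    ("tc3", "GeneB", "1002"),
    ("tc4", None, "1002"),
    ("tc5", "GeneC", None),
]


def group_by(rows, column):
    """Group cluster IDs by the ID in `column`; also return unannotated clusters."""
    groups = defaultdict(list)
    unannotated = []
    for row in rows:
        cluster_id, gene_id = row[0], row[column]
        if gene_id is None:
            unannotated.append(cluster_id)
        else:
            groups[gene_id].append(cluster_id)
    return {key: sorted(value) for key, value in groups.items()}, unannotated


def shared_genes(groups):
    """Keep only gene IDs that are hit by more than one transcript cluster."""
    return {key: value for key, value in groups.items() if len(value) > 1}


for name, column in (("gene_symbol", 1), ("entrez_id", 2)):
    groups, missing = group_by(ROWS, column)
    print(name, shared_genes(groups), "unannotated:", missing)
```

In this toy table, tc3 and tc4 are grouped together by Entrez ID but not by symbol, because tc4 has no symbol; the per-type "unannotated" lists make that kind of difference easy to inspect on real data.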

2) I am using R and used ReadAffy() to get an AffyBatch object from the CEL files I downloaded from GEO. On GEO the platform is described as [MoGene-1_0-st] Affymetrix Mouse Gene 1.0 ST Array [transcript (gene) version].

From Affymetrix I downloaded two annotation files: MoGene-10-st-v1.na30.mm9.transcript.csv and MoGene-10-st-v1.na30.mm9.probeset.csv

I now want to match the data in my AffyBatch object to the probe set IDs in the probeset annotation file. When I call probeNames() on my AffyBatch, I get a transcript cluster ID for each row of the intensity matrix (these equal the probeset_id values in the transcript annotation file), but not the probe set IDs from the probeset annotation file. Is the probe set information from the probeset annotation file missing from my AffyBatch object because the CEL files are from a "transcript (gene) version", or am I doing something wrong?

Thank you very much for your help!

Sandra

Written 7.6 years ago by Sandra; modified 4.6 years ago by 64bga.
Pascal wrote:

Hey Sandra,

the Gene 1.0 ST Array is very similar to the Exon 1.0 ST Array. On both arrays a probe set corresponds, more or less, to an exon, so if you analyze your arrays at the probe set level you get a signal for each exon. A transcript cluster contains all probe sets of a gene and can therefore be used to measure gene expression.

1) A probe set annotation contains only the annotation of the exon it belongs to. For example, if a probe set matches an exon that is not in the RefSeq database, the probe set has no RefSeq annotation, but it may still have a UniGene ID. A transcript cluster annotation contains all annotations of the gene it belongs to.

2) MoGene-1_0-st-v1.na30.mm9.transcript.csv contains only transcript cluster IDs and should be used for gene-level analysis (as you evidently did). The transcript.csv also has a second column labeled "probeset_id", but that is again the transcript cluster ID. MoGene-1_0-st-v1.na30.mm9.probeset.csv contains annotations for each probe set (exon). Since you analyzed your arrays at the gene level, you cannot map these to your results. This file does, however, list the transcript cluster of each probe set, so you can see which probe set belongs to which cluster.
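Reading the probe-set-to-cluster relationship out of the probeset annotation file is a small parsing job. A minimal sketch, assuming the NetAffx CSV layout of "#%"-prefixed metadata lines followed by a quoted header row with `probeset_id` and `transcript_cluster_id` columns (check the header of your own release); the sample rows and IDs are made up:

```python
import csv
import io
from collections import defaultdict

# A tiny made-up excerpt in the assumed NetAffx layout: "#%" metadata lines,
# then a quoted header row. Real files have many more columns and rows.
SAMPLE = '''#%create_date=example
#%genome-version=mm9
"probeset_id","seqname","transcript_cluster_id"
"9000001","chr1","8000001"
"9000002","chr1","8000001"
"9000003","chr2","8000002"
'''


def clusters_to_probesets(handle):
    """Return {transcript_cluster_id: [probeset_id, ...]} from a probeset.csv handle."""
    # Skip the '#'-prefixed metadata lines; csv.DictReader then reads the header row.
    lines = (line for line in handle if not line.startswith("#"))
    mapping = defaultdict(list)
    for row in csv.DictReader(lines):
        mapping[row["transcript_cluster_id"]].append(row["probeset_id"])
    return dict(mapping)


print(clusters_to_probesets(io.StringIO(SAMPLE)))
```

With a real file you would pass an open handle instead, e.g. `open("MoGene-1_0-st-v1.na30.mm9.probeset.csv", newline="")`; if the column names differ in your annotation release, adjust the two keys.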

Pascal

Written 7.6 years ago by Pascal.

Hi Pascal, thanks for your answer. I am still confused about the second point. Is the level of analysis already fixed by the set of CEL files I downloaded from GEO (the "transcript (gene) version" of the platform in my case)? The only thing I did in R was call ReadAffy(), and I don't see where that chose a level of analysis. Or what do I have to do in R to analyze this data at the exon (probe set) level, if that is possible at all? Thanks, Sandra

Reply written 7.6 years ago by Sandra.

The CEL files you downloaded are not specific to gene-level analysis; they always contain all probes. I do not know how to do this in R; so far I have only used the Affymetrix Power Tools for this kind of analysis.
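Because the CEL file holds every probe, the level of analysis is decided when probe intensities are summarized: group them by probe set and you get exon-level values, group them by transcript cluster and you get gene-level values. The toy sketch below uses made-up log2 intensities and a plain median per group; it only illustrates the grouping and is not RMA or what the Affymetrix Power Tools do (those add background correction, normalization and a more robust summary). In Bioconductor, the oligo package's rma() exposes this choice through its `target` argument (e.g. "core" vs "probeset"); see its documentation.

```python
from collections import defaultdict
from statistics import median

# Made-up probe-level data: (probe_id, probeset_id, transcript_cluster_id, log2_intensity).
PROBES = [
    ("p1", "ps1", "tcA", 7.0),
    ("p2", "ps1", "tcA", 7.4),
    ("p3", "ps2", "tcA", 9.0),
    ("p4", "ps2", "tcA", 9.2),
    ("p5", "ps3", "tcB", 5.1),
]


def summarize(probes, level):
    """Median log2 intensity per group; `level` selects the grouping column."""
    key_index = {"probeset": 1, "transcript_cluster": 2}[level]
    groups = defaultdict(list)
    for probe in probes:
        groups[probe[key_index]].append(probe[3])
    return {key: median(values) for key, values in groups.items()}


print(summarize(PROBES, "probeset"))            # exon (probe set) level
print(summarize(PROBES, "transcript_cluster"))  # gene (transcript cluster) level
```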

Reply written 7.6 years ago by Pascal.
64bga (Spain) wrote:

Hi,

I'm new to microarray analysis and I have similar trouble with HuGene-1_0-st-v1 .CEL files, which I want to use for gene-level and pathway analysis. My file has the following probeset_id values, and similar transcript_cluster IDs:

7896737
7896739
7896741
7896743
7896745
7896747
7896749
7896751

but elsewhere I always see IDs like these:

1007_s_at
1053_at
117_at

Is this because the HuGene arrays use this kind of probeset_id, or is there some way to get the second kind of probeset_id, with suffixes? I have real trouble understanding the different IDs: probe sets, transcript clusters, genes and exons.

Thanks in advance!!
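The two ID styles above come from different array designs. Numeric IDs such as 7896737 are the style used in the Gene ST annotation files for both probe sets and transcript clusters, while IDs ending in _at or _s_at (1007_s_at, 1053_at, 117_at) are the style of Affymetrix 3' expression arrays such as the HG-U133 series. A rough format check, as a sketch only: the regular expressions are assumptions that cover the examples in this thread, not an official ID specification.

```python
import re


def id_style(probeset_id: str) -> str:
    """Classify a probe set ID by its format (heuristic, not an official rule)."""
    if re.fullmatch(r"\d+", probeset_id):
        return "numeric (ST-style)"
    if re.fullmatch(r"\S+_at", probeset_id):
        return "suffixed (3'-array style)"
    return "unknown"


for example in ("7896737", "7896739", "1007_s_at", "1053_at", "117_at"):
    print(example, "->", id_style(example))
```

Since the two ID systems are not interchangeable, comparing results across the two array types would go through a shared gene-level ID, as discussed in the first question.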

Written 4.6 years ago by 64bga.
Powered by Biostar version 2.3.0