Question: RNA probe IDs from Affymetrix
Stemcellmonkey (European Union) wrote, 4.4 years ago:

Hi Biostars

I am new to expression data. My expression data is of this type:

[MoGene-1_0-st] Affymetrix Mouse Gene 1.0 ST Array [CDF: mogene10st_Mm_ENTREZG_17.1.0]

I compared two groups of mice in GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/)

and got a list with a lot of probe IDs, which I downloaded to a text file that looks like this:

"ID"    "adj.P.Val"    "P.Value"    "t"    "B"    "logFC"    "SPOT_ID"
"73649_at"    "0.0125"    "6.56e-07"    "14.107579"    "5.19301"    "2.8542797"    "73649"
"17921_at"    "0.0125"    "1.33e-06"    "-12.865506"    "4.80638"    "-1.1733647"    "17921"
"18174_at"    "0.0125"    "1.76e-06"    "12.395099"    "4.64119"    "2.5836002"    "18174"

Do you know of any software that can convert the probe IDs to gene names for this kind of file?


Search this site for "biomart".

EDIT: which may not help in this case - see answer below.

Reply written 4.4 years ago by Neilfws

Neilfws (Sydney, Australia) wrote, 4.4 years ago:

It appears that whoever created these probeset IDs has made a poor decision.

If we examine the GEO page for GSE56257, we see that the microarray platform is GPL17777. The description reads: "This is identical to GPL6246 but a custom cdf environment was used to extract data. The cdf can be found at the link below."

If you scroll down to the data table for GPL17777, you will see entries like this:

ID              SPOT_ID      Description
100008567_at    100008567    predicted gene 14964
100009600_at    100009600    zinc finger, GATA-like protein 1
100009609_at    100009609    vomeronasal 2, receptor 65

The IDs in the SPOT_ID column have the same numerical prefix as those in the ID column. However, the numbers are Entrez Gene IDs - the SPOT_ID is hyperlinked to the Entrez Gene entry. So the IDs in the first column are not Affymetrix probeset IDs and cannot be used to search the platform for gene annotation directly - they are IDs that the authors have invented.

What you could try is downloading one of the supplementary files from the GPL17777 page. The file GPL17777_mogene10st_Mm_ENTREZG_mapping.txt.gz seems to contain mapping between the original Affymetrix IDs and the author-created IDs.

The other solution, of course, is to use the Entrez Gene IDs from the SPOT_ID column in your BioMart query.
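The pattern described above can be checked, and the Entrez Gene IDs pulled out for a BioMart query, with a few lines of Python. This is only a sketch: it assumes the GEO2R download is tab-separated with quoted fields, as in the excerpt in the question, and the function name `entrez_ids` is made up for illustration.

```python
import csv
import io

# Three rows copied from the question (assumed to be tab-separated).
# A real file would be opened with open(path, newline="") instead.
SAMPLE = (
    '"ID"\t"adj.P.Val"\t"P.Value"\t"t"\t"B"\t"logFC"\t"SPOT_ID"\n'
    '"73649_at"\t"0.0125"\t"6.56e-07"\t"14.107579"\t"5.19301"\t"2.8542797"\t"73649"\n'
    '"17921_at"\t"0.0125"\t"1.33e-06"\t"-12.865506"\t"4.80638"\t"-1.1733647"\t"17921"\n'
    '"18174_at"\t"0.0125"\t"1.76e-06"\t"12.395099"\t"4.64119"\t"2.5836002"\t"18174"\n'
)


def entrez_ids(handle):
    """Return the SPOT_ID column, checking each value against the ID column."""
    ids = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # In this custom CDF the "probeset" ID is the Entrez Gene ID plus "_at".
        if row["ID"] != row["SPOT_ID"] + "_at":
            raise ValueError(f"unexpected ID format: {row['ID']}")
        ids.append(row["SPOT_ID"])
    return ids


# One ID per line, ready to paste into an ID-list filter.
print("\n".join(entrez_ids(io.StringIO(SAMPLE))))
```

If the check raises an error on the full file, the assumption that every ID is an Entrez Gene ID with "_at" appended does not hold for that row, and the mapping file from the GPL17777 page would be the safer route.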

zorbax (Mexico) wrote, 4.4 years ago:
# Skip the header row, then strip the quotes and the "_at" suffix from column 1
awk 'NR > 1 { gsub(/"/, "", $1); sub(/_at$/, "", $1); print $1 }' file_table > ids

Then upload the file here and download the info.

I have never tried it this way. Are you sure this site gives you ID conversions without specifying the version of the probesets?

Reply written 4.4 years ago by Manvendra Singh

Denise - Open Targets (UK, Hinxton, EMBL-EBI) wrote, 4.4 years ago:

Yes, we do have this Affymetrix set in Ensembl BioMart. However, the listed probe IDs do not return any results (if you are new to BioMart, check this video out).

If we search for those IDs in the Ensembl browser, we see that one (73649_at) maps to human and chimp, whereas the other two (17921_at and 18174_at) map nowhere.

Thank you :-)

I got the data from:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE56257

Reply written 4.4 years ago by Stemcellmonkey

See my answer for the reason for this confusion.

Reply written 4.4 years ago by Neilfws

Manvendra Singh (Berlin, Germany) wrote, 4.4 years ago:

An easy way out is the following.

First, download the gene annotation from Ensembl BioMart.

Set the filter to:

with Affymetrix Microarray mogene 1 0 st v1 probeset ID(s): Only

Set the attributes to:

Associated gene names and Affy MoGene probeset

Then, in R:

read the BioMart data and make the probe IDs the row names (let's say dat1)

read your Affy data and make the probe IDs the row names (let's say dat2)

dat = merge(dat1, dat2, by="row.names", all=FALSE)

dat is your data, with each probe ID annotated with its associated gene name.

HTH
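For readers who want to see what the merge step does, here is a minimal Python sketch of the same inner join. It assumes both tables have already been read into dictionaries keyed by probe ID. The probe IDs, gene names and the helper name `inner_join` are placeholders for illustration, not real BioMart output.

```python
def inner_join(annotation, results):
    """Keep only probe IDs present in both tables, like merge(..., all=FALSE) in R."""
    return {
        probe: {"gene": annotation[probe], **results[probe]}
        for probe in results
        if probe in annotation
    }


# Placeholder annotation table: probe ID -> gene name (hypothetical values).
annotation = {"probe_1": "GeneA", "probe_2": "GeneB"}

# Placeholder expression results: probe ID -> statistics (hypothetical values).
results = {"probe_1": {"logFC": 2.85}, "probe_3": {"logFC": -1.17}}

merged = inner_join(annotation, results)
print(merged)
```

Probes missing from either table are dropped silently. An output much shorter than the input therefore suggests the IDs in the two tables do not match, which the other answers in this thread indicate is likely with the Entrez-based IDs in this dataset.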

 

Ann (Concord NC USA) wrote, 4.4 years ago:

Can you get the CEL files? If yes, you can re-process the data yourself, which is pretty easy to do these days using Bioconductor tools. Sometimes this leads to new discoveries, because new and better methods may have been published since the data were submitted.

GEO2R is great - I love that it shows you the R code syntax! But it's a bit limited in that it only works with the expression matrix uploaded by the submitter. So far as I know, it will not re-run array processing (RMA or MAS5), which is what I would recommend doing.


Thank you Ann :-) Yes. Do you know a good basic tutorial for processing CEL files? I'm a medical student and need to start at a very basic level ;-).

Reply written 4.4 years ago by Stemcellmonkey