RNA probe IDs from Affymetrix
7.1 years ago

Hi Biostars

I am new to expression data. My expression data is of the type:

[MoGene-1_0-st] Affymetrix Mouse Gene 1.0 ST Array [CDF: mogene10st_Mm_ENTREZG_17.1.0]

I compared two groups of mice in GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/) and got a long list of probe IDs, which I downloaded to a text file that looks like this:

"ID"    "adj.P.Val"    "P.Value"    "t"    "B"    "logFC"    "SPOT_ID"
"73649_at"    "0.0125"    "6.56e-07"    "14.107579"    "5.19301"    "2.8542797"    "73649"
"17921_at"    "0.0125"    "1.33e-06"    "-12.865506"    "4.80638"    "-1.1733647"    "17921"
"18174_at"    "0.0125"    "1.76e-06"    "12.395099"    "4.64119"    "2.5836002"    "18174"

Do you know of any software that can convert the probe IDs in this kind of file to gene names?

expression analysis rna RNA-Seq

Search this site for "biomart".

EDIT: this may not help in this case; see the answer below.

7.1 years ago
Neilfws 49k

It appears that whoever created these probeset IDs has made a poor decision.

If we examine the GEO page for GSE56257, we see that the microarray platform is GPL17777. The description reads: "This is identical to GPL6246 but a custom cdf environment was used to extract data. The cdf can be found at the link below."

If you scroll down to the data table for GPL17777, you will see entries like this:

ID              SPOT_ID      Description
100008567_at    100008567    predicted gene 14964
100009600_at    100009600    zinc finger, GATA-like protein 1
100009609_at    100009609    vomeronasal 2, receptor 65

The IDs in the SPOT_ID column have the same numerical prefix as those in the ID column. However, the numbers are Entrez Gene IDs - the SPOT_ID is hyperlinked to the Entrez Gene entry. So the IDs in the first column are not Affymetrix probeset IDs and cannot be used to search the platform for gene annotation directly - they are IDs that the authors have invented.
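A quick way to confirm this on a GEO2R download like the one in the question is to strip the `_at` suffix from each ID and compare what is left with SPOT_ID. This is a minimal sketch, assuming a tab-delimited GEO2R export with ID in column 1 and SPOT_ID in column 7 (as in the header shown in the question); `check_spot_ids` is a hypothetical helper name, not an existing tool.

```sh
# Hypothetical helper: check that every GEO2R ID equals its SPOT_ID plus "_at".
# Assumes a tab-delimited table, header in row 1, ID in column 1, SPOT_ID in column 7.
check_spot_ids() {
  awk -F'\t' '
    NR == 1 { next }            # skip the header row
    {
      gsub(/"/, "")             # drop the quotes GEO2R puts around every field (re-splits $0)
      id = $1
      sub(/_at$/, "", id)       # remove the trailing "_at" from the ID
      if (id != $7) { bad++; print "mismatch: " $1 " vs " $7 }
    }
    END { print bad + 0, "mismatching rows" }
  ' "$1"
}
```

Running `check_spot_ids geo2r_table.txt` (hypothetical file name) prints one line per disagreeing row followed by a count; "0 mismatching rows" is consistent with the IDs being Entrez Gene IDs with `_at` appended rather than real Affymetrix probeset IDs.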

What you could try is downloading one of the supplementary files from the GPL17777 page. The file GPL17777_mogene10st_Mm_ENTREZG_mapping.txt.gz seems to contain a mapping between the original Affymetrix IDs and the author-created IDs.

The other solution, of course, is to use the Entrez Gene IDs from the SPOT_ID column in your BioMart query.
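Pulling that column out of the GEO2R table takes one awk call. A minimal sketch, assuming the same tab-delimited layout as the question (SPOT_ID in column 7, one header row); `spot_ids` is a hypothetical helper name. The output is one Entrez Gene ID per line, suitable for pasting into BioMart's Entrez Gene (NCBI gene) ID filter.

```sh
# Hypothetical helper: print the SPOT_ID column (Entrez Gene IDs) of a
# tab-delimited GEO2R table, one ID per line.
spot_ids() {
  awk -F'\t' '
    NR == 1 { next }      # skip the header row
    {
      gsub(/"/, "")       # remove the quotes around every field
      print $7            # column 7 is SPOT_ID in this GEO2R export
    }
  ' "$1"
}
```

For example, `spot_ids geo2r_table.txt > entrez_ids.txt` (file names hypothetical) writes the ID list that the BioMart query needs.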

7.1 years ago
zorbax ▴ 260
awk 'NR > 1 { gsub(/"/, "", $1); sub(/_at$/, "", $1); print $1 }' file_table > ids

Then upload the file here and download the info.

I've never tried it this way. Are you sure you get ID conversions from this site without specifying the probeset version?

7.1 years ago
Denise CS ★ 5.2k

Yes, we do have this Affymetrix set in Ensembl BioMart; however, the listed probe IDs do not return any results (if you are new to BioMart, check this video out).

If we search for those IDs in the Ensembl browser, we see that one (73649_at) maps to human and chimp, whereas the other two (17921_at and 18174_at) map nowhere.

See my answer for the reason for this confusion.

7.1 years ago
Manvendra Singh ★ 2.2k

The easy way out is as follows.

First, download genes from Ensembl BioMart, setting:

Filters: with Affymetrix Microarray mogene 1 0 st v1 probeset ID(s): Only

Attributes: Associated gene names and Affy MoGene probeset

Then, in R:

read the BioMart data and make the probe IDs the row names (let's say dat1)

read your Affy data and make the probe IDs the row names (let's say dat2)

dat <- merge(dat1, dat2, by = "row.names", all = FALSE)

dat is your table of probe IDs annotated with their associated gene names.
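If you would rather not load the tables into R, a rough shell equivalent of this inner join uses `sort` and `join`. This is a sketch under stated assumptions: both files are tab-delimited, each has exactly one header row, and the probe ID sits unquoted in column 1 (GEO2R quotes every field, so strip the quotes first); `join_on_probe` is a hypothetical helper name.

```sh
# Hypothetical helper: inner join of two tab-delimited tables on column 1,
# roughly what merge(dat1, dat2, by = "row.names", all = FALSE) does in R.
# Assumes one header row per file and unquoted probe IDs in column 1.
join_on_probe() {
  tab=$(printf '\t')
  left=$(mktemp)
  right=$(mktemp)
  # Drop the header row and sort on the probe ID; join needs both inputs
  # sorted on the key with the same collation, hence LC_ALL=C throughout.
  tail -n +2 "$1" | LC_ALL=C sort -t "$tab" -k1,1 > "$left"
  tail -n +2 "$2" | LC_ALL=C sort -t "$tab" -k1,1 > "$right"
  # Keep only probe IDs present in both files (like all = FALSE), printing the key,
  # then the remaining columns of file 1, then the remaining columns of file 2.
  LC_ALL=C join -t "$tab" "$left" "$right"
  rm -f "$left" "$right"
}
```

Usage would look like `join_on_probe mart_export.txt affy_data.txt > annotated.txt` (file names hypothetical). One design difference from the row-name approach: a probe listed on several lines of the BioMart export (one line per associated gene) simply yields several output lines, whereas an R data frame does not allow duplicated row names.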

HTH

7.1 years ago
Ann ★ 2.3k

Can you get the CEL files? If so, you can re-process the data yourself, which is fairly easy to do these days with Bioconductor tools. Sometimes this leads to new discoveries, because newer and better methods may have been published since the original analysis.

GEO2R is great; I love that it shows you the R code! But it is a bit limited in that it only works with the expression matrix uploaded by the submitter. As far as I know, it will not re-run array processing (RMA or MAS5), which is what I would recommend doing.

Thank you, Ann :-) Yes. Do you know a good basic tutorial for processing CEL files? I'm a medical student and need to start at a very basic level ;-).
