I am new to expression data. My expression data is of this type:
[MoGene-1_0-st] Affymetrix Mouse Gene 1.0 ST Array [CDF: mogene10st_Mm_ENTREZG_17.1.0]
I compared two groups of mice in GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/)
and got a list with a lot of probe IDs, which I downloaded to a text file that looks like this:
"ID" "adj.P.Val" "P.Value" "t" "B" "logFC" "SPOT_ID"
"73649_at" "0.0125" "6.56e-07" "14.107579" "5.19301" "2.8542797" "73649"
"17921_at" "0.0125" "1.33e-06" "-12.865506" "4.80638" "-1.1733647" "17921"
"18174_at" "0.0125" "1.76e-06" "12.395099" "4.64119" "2.5836002" "18174"
Do you know of any software that can convert the probe IDs to gene names for this kind of file?
It appears that whoever created these probeset IDs has made a poor decision.
If we examine the GEO page for GSE56257, we see that the microarray platform is GPL17777. The description reads: "This is identical to GPL6246 but a custom cdf environment was used to extract data. The cdf can be found at the link below."
If you scroll down to the data table for GPL17777, you will see entries like this:
ID            SPOT_ID    Description
100008567_at  100008567  predicted gene 14964
100009600_at  100009600  zinc finger, GATA-like protein 1
100009609_at  100009609  vomeronasal 2, receptor 65
Each value in the ID column is the corresponding SPOT_ID value with "_at" appended. The numbers themselves are Entrez Gene IDs - the SPOT_ID is hyperlinked to the Entrez Gene entry. So the IDs in the first column are not Affymetrix probeset IDs and cannot be used to search the platform for gene annotation directly - they are IDs that the authors have invented.
What you could try is downloading one of the supplementary files from the GPL17777 page. The file GPL17777_mogene10st_Mm_ENTREZG_mapping.txt.gz seems to contain mapping between the original Affymetrix IDs and the author-created IDs.
The other solution, of course, is to use the Entrez Gene IDs from the SPOT_ID column in your BioMart query.
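To illustrate the Entrez Gene ID route: a minimal R sketch, assuming the GEO2R output is whitespace-separated with quoted fields, as in the sample rows from the question. The column names entrez_id and symbol are made up for this example. The symbol lookup assumes the Bioconductor packages AnnotationDbi and org.Mm.eg.db are installed; if they are not, that step is skipped and the symbol column stays NA.

```r
# Sample rows copied from the GEO2R output shown in the question.
geo2r_text <- '"ID" "adj.P.Val" "P.Value" "t" "B" "logFC" "SPOT_ID"
"73649_at" "0.0125" "6.56e-07" "14.107579" "5.19301" "2.8542797" "73649"
"17921_at" "0.0125" "1.33e-06" "-12.865506" "4.80638" "-1.1733647" "17921"
"18174_at" "0.0125" "1.76e-06" "12.395099" "4.64119" "2.5836002" "18174"'

# For a real download, pass the file path instead of text = geo2r_text.
# Everything is read as character so the IDs are not converted to numbers.
top <- read.table(text = geo2r_text, header = TRUE, quote = "\"",
                  colClasses = "character")

# Strip the "_at" suffix to recover the Entrez Gene ID from the custom ID.
top$entrez_id <- sub("_at$", "", top$ID)

# Sanity check: this should match the SPOT_ID column exactly.
stopifnot(identical(top$entrez_id, top$SPOT_ID))

# Optional: look up gene symbols (assumes the annotation packages are installed).
top$symbol <- NA_character_
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  top$symbol <- unname(AnnotationDbi::mapIds(
    org.Mm.eg.db::org.Mm.eg.db,
    keys = top$entrez_id,
    column = "SYMBOL",
    keytype = "ENTREZID",
    multiVals = "first"
  ))
}

print(top[, c("ID", "entrez_id", "symbol", "logFC")])
```

The same entrez_id values can also be pasted into a BioMart query as an Entrez Gene ID filter instead of using the annotation package.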
Yes, we do have this Affymetrix array in Ensembl BioMart; however, the listed probe IDs do not return any results. (If you are new to BioMart, check this video out.)
An easy way out is the following.
First, download the genes from Ensembl BioMart,
setting the filter to
"with Affymetrix Microarray mogene 1 0 st v1 probeset ID(s): Only"
and the attributes to
"Associated gene names" and "Affy MoGene probeset".
Then, in R:
read the BioMart data and set the probe IDs as row names (let's say dat1)
read your Affy data and set the probe IDs as row names (let's say dat2)
dat = merge(dat1, dat2, by="row.names", all=FALSE)
dat now holds your data, with each probe ID annotated with its associated gene name.
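The merge step above can be sketched with two toy data frames. The probe IDs and gene names here (probe_1, GeneA, and so on) are hypothetical placeholders, not real annotation; with real data, dat1 would come from the BioMart export and dat2 from the GEO2R table.

```r
# dat1: stand-in for the BioMart export, probe IDs as row names (hypothetical values).
dat1 <- data.frame(
  gene_name = c("GeneA", "GeneB", "GeneC"),
  row.names = c("probe_1", "probe_2", "probe_3")
)

# dat2: stand-in for the expression results, probe IDs as row names (hypothetical values).
dat2 <- data.frame(
  logFC     = c(2.85, -1.17),
  adj.P.Val = c(0.0125, 0.0125),
  row.names = c("probe_1", "probe_3")
)

# Inner join on the row names: all = FALSE keeps only probes present in both tables.
dat <- merge(dat1, dat2, by = "row.names", all = FALSE)

# merge() puts the shared keys in a column called "Row.names"; give it a clearer name.
names(dat)[names(dat) == "Row.names"] <- "probe_id"
dat$probe_id <- as.character(dat$probe_id)

print(dat)
```

Because all = FALSE drops unmatched rows, comparing nrow(dat) with nrow(dat2) is a quick way to see how many probe IDs failed to match the annotation.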
Can you get the CEL files? If yes, you can re-process the data yourself, which is pretty easy to do these days using Bioconductor tools. Sometimes this leads to new discoveries, because newer and better methods may have been published since the original analysis.
GEO2R is great - I love that it shows you the R code syntax! But it's a bit limited in that it only works with the expression matrix uploaded by the submitter. So far as I know, it will not re-run array processing (RMA or MAS5), which is what I would recommend doing.