Question: Get GEO microarray probe sequences by GPL ID
7 months ago, predeus (1.4k, Russia) wrote:

Hello all,

I was wondering if there is a straightforward way to obtain probe sequences for a microarray platform given the GPL ID. I know that there are varying degrees of annotation for microarrays in GEO - the most popular ones have "annot" files, while others have only "miniml" and "soft" files. However, these are inconsistent from platform to platform: different gene symbols, IDs, etc.

So, if you can suggest how I can get a simple "probe ID - sequence" table using the GEO GPL ID, I would be most grateful.

modified 7 months ago • written 7 months ago by predeus (1.4k)

vkkodali: If you happen to look at this thread, I would be curious to know if there is a way to do this with Entrez Direct. I tried hacking at it for a while but can't seem to make any headway.

predeus: We may have the best chance of getting an answer from the user I quoted above, so apologies for what may seem like an off-target comment.

modified 7 months ago • written 7 months ago by GenoMax (95k)

I think that you can just search for the GPL ID at GEO and then there should be an entire annotation table to download, no?
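For illustration, here is a minimal sketch of how the full platform record could be requested by accession instead of through the web page. It assumes GEO's `acc.cgi` query endpoint accepts the `acc`, `targ`, `form` and `view` parameters as used here; the function name `gpl_soft_url` is made up for this example, and nothing in it guarantees that the returned table includes sequences.

```python
from urllib.parse import urlencode

# Assumption: GEO's accession viewer serves plain-text (SOFT) records
# when asked with form=text, and view=full includes the data table.
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def gpl_soft_url(gpl_id: str) -> str:
    """Build a URL requesting the full plain-text record for one GPL accession."""
    # Accept only accessions of the form "GPL" followed by digits.
    if not (gpl_id.startswith("GPL") and gpl_id[3:].isdigit()):
        raise ValueError(f"not a GPL accession: {gpl_id!r}")
    params = {"acc": gpl_id, "targ": "self", "form": "text", "view": "full"}
    return f"{GEO_ACC_URL}?{urlencode(params)}"
```

The resulting URL can be passed to `urllib.request.urlopen` or any other HTTP client; large platforms produce large responses, so a bulk download may be more practical when many GPLs are needed.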

written 7 months ago by Kevin Blighe (69k)

There are sometimes "annot" tables, and always the "soft" files. Both contain an annotation table, which varies widely between platforms - very few have actual probe sequences.
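As a rough sketch of what checking a "soft" file for sequences could look like: the function below pulls the table between the `!platform_table_begin` and `!platform_table_end` markers and keeps only the probe ID and sequence. It assumes the submitter used a column named `SEQUENCE` (the name varies or is missing on many platforms, which is exactly the problem described above); `probe_sequences` and the tiny `EXAMPLE_SOFT` record are invented for illustration.

```python
import csv
import io
from typing import Dict


def probe_sequences(soft_text: str, seq_column: str = "SEQUENCE") -> Dict[str, str]:
    """Return {probe ID: sequence} from the platform table of a SOFT record.

    Returns an empty dict when the table has no ID or sequence column.
    """
    table_lines = []
    in_table = False
    for line in soft_text.splitlines():
        marker = line.strip().lower()
        if marker == "!platform_table_begin":
            in_table = True
        elif marker == "!platform_table_end":
            break
        elif in_table:
            table_lines.append(line)

    # The platform table is tab-delimited, with the header as its first row.
    reader = csv.DictReader(
        io.StringIO("\n".join(table_lines)),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    fields = reader.fieldnames or []
    if "ID" not in fields or seq_column not in fields:
        return {}
    # Skip probes whose sequence field is empty or missing.
    return {row["ID"]: row[seq_column] for row in reader if row[seq_column]}


# Made-up record, only to show the expected layout.
EXAMPLE_SOFT = "\n".join([
    "^PLATFORM = GPL_EXAMPLE",
    "!Platform_title = made-up platform",
    "!platform_table_begin",
    "ID\tGENE_SYMBOL\tSEQUENCE",
    "p1\tABC\tACGT",
    "p2\tDEF\t",
    "!platform_table_end",
])
```

An empty result is a quick way to flag platforms where the sequences have to come from somewhere else.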

written 7 months ago by predeus (1.4k)

Oh, you need probe sequences. I am pretty sure that they are available via biomaRt, the CDF Bioconductor packages, and/or from the manufacturer. The manufacturer definitely has probe sequence files, e.g., Affymetrix U133: http://www.affymetrix.com/support/technical/byproduct.affx?product=hgu133

written 7 months ago by Kevin Blighe (69k)

Thank you.

I've looked at biomaRt before (and used it quite a few times) - it seems to have only the most popular microarray platforms (about 30 different ones for human). If you look at the GPLs in GEO, there are over 1000 each for human and mouse. Plus, there doesn't seem to be an easy way to match a GPL to biomaRt or to an individual manufacturer's annotation packages, since they all seem to use slightly different names for the same platform (I might be wrong, I'm still trying to figure it out).

I was amazed to see that most GPLs that GEO contains don't have sequences at all (which is the only thing you need to annotate them properly). Oh well. It wouldn't be the first thing that's messed up in bioinformatics :)

written 7 months ago by predeus (1.4k)

Yes, it seems that only the most common ones are included at Ensembl and accessible via biomaRt. The manufacturers' websites should have the data though, no?

written 7 months ago by Kevin Blighe (69k)