Question: annotating Agilent Drosophila microarray probes by BLAST
3.9 years ago
United Kingdom
yotiao0 wrote:



I am analysing old Agilent Drosophila microarray data and trying to update the probe annotations, as there have probably been two new Drosophila genome assemblies since the data were acquired. Typically I would do this via biomaRt, but that option no longer exists (i.e. I cannot convert Agilent probe IDs to anything). And I'd rather not use the annotation Agilent provided with the microarray, as it may be far too old and inaccurate.

So the best option (?) is to use the probe sequences themselves and get gene IDs from them. I figured I could run BLAST with each probe sequence against the Drosophila genome/transcriptome and take the gene/transcript IDs from the BLAST hits (I have >25k probe sequences to run). My question: is there an easier way to do that (that's a lot of blasting and parsing...)? I have been trying to find out whether anyone maintains a conversion between probes and other identifiers, but haven't found anything (not at UCSC, not at FlyBase). The annotate package is not helpful either.
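For what it's worth, the "blasting and parsing" part can be fairly small if BLAST is run once on all probes with tabular output. Below is a rough Python sketch, not a tested pipeline: it writes the probes as FASTA and keeps the highest-bitscore hit per probe from `-outfmt 6` output. The probe IDs, sequences, transcript IDs and scores are made up for illustration, and the function names are hypothetical. It assumes the default 12-column tabular layout (query in column 1, subject in column 2, bitscore in column 12) and a command along the lines of `blastn -query probes.fa -db <transcript_db> -outfmt 6 -out hits.tsv`.

```python
import csv
import io
from typing import Dict, Iterable, TextIO, Tuple


def write_fasta(probes: Dict[str, str], handle: TextIO) -> None:
    """Write {probe_id: sequence} as FASTA records to an open text handle."""
    for probe_id, seq in probes.items():
        handle.write(f">{probe_id}\n{seq}\n")


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Return {query: (subject, bitscore)} keeping the top bitscore per query.

    Assumes default BLAST -outfmt 6 columns; on ties the first hit seen wins.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical probes, written to an in-memory buffer instead of a file.
probes = {"probe_A": "ACGTACGTAC", "probe_B": "TTGACCGGTA"}
fasta = io.StringIO()
write_fasta(probes, fasta)

# Hypothetical BLAST tabular rows (IDs and scores are invented).
example_rows = [
    "probe_A\tFBtr0000001\t95.0\t60\t3\t0\t1\t60\t101\t160\t1e-10\t60.2",
    "probe_A\tFBtr0000002\t100.0\t60\t0\t0\t1\t60\t201\t260\t1e-25\t111",
    "probe_B\tFBtr0000003\t98.3\t60\t1\t0\t1\t60\t11\t70\t1e-20\t95.3",
]
hits = best_hits(example_rows)
print(hits)
```

A probe often matches several transcripts of the same gene equally well, so in practice it may be better to keep all top-scoring hits per probe and collapse them to gene level afterwards.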





3.9 years ago
Spain. Universidad de Córdoba
Antonio R. Franco4.0k wrote:

A question and an answer..

Agilent text data can be read by R packages such as limma, and then you can get the ProbeName used in the array (which is not useful for you) and the SystematicName of each probe. Both are accessible via the arrayname$genes part of the data (which also contains the row, column, type of control, etc.).

The SystematicName usually contains accession names that can be useful for you. Have you had a look at it?

If the SystematicName does not contain useful information, you can use Blast2GO. There are two versions. The PRO version can be used for free for a week or so, and will allow you to search local databases.

Blast2GO will provide you with the best BLAST hit (any kind of BLAST), and also with InterPro domains, EC enzyme numbers, KEGG and extensive GO data.

written 3.9 years ago by Antonio R. Franco4.0k

Yes. I have the SystematicName table (it contains a transcript ID in Ensembl/FlyBase format for each probe), but my worry is that it's outdated (these arrays were probably designed 10 years ago). So I am trying to make a new annotation, and the only other information I have is ProbeName and Sequence. Currently none of the packages/websites I tried supports conversion from Agilent ProbeName to Ensembl/FlyBase gene ID. So I am doing it the hard way: aligning the probe sequences and extracting the gene/transcript name from the hits. I just hope there is an easier way.

written 3.8 years ago by yotiao0
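One way to find out how outdated the SystematicName column really is would be to compare it against the IDs recovered by alignment. This is only a minimal sketch with hypothetical names and invented IDs: it assumes both annotations are already available as plain probe-to-transcript dictionaries, and it simply labels each probe as unchanged, changed, or without a hit.

```python
from typing import Dict


def compare_annotations(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Label each probe in `old` by how its new alignment-based ID compares."""
    report: Dict[str, str] = {}
    for probe, old_id in old.items():
        new_id = new.get(probe)
        if new_id is None:
            report[probe] = "no_hit"      # probe did not align to anything
        elif new_id == old_id:
            report[probe] = "unchanged"   # original annotation still holds
        else:
            report[probe] = "changed"     # annotation differs, needs review
    return report


# Hypothetical probe-to-transcript tables (IDs are invented).
old_ids = {"probe_A": "FBtr0000001", "probe_B": "FBtr0000003", "probe_C": "FBtr0000009"}
new_ids = {"probe_A": "FBtr0000002", "probe_B": "FBtr0000003"}

report = compare_annotations(old_ids, new_ids)
print(report)
```

If most probes come back as unchanged, the original annotation may be good enough for the bulk of the array, and only the changed and no_hit probes need a closer look.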

