Hello,
I am analysing old Agilent Drosophila microarray data and trying to update the probe annotations, as there have probably been two new Drosophila genome assemblies since the data were acquired. Normally I would do this via biomaRt, but that option no longer exists (i.e. I cannot convert Agilent probe IDs to anything). I would rather not use the annotation Agilent provided with the microarray, as it may be outdated and inaccurate.
So the best option (?) seems to be using the probe sequences themselves to get gene IDs. I figured I could BLAST each probe sequence against the Drosophila genome/transcriptome and take the gene/transcript IDs from the BLAST hits (I have >25k probe sequences to run). My question: is there an easier way to do this (that's a lot of blasting and parsing...)? I have been looking for anyone who maintains a conversion between probes and other identifiers, but haven't found anything (not at UCSC, not at FlyBase). The annotate package is not helpful either.
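For reference, the parsing step I have in mind could look roughly like the sketch below. It is only an illustration under several assumptions: that BLAST+ was run with tabular output (`-outfmt 6`), that the subject IDs are transcript IDs, and that keeping the highest-bitscore hit per probe is acceptable. The identity and length cutoffs, the function name `best_hits`, and the probe and transcript names in the example are all made up for illustration.

```python
import csv
import io

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, min_identity=95.0, min_length=55):
    """Return {probe_id: (subject_id, bitscore)} with the top-scoring hit per probe.

    min_identity and min_length are assumed cutoffs, not recommended values.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if float(hit["pident"]) < min_identity or int(hit["length"]) < min_length:
            continue
        probe, score = hit["qseqid"], float(hit["bitscore"])
        # Keep only the highest-bitscore hit seen so far for this probe.
        if probe not in best or score > best[probe][1]:
            best[probe] = (hit["sseqid"], score)
    return best


# Hypothetical example rows; the probe and transcript names are placeholders.
example = (
    "PROBE_1\tTX_A\t100.0\t60\t0\t0\t1\t60\t101\t160\t1e-25\t111\n"
    "PROBE_1\tTX_B\t96.7\t60\t2\t0\t1\t60\t5\t64\t1e-20\t99.0\n"
    "PROBE_2\tTX_C\t90.0\t40\t4\t0\t1\t40\t1\t40\t1e-5\t50.0\n"
)
hits = best_hits(io.StringIO(example))
```

Here PROBE_2 is dropped because its only hit fails both cutoffs. Probes that hit several transcripts of the same gene equally well would need a transcript-to-gene table on top of this, which the sketch does not cover.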
Thanks!
Yes. I have the SystematicName table, which contains a transcript ID (in Ensembl/FlyBase format) for each probe, but my worry is that it is outdated (these arrays were probably designed 10 years ago). So I am trying to build a new annotation, and the only other information I have is ProbeName and Sequence. None of the packages or websites I have tried supports conversion from Agilent ProbeName to Ensembl/FlyBase gene ID. So I am doing it the hard way: aligning each probe sequence and extracting the gene/transcript name from the alignment. I just hope there is an easier way.
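To get from the ProbeName/Sequence columns to something an aligner accepts, a minimal sketch could look like this. It assumes the table is tab-delimited with a header row, that replicated probes appear more than once, and that some rows (for example control features) have no sequence. The function name `probes_to_fasta` and the example rows are hypothetical.

```python
import csv
import io


def probes_to_fasta(table, out, name_col="ProbeName", seq_col="Sequence"):
    """Write one FASTA record per unique probe and return how many were written."""
    seen = set()
    for row in csv.DictReader(table, delimiter="\t"):
        name = (row.get(name_col) or "").strip()
        seq = (row.get(seq_col) or "").strip().upper()
        # Skip rows without a name or sequence, and repeated probes.
        if not name or not seq or name in seen:
            continue
        seen.add(name)
        out.write(f">{name}\n{seq}\n")
    return len(seen)


# Hypothetical table: one replicated probe, one row without a sequence.
table = io.StringIO(
    "ProbeName\tSequence\n"
    "PROBE_1\tACGT\n"
    "PROBE_1\tACGT\n"
    "CTRL_1\t\n"
    "PROBE_2\ttgca\n"
)
fasta = io.StringIO()
n_written = probes_to_fasta(table, fasta)
```

The resulting FASTA text could then be written to a file and used as the query for whichever aligner is chosen.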