I'm trying to re-annotate probe IDs from experiments carried out using a customised gene chip. The probe data is currently labelled with a variety of identifiers, mostly from standardised probe sets such as Agilent or Affymetrix. However, a small portion have an ambiguous description which does not conform to any known probe ID labelling schema, and approximately one fifth of the total probe set is listed under modified UniGene identifiers, many of which have since been retired. The only known constant is that all sequence data originates from patient data and is human. Genes were selected for the custom chip by literature review. I am attempting to track down the dataset used to annotate these probes, but in the absence of that information I could do with a few suggestions.
We're currently looking at batch BLASTing the sequence fragments against a current nucleotide database (preferably RefSeq), but need an API/REST-driven service so that the searches can run fully unattended. I have identified the WABI resource as one way of doing so [http://xml.nig.ac.jp/rest/Invoke?service=Blast&method=searchParam&program=<PROGRAMME>&database=<DB_ID>&query=<SEQUENCE_STRING>&param=-b+<NUM_RETURNED_HITS>+-m+<TABLE_FORMAT>] but would like to be able to submit searches against RefSeq to bring annotations in line with other data being produced here. WABI does not provide a direct entry point for RefSeq.
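For reference, this is roughly how I'd assemble that WABI request for each probe sequence. It is only a sketch of the URL template quoted above: the function name, the example sequence, and the program, database, and numeric values are placeholders of mine, not values confirmed against the WABI documentation, and nothing here actually sends the request.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL template quoted in the question.
WABI_ENDPOINT = "http://xml.nig.ac.jp/rest/Invoke"


def build_wabi_blast_url(sequence, program, database, num_hits, table_format):
    """Build one WABI BLAST query URL from the template in the question.

    All argument values are supplied by the caller; valid program names and
    database IDs must be checked against the WABI documentation.
    """
    # The template passes BLAST options as a single string: -b <hits> -m <format>
    blast_options = f"-b {num_hits} -m {table_format}"
    query = {
        "service": "Blast",
        "method": "searchParam",
        "program": program,
        "database": database,
        "query": sequence,
        "param": blast_options,
    }
    # urlencode escapes the values and turns spaces into '+', as in the template.
    return f"{WABI_ENDPOINT}?{urlencode(query)}"


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(build_wabi_blast_url("ACGTACGT", "blastn", "DB_ID_PLACEHOLDER", 5, 8))
```

Looping this over the probe sequences and fetching each URL would give the batch behaviour, subject to whatever rate limits the service imposes.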
So, has anyone got any experience updating identifiers from expression data? Where a retired identifier is provided, is re-annotation the best solution, or would it be preferable to follow the succession of IDs through their respective bioinformatics databases?
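To make the second option concrete, here is a minimal sketch of what I mean by following the succession of IDs. It assumes I can get hold of two things I don't yet have: a table mapping each retired ID to its replacement (`successors`) and the set of IDs that are still current (`active_ids`). Both names, and the IDs in the example, are made up for illustration. Anything that cannot be resolved this way would fall back to sequence-based re-annotation.

```python
def resolve_current_id(probe_id, successors, active_ids):
    """Follow a chain of retired IDs until a current one is reached.

    successors: dict mapping a retired ID to its replacement (or None).
    active_ids: set of IDs that are still current.
    Returns the current ID, or None if the chain dead-ends or loops,
    in which case the probe needs re-annotating from its sequence.
    """
    seen = set()
    current = probe_id
    while current is not None and current not in seen:
        if current in active_ids:
            return current
        seen.add(current)  # guard against circular replacement records
        current = successors.get(current)
    return None


if __name__ == "__main__":
    # Invented IDs for illustration only.
    successors = {"ID_A": "ID_B", "ID_B": "ID_C", "ID_D": None}
    active_ids = {"ID_C"}
    for pid in ("ID_A", "ID_C", "ID_D", "ID_X"):
        print(pid, "->", resolve_current_id(pid, successors, active_ids))
```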
All opinions, tips, suggestions, critiques welcomed.