Hello,
I'm trying to analyze a custom Agilent oligonucleotide array. I have access to three files (via GEO): the original .gpr file, an annotation file that links probe IDs to GB_RANGE values, and an already processed file that links probe IDs to values.
GEO defines GB_RANGE as: "GenBank accession range - specifies a particular sequence position within a GenBank accession number. Use format ACCESSION.VERSION[start..end]. Useful for tiling arrays."
My task is to visualize the chip data in the UCSC Browser (value vs GB_RANGE), but basically I can't work with it in any way at the moment. The array is a custom type, and the provided annotation file comes in the following format:
ProbeSet ID Name CONTROL TYPE SEQUENCE GBRANGE SPOTID
5 Hs05041539310905-60 GTTCCCACCCCCAACCCGAACTCACAGCCGGTCTCCTTCTTGATCTCCTCGAGCTCTTCG NC000015.8[39310905..39310845]
So sadly, the GB_RANGE is the only thing that locates a probe on the genome (the alternative would be to remap all probes using their sequences). To visualize the data, I would need it either in .wig format or, better, in .bed format. Something like: chromosome start end value
I could copy and paste the GB_RANGE together with the value from the .gpr into one Excel file, but this file would still not be readable by the UCSC Browser. The problem is that I can't extract the chromosome, start and end values from the annotation file. Is there perhaps a standard method to deal with custom arrays?
If there is not, how can I extract the needed information from the GB_RANGE format NC000015.8[39310905..39310845]?
Thanks,
David
Since you have a custom array, it is unlikely that there is a standard method to transform the data.
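That said, the GB_RANGE string itself is regular enough to pull apart with a regular expression. Below is a minimal Python sketch, not a tested pipeline. The function names (gb_range_to_bed, bed_line) are made up for illustration. It assumes that the accession is a human RefSeq chromosome (normally written NC_000015; the underscore is treated as optional here because your example lacks it), that NC_000023 and NC_000024 are X and Y, that the coordinates are 1-based and inclusive, and that start > end means the probe is on the minus strand. Please check the coordinate convention against a few probe sequences, and check which genome assembly the accession version (.8) corresponds to before loading the track.

```python
import re

# ACCESSION.VERSION[start..end], e.g. NC000015.8[39310905..39310845]
# Group 1: chromosome number, group 2: version, groups 3/4: the two coordinates.
GB_RANGE_RE = re.compile(r"^NC_?0*(\d+)\.(\d+)\[(\d+)\.\.(\d+)\]$")


def gb_range_to_bed(gb_range):
    """Return (chrom, start, end, strand) with BED-style 0-based, half-open coordinates.

    Assumes the GB_RANGE coordinates are 1-based and inclusive.
    """
    m = GB_RANGE_RE.match(gb_range.strip())
    if m is None:
        raise ValueError("unrecognised GB_RANGE: %r" % gb_range)
    number = int(m.group(1))
    # Human RefSeq chromosome accessions: 1-22 are autosomes, 23 is X, 24 is Y.
    chrom = {23: "chrX", 24: "chrY"}.get(number, "chr%d" % number)
    a, b = int(m.group(3)), int(m.group(4))
    strand = "+" if a <= b else "-"  # assumption: reversed range means minus strand
    start = min(a, b) - 1            # 1-based inclusive -> 0-based start
    end = max(a, b)
    return chrom, start, end, strand


def bed_line(gb_range, value):
    """Format one 'chromosome start end value' line, tab-separated."""
    chrom, start, end, _strand = gb_range_to_bed(gb_range)
    return "%s\t%d\t%d\t%s" % (chrom, start, end, value)


if __name__ == "__main__":
    print(bed_line("NC000015.8[39310905..39310845]", 0.42))
```

To build the whole track you would read the annotation file and the processed value file, join them on the probe ID, and write one bed_line per probe. The value 0.42 above is only a placeholder.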