6 months ago
jain72744
I have extracted multiple datasets from the GEO database. However, I am unable to determine the appropriate IDs for the inputs. I have the sequences, but on performing a nucleotide BLAST I got multiple hits with 100% similarity. Kindly guide me on how to resolve this.
GPLs:
GPL21827 Agilent-079487 Arraystar Human LncRNA microarray V4 (Probe Name version)
GPL20822 CapitalBio custom Human LncRNA array
I am not sure exactly what you are asking, but the annotations for these GPLs are available as tables at NCBI:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL20822
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL21827
If you BLAT the sequences against the correct genome build (e.g. hg19), you will get the sequence locations.
I have done this already. I would also like to perform enrichment analysis using g:Profiler, but IDs like XLOC_000236 are not recognised. When I run BLAST, I get more than one matching sequence, so I cannot find the correct gene ID required for enrichment.
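One possible way to get gene-level IDs for enrichment is to read the platform annotation table downloaded from the GPL page and keep only probes that carry a gene symbol. The snippet below is a minimal sketch, not a tested pipeline: the column name `GeneSymbol`, the helper names, and the toy rows are all hypothetical, so check the actual header of each GPL table before using it.

```python
import csv
import io


def load_probe_to_gene(handle, id_col="ID", gene_col="GeneSymbol"):
    """Map probe IDs to gene symbols from a tab-delimited annotation table.

    Assumption: lines starting with '#', '!' or '^' are header/comment
    lines and can be skipped. `gene_col` is a hypothetical column name.
    """
    rows = (
        line for line in handle
        if line.strip() and not line.startswith(("#", "!", "^"))
    )
    reader = csv.DictReader(rows, delimiter="\t")
    mapping = {}
    for row in reader:
        symbol = (row.get(gene_col) or "").strip()
        if symbol:  # keep only probes that have an annotated symbol
            mapping[row[id_col]] = symbol
    return mapping


def split_ids(probe_ids, mapping):
    """Return (unique gene symbols, probe IDs with no symbol)."""
    mapped = sorted({mapping[p] for p in probe_ids if p in mapping})
    unmapped = [p for p in probe_ids if p not in mapping]
    return mapped, unmapped


# Toy table invented for illustration only; not real GPL content.
example = (
    "#ID = probe identifier\n"
    "ID\tGeneSymbol\tSEQUENCE\n"
    "P1\tHOTAIR\tACGT\n"
    "P2\t\tTTGA\n"
    "P3\tMALAT1\tGGCC\n"
)
mapping = load_probe_to_gene(io.StringIO(example))
genes, unmapped = split_ids(["P1", "P2", "P3"], mapping)
print(genes)     # symbols that could be passed to an enrichment tool
print(unmapped)  # probes with no symbol, to be handled separately
```

Probes that end up in `unmapped` (for example, ones that only have an XLOC-style identifier) would still need a separate strategy, such as assigning them by genomic location from the BLAT results.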