I'm updating some tools to make them more widely available, and I'd like to know whether formal rules have ever been defined for which characters are allowed in the names of genomic annotations.
For instance, I have seen some dimers joined with two colons (geneA::geneB) in annotation tables. Are names restricted to a strict subset of ASCII? Do researchers in China or France, say, use extended or other character sets to name things?
I'm not really sure what's out there. Thanks for any pointers, particularly to specification documents, if any exist.