UniProtKB accession numbers currently consist of 6 alphanumerical characters. With our projected growth of UniProtKB, we expect to use up all accession numbers of this format in 2014. We will therefore extend the format to 10 alphanumerical characters.
Reminds me of the transition from IPv4 to IPv6 ;)
That would give 2.611467e+13 new IDs following the new scheme. If we assume there are 10 million species on earth and each contributes on average 20,000 proteins (total 2e+11), then these numbers should be sufficient.
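A quick sketch of where these numbers come from. The 10-character pattern is not quoted in this thread, so the one used here, `[A-NR-Z][0-9][A-Z][A-Z0-9]{2}[0-9][A-Z][A-Z0-9]{2}[0-9]`, is an assumption; the constant and function names are made up for illustration.

```python
# Count the 10-character accessions under an ASSUMED pattern:
#   [A-NR-Z][0-9][A-Z][A-Z0-9]{2}[0-9][A-Z][A-Z0-9]{2}[0-9]
# The pattern is not stated in this thread; treat it as an assumption.

FIRST_LETTER = 23  # A-Z without O, P and Q
DIGIT = 10         # 0-9
LETTER = 26        # A-Z
ALNUM = 36         # A-Z plus 0-9


def count_ten_char_accessions() -> int:
    # One trailing block is [A-Z][A-Z0-9]{2}[0-9]; the long form has two.
    block = LETTER * ALNUM**2 * DIGIT
    return FIRST_LETTER * DIGIT * block * block


new_ids = count_ten_char_accessions()

# Back-of-the-envelope demand: 10 million species x 20,000 proteins each.
estimated_proteins = 10_000_000 * 20_000

print(f"new 10-character IDs: {new_ids:.6e}")
print(f"estimated proteins:   {estimated_proteins:.0e}")
print(f"headroom factor:      {new_ids // estimated_proteins}")
```

Under that assumption the count reproduces the 2.611467e+13 figure, and the species-times-proteins product works out to 2e+11, so the new scheme would leave roughly a hundredfold margin over this rough estimate.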
I'd expect a question on BioStar like: "How can I map from old to new UniProtKB accession numbers?", but if I understand correctly, both short and long versions should co-exist: the already assigned ANs should not be changed, and new ANs are only assigned to new proteins?
Further, is it a problem that some new IDs can have valid or existing old ANs as prefixes according to your definition?
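To make the prefix concern concrete, here is a minimal sketch. Both regular expressions are assumptions (the definition is not quoted in this thread), `classify` is a hypothetical helper, and the accession strings are only used as illustrative inputs.

```python
import re

# ASSUMED patterns; the actual definition is not quoted in this thread.
OLD_AC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9][A-Z][A-Z0-9]{2}[0-9]"
)
NEW_AC = re.compile(
    r"[A-NR-Z][0-9][A-Z][A-Z0-9]{2}[0-9][A-Z][A-Z0-9]{2}[0-9]"
)


def classify(ac: str) -> str:
    """Classify a string by matching it in full, never by prefix."""
    if OLD_AC.fullmatch(ac):
        return "6-character"
    if NEW_AC.fullmatch(ac):
        return "10-character"
    return "invalid"


long_ac = "A0A022YWF9"  # illustrative 10-character string

# An unanchored match stops after six characters and reports a
# well-formed short accession that is only a prefix of the long one.
prefix_hit = OLD_AC.match(long_ac)
print(prefix_hit.group() if prefix_hit else None)

# Matching the whole string keeps the two forms apart.
print(classify(long_ac))
print(classify(long_ac[:6]))
```

If the patterns are as assumed, the first six characters of every long accession are themselves a well-formed short accession, so the overlap only causes trouble for tools that match by prefix or truncate to six characters; matching the full string avoids it.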
Regarding the mapping: yes, you are correct, short and long versions will co-exist. New ACs are assigned to new entries, and already assigned ACs usually do not change. If they do need to change, this will be handled like with the current AC scheme:
http://www.uniprot.org/manual/accession_numbers .