Question

Forum: UniProtKB Accession Number Format To Be Extended To 10 Characters

4

11.6 years ago

Elisabeth Gasteiger ★ 2.4k

UniProtKB accession numbers currently consist of 6 alphanumerical characters. With our projected growth of UniProtKB, we expect to use up all accession numbers of this format in 2014. We will therefore extend the format to 10 alphanumerical characters.

Read more here: http://www.uniprot.org/changes and contact the UniProt helpdesk with any comments you might have.

uniprot web-service • 3.3k views

updated 2.4 years ago by Ram 45k • written 11.6 years ago by Elisabeth Gasteiger ★ 2.4k

score 2 · Answer 1 · 2013-11-18

2

11.6 years ago

Michael 55k

Reminds me of the transition from IPv4 to IPv6 ;) That would give 2.611467e+13 new IDs following the new scheme. If we assume there are 10 million species on earth and each contributes on average 20,000 proteins (total 2e+11), then these numbers should be sufficient.
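For anyone who wants to check that figure, here is a rough sketch of the arithmetic. It assumes the 10-character layout is `[A-NR-Z][0-9]` followed by two groups of `[A-Z][A-Z0-9]{2}[0-9]`; that layout is my reading of the UniProt documentation and is not spelled out in this thread, so treat the constants as assumptions.

```python
# Assumed layout of a 10-character accession (not stated in this thread):
#   [A-NR-Z][0-9] followed by two groups of [A-Z][A-Z0-9]{2}[0-9]
FIRST_LETTER = 23  # A-N and R-Z; O, P and Q are excluded
DIGIT = 10         # 0-9
LETTER = 26        # A-Z
ALNUM = 36         # A-Z plus 0-9

# Number of distinct [A-Z][A-Z0-9]{2}[0-9] groups
group = LETTER * ALNUM**2 * DIGIT

# Total number of distinct 10-character accessions under the assumed layout
ten_char_ids = FIRST_LETTER * DIGIT * group**2

# Rough demand estimate from the post: species x proteins per species
estimated_proteins = 10_000_000 * 20_000

print(f"{ten_char_ids:.6e}")              # 2.611467e+13
print(ten_char_ids > estimated_proteins)  # True
```

Under these assumptions the new ID space is roughly two orders of magnitude larger than the back-of-the-envelope protein estimate.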

I'd expect a question on BioStar like: "How can I map from old to new UniProtKB accession numbers?", but if I understand correctly, the short and long versions will co-exist, already assigned ANs will not be changed, and new ANs will only be assigned to new proteins? Further, is it a problem that, according to your definition, some new IDs can have valid or existing old ANs as prefixes?
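To make the prefix concern concrete, here is a minimal sketch. The regular expression is an assumption based on my reading of the UniProt accession format documentation, not something stated in this thread, and `is_accession` and the sample string are purely illustrative. The point is only that a parser should match the whole token rather than stop after six characters.

```python
import re

# Assumed pattern (not stated in this thread): covers the 6-character
# forms and the 10-character form.
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(text: str) -> bool:
    """Return True only if the whole string is a well-formed accession."""
    return ACCESSION.fullmatch(text) is not None


long_ac = "A0A022YWF9"  # illustrative, format-valid 10-character string
prefix = long_ac[:6]    # first six characters: "A0A022"

print(is_accession(long_ac))      # True: valid 10-character form
print(is_accession(prefix))       # True: the prefix is itself format-valid
print(is_accession(long_ac[:9]))  # False: 9 characters match neither form
```

So, at least syntactically, a 10-character ID can begin with something that looks like a 6-character ID. Code that extracts accessions from free text with an unanchored or non-greedy 6-character pattern could truncate the new IDs, so whole-token matching seems the safer choice.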


0

Regarding the mapping: yes, you are correct, short and long versions will co-exist. New ACs are assigned to new entries, and already assigned ACs usually do not change. If they do need to change, this will be handled in the same way as under the current AC scheme: http://www.uniprot.org/manual/accession_numbers

11.6 years ago by Elisabeth Gasteiger ★ 2.4k