Convert huge list of accession numbers to GI numbers
4.4 years ago
hazirliver ▴ 10

Hello! I have a huge list of accession numbers (a little over 1 million) and I need to get the corresponding GI numbers. Is there a way to do this?

ncbi GI gi numbers accession numbers • 2.0k views

Please do not use GI numbers. They have been deprecated for end-user use by NCBI for almost 2 years now. Stay with accession numbers where you can.

That said, EntrezDirect should be able to get this information, although it does not seem to be behaving at the moment.


I know, but I am trying to apply Koonin's pipeline from a 2019 article, in which they use GI numbers to identify sequences.


Which pipeline are you referring to? Perhaps it could be modified to use accession numbers?


https://www.nature.com/articles/s41596-019-0211-1 They use both accession numbers and GI numbers. To be more specific, there are "GeneratedID"s, but in ProtocolFiles (Vicinity.faa) there are lines like ">gi|1000270263|gb|AAD36848.1| AAD36848.1 acetylornithine aminotransferase [Thermotoga maritima MSB8]". An additional problem is that the GI numbers in this article do not match the ones GenBank gives for the same accession numbers. Even the revision history in GenBank shows other numbers.
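
To compare the file's numbers against GenBank, the accession and "gi" fields can be pulled out of the FASTA headers first. This is a minimal sketch that assumes every header of interest follows the ">gi|<number>|<db>|<accession>|" layout quoted above; the function name `header_pairs` is made up for illustration.

```shell
# Print "accession<TAB>gi" for each ">gi|<number>|<db>|<accession>|" header on stdin.
# Sequence lines and headers in any other layout are skipped.
header_pairs() {
  awk -F'|' '/^>gi\|/ { print $4 "\t" $2 }'
}

# Hypothetical usage on the protocol file:
#   header_pairs < Vicinity.faa > file_pairs.tsv
```

The resulting two-column table can then be checked against whatever GenBank returns for the same accessions.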


Using Entrezdirect (linked above), you can get the GI where one exists:

$ esearch -db protein -query "CAA62188" | efetch -format gi
1212992
$ esearch -db protein -query "AAD36848" | efetch -format gi
4982364
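
For a list of around a million accessions, one query per accession would be slow, so the lookups could be batched. The following is only a sketch under several assumptions: the input file (called `accessions.txt` here) has one accession per line, 200 IDs per request is an acceptable batch size, and the protein document summary exposes `AccessionVersion` and `Gi` elements. Check those element names on a single record before running it at scale. The function names are made up for illustration.

```shell
# Join accessions from stdin into comma-separated batches of N (default 200),
# one batch per output line. Blank input lines are skipped.
batch_ids() {
  local n="${1:-200}"
  awk -v n="$n" '
    NF { printf "%s%s", (c % n ? "," : (c ? "\n" : "")), $1; c++ }
    END { if (c) printf "\n" }
  '
}

# Print "accession.version<TAB>gi" for the accessions read from stdin.
# efetch gets /dev/null as stdin so it cannot swallow the remaining batches.
acc_to_gi() {
  batch_ids "${1:-200}" | while IFS= read -r ids; do
    efetch -db protein -id "$ids" -format docsum < /dev/null |
      xtract -pattern DocumentSummary -element AccessionVersion Gi
  done
}

# Hypothetical usage:
#   acc_to_gi < accessions.txt > acc2gi.tsv
```

Printing the accession next to the GI keeps the mapping explicit, since the order of returned records is not guaranteed to follow the input order. Records that no longer carry a GI would show up with an empty second column.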

As for the numbers not matching, that is interesting (as above: AAD36848 returns GI 4982364, not the 1000270263 in the file's header). Since the authors of the pipeline you link are at NCBI, you should make them aware of the discrepancy and also suggest that they update their pipeline to use accessions instead of GI numbers.


Thanks for Entrezdirect! I think their "gi"s aren't GenBank GI numbers, because throughout the whole file the accession numbers don't match the GIs given next to them.


Hmm. I get the same record for titin if I use the following two URLs. One is for the GI and the other is for the accession.

https://www.ncbi.nlm.nih.gov/protein/1212992
https://www.ncbi.nlm.nih.gov/protein/CAA62188.1


In my previous post I meant the file from the article.
