Convert huge list of accession numbers to GI numbers
24 months ago
hazirliver ▴ 10

Hello! I have a huge list of accession numbers (a little over 1 million) and I need to get the corresponding GI numbers. Is there a way to do this?

ncbi GI gi numbers accession numbers • 934 views

Please do not use GI numbers; NCBI deprecated them for end-user use almost 2 years ago. Stay with accession numbers where you can.

That said, EntrezDirect should be able to get this information, although it does not seem to be behaving at the moment.


I know, but I want to try applying Koonin's pipeline from a 2019 article, in which they use GI numbers to identify sequences.


Which pipeline are you referring to? Perhaps it could be modified to use accession numbers?


https://www.nature.com/articles/s41596-019-0211-1 They use both accession numbers and GI numbers. To be more specific, there are "GeneratedID"s, but in the ProtocolFiles (Vicinity.faa) there are lines like ">gi|1000270263|gb|AAD36848.1| AAD36848.1 acetylornithine aminotransferase [Thermotoga maritima MSB8]". An additional problem is that the GI numbers in this article do not match the ones GenBank shows for the same accession numbers. Even the revision history in GenBank lists different numbers.
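To check the mismatch across the whole file rather than by eye, the gi/accession pairs can be pulled out of the FASTA headers first. This is only a minimal sketch: the function name `gi_acc_from_fasta` is made up for illustration, and it assumes every header of interest follows the `>gi|<number>|<db>|<accession>|` layout shown in the example line above.

```shell
# Hypothetical helper (name is an assumption, not part of the protocol files).
# Prints "gi<TAB>accession" for each FASTA header shaped like
#   >gi|<number>|<db>|<accession>| description
# Sequence lines and headers in any other layout are skipped.
# Reads the files given as arguments, or stdin if none are given.
gi_acc_from_fasta() {
    awk -F'|' '/^>gi[|]/ && NF >= 4 { print $2 "\t" $4 }' "$@"
}
```

For example, `gi_acc_from_fasta Vicinity.faa > pairs.tsv` would give a two-column table that can be compared against whatever GenBank returns for the same accessions.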


Using EntrezDirect (linked above) you can get the GI where one exists:

$ esearch -db protein -query "CAA62188" | efetch -format gi
1212992
$ esearch -db protein -query "AAD36848" | efetch -format gi
4982364
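
For a list of about a million accessions, one lookup per accession would be very slow, so batching is probably the way to go. The following is a rough sketch, not a tested pipeline. The function names, the batch size of 200 and the one-second pause are all assumptions. It also assumes that the protein document summary exposes `AccessionVersion` and `Gi` elements, and that `efetch -id` accepts a comma-separated list. Records without a GI will simply have an empty second column.

```shell
# Hypothetical helper: print the accessions from file "$1" as
# comma-separated batches, one batch per line, with at most "$2"
# accessions per batch (default 200, an arbitrary choice).
acc_batches() {
    awk -v n="${2:-200}" '
        NF == 0 { next }                                   # skip blank lines
        { batch = (count ? batch "," : "") $1; count++ }   # append accession
        count == n { print batch; batch = ""; count = 0 }  # batch is full
        END { if (count) print batch }                     # last partial batch
    ' "$1"
}

# Hypothetical wrapper: for each batch, ask EDirect for the document
# summaries and print "accession.version<TAB>GI".
# Requires efetch and xtract (EDirect) on PATH and network access.
acc_to_gi() {
    acc_batches "$1" "${2:-200}" | while IFS= read -r ids; do
        # stdin is redirected so efetch cannot swallow the remaining batches
        efetch -db protein -id "$ids" -format docsum < /dev/null |
            xtract -pattern DocumentSummary -element AccessionVersion Gi
        sleep 1   # assumed pause between requests to go easy on NCBI
    done
}
```

Usage would be something like `acc_to_gi accessions.txt > acc2gi.tsv`, where `accessions.txt` (an assumed file name) holds one accession per line.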


As for the numbers not matching, that is interesting. Since the authors of the pipeline you link are at NCBI, you should make them aware of the discrepancy and also suggest that they update their pipeline to use accessions instead of GIs.


Thanks for EntrezDirect! I think their "gi"s aren't GenBank GI numbers, because throughout the whole file the accession numbers don't match the gi listed next to them.


Hmm. I get the same record for titin if I use the following two URLs. One is for the accession and the other is for the gi.


In my previous post I meant the file from the article.