Question: Search using Entrez and return accession numbers (not GI)
Asked 2.6 years ago by Cricket:

I am trying to use Biopython (Entrez) with search terms that will return the accession number (and not the GI*).

Here is a tiny excerpt of my code:

from Bio import Entrez

Entrez.email = 'myemailaddress'
search_phrase = '(Escherichia coli[organism]) AND (complete genome[keyword])'
handle = Entrez.esearch(db='nuccore', term=search_phrase, retmax=100, rettype='acc', retmode='text')
result = Entrez.read(handle)
handle.close()
gi_numbers = result['IdList']
print(gi_numbers)

'745369752', '910228862', '187736741', '802098270', '802098269', '802098267', '387610477', '544579032', '544574430', '215485161', '749295052', '387823261', '387605479', '641687520', '641682562', '594009615', '557270520', '313848522', '309700213', '284919779', '215263233', '544345556', '544340954', '144661', '51773702', '202957457', '202957451', '172051323'

What slice of magic am I missing? Thank you for your assistance.

*especially since NCBI is phasing out GI numbers

Answer by Alexander Goncearenco (USA/Bethesda/NIH), 2.6 years ago:

The E-utilities esearch call returns only a list of IDs, not complete records. To turn those IDs into accession numbers you need efetch. Continuing your lines of code:

from Bio import Entrez

Entrez.email = 'myemailaddress'
search_phrase = '(Escherichia coli[organism]) AND (complete genome[keyword])'

# esearch returns only a list of IDs (GI numbers here), not records
handle = Entrez.esearch(db='nuccore', term=search_phrase, retmax=100, rettype='acc', retmode='text')
result = Entrez.read(handle)
handle.close()
gi_numbers = result['IdList']

# efetch with rettype="acc" returns one accession.version per line
h = Entrez.efetch(db="nucleotide", id=gi_numbers, rettype="acc")
accessions = h.read().splitlines()
h.close()
print(accessions)

['HF572917.2', 'NZ_HF572917.1', 'NC_010558.1', 'NZ_HG941720.1', 'NZ_HG941719.1', 'NZ_HG941718.1', 'NC_017633.1', 'NC_022371.1', 'NC_022370.1', 'NC_011601.1', 'NZ_HG738867.1', 'NC_012892.2', 'NC_017626.1', 'HG941719.1', 'HG941718.1', 'HG941720.1', 'HG738867.1', 'AM946981.2', 'FN649414.1', 'FN554766.1', 'FM180568.1', 'HG428756.1', 'HG428755.1', 'M37402.1', 'AJ304858.2', 'FM206294.1', 'FM206293.1', 'AM886293.1', '']
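The output above ends with an empty string because the efetch text ends with a blank line. A minimal sketch of the same search-then-fetch steps wrapped in a function, with the blank lines filtered out, might look like this. The names `clean_accessions` and `fetch_accessions` are made up for illustration and are not part of Biopython; the network function is untested here and assumes Biopython is installed.

```python
def clean_accessions(raw_text):
    """Split efetch 'acc' text output into a list, dropping blank lines."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


def fetch_accessions(term, email, retmax=100):
    """Search nuccore for `term` and return accession.version strings.

    Hypothetical helper: it mirrors the esearch + efetch steps shown above.
    """
    # Imported here so clean_accessions() stays usable without Biopython.
    from Bio import Entrez

    Entrez.email = email

    # Step 1: esearch gives back a list of IDs only.
    search = Entrez.esearch(db="nuccore", term=term, retmax=retmax)
    try:
        result = Entrez.read(search)
    finally:
        search.close()

    ids = result["IdList"]
    if not ids:
        return []

    # Step 2: efetch with rettype="acc" returns one accession per line.
    fetch = Entrez.efetch(
        db="nuccore", id=",".join(ids), rettype="acc", retmode="text"
    )
    try:
        return clean_accessions(fetch.read())
    finally:
        fetch.close()


# Example call (needs network access and a real e-mail address):
# fetch_accessions(
#     "(Escherichia coli[organism]) AND (complete genome[keyword])",
#     "myemailaddress",
# )
```

Keeping the parsing in its own small function makes it easy to check without contacting NCBI.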

Alternatively, install NCBI's Entrez Direct (EDirect) command-line tools, which provide the esearch and efetch commands, and run:

$ esearch -db nuccore -query "(Escherichia coli[organism]) AND (complete genome[keyword])" |efetch -mode text -format acc

HF572917.2 NZ_HF572917.1 NC_010558.1 NZ_HG941720.1 NZ_HG941719.1 NZ_HG941718.1 NC_017633.1 NC_022371.1 NC_022370.1 NC_011601.1 NZ_HG738867.1 NC_012892.2 NC_017626.1 HG941719.1 HG941718.1 HG941720.1 HG738867.1 AM946981.2 FN649414.1 FN554766.1 FM180568.1 HG428756.1 HG428755.1 M37402.1 AJ304858.2 FM206294.1 FM206293.1 AM886293.1


That works great! However, since NCBI is phasing out GI numbers, won't this approach stop working soon?

Comment by Whetting, 2.6 years ago
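On the GI concern raised in this comment: one way to keep GI numbers out of your own code entirely is to ask esearch for accessions directly. NCBI's E-utilities documentation describes an `idtype` parameter for the sequence databases, where `idtype=acc` makes esearch return accession.version identifiers instead of GIs. The sketch below assumes that parameter behaves as documented and that Biopython forwards extra keyword arguments to the service unchanged; the function names are hypothetical, and the network call is untested here.

```python
def esearch_accession_params(term, retmax=100):
    """Build esearch arguments that request accession.version IDs.

    Assumption: the E-utilities 'idtype=acc' option is available for nuccore.
    """
    return {"db": "nuccore", "term": term, "retmax": retmax, "idtype": "acc"}


def search_accessions(term, email, retmax=100):
    """Run esearch and return the IdList, expected to hold accessions."""
    # Imported here so the parameter helper works without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(**esearch_accession_params(term, retmax))
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return list(result["IdList"])


# Example call (needs network access and a real e-mail address):
# search_accessions(
#     "(Escherichia coli[organism]) AND (complete genome[keyword])",
#     "myemailaddress",
# )
```

If this works as described, no separate efetch step is needed just to obtain the accession numbers.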