Question: To get the name of the strains by searching assembly genome number GCF_
2.5 years ago, horsedog wrote:

I have a bunch of RefSeq assembly accession numbers, like GCF_002514765.1, GCF_002485085.1, GCF_002201835.1, GCF_000593305.2, GCF_001887655.1, GCF_000194215.1, GCF_002098145.1, GCF_002807875.1

Now I want to use these to find out which genome each one is. For example, the first one is Escherichia coli strain MOD1-EC3823. I tried to use efetch for this, but it does not work; it fails with "urllib.error.HTTPError: HTTP Error 400: Bad Request". Here is my Python code:

from Bio import Entrez
Entrez.email = "hulala@gmail.com"
ID = open("assembly_ID").read()
handle = Entrez.efetch(db="assembly", id= ID, rettype="gb")
print(handle.read())

Does anyone have any idea?

Tags: efetch, python, ncbi
2.5 years ago, Joseph Hughes (Scotland, UK) wrote:

Rewriting the following query in Python should get you what you want:

esearch -db assembly -query "GCF_002514765.1" | esummary | xtract -pattern DocumentSummary -element SpeciesName Sub_type Sub_value

The output is:

Escherichia coli    strain  MOD1-EC3823
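
One possible Python translation of that pipeline is sketched below, assuming Biopython is installed and NCBI is reachable. The accession is first resolved to an assembly UID with esearch, and the UID is then passed to esummary. The function names describe_summary and lookup_assembly are made up for this illustration, the email address is a placeholder, and the Biosource / InfraspeciesList layout is assumed to match what the assembly esummary currently returns.

```python
def describe_summary(doc):
    """Pull (species, sub_type, sub_value) out of one parsed DocumentSummary.

    `doc` is assumed to be the dict-like record Biopython returns for an
    assembly esummary, with the strain stored under Biosource/InfraspeciesList.
    """
    species = doc.get("SpeciesName", "")
    infra = doc.get("Biosource", {}).get("InfraspeciesList", [])
    if infra:
        return species, infra[0].get("Sub_type", ""), infra[0].get("Sub_value", "")
    return species, "", ""


def lookup_assembly(accession, email):
    """Resolve one GCF_/GCA_ accession to (species, sub_type, sub_value)."""
    from Bio import Entrez  # imported here because it needs Biopython and network access

    Entrez.email = email

    # Step 1: esearch turns the accession into an assembly UID.
    handle = Entrez.esearch(db="assembly", term=accession)
    search = Entrez.read(handle)
    handle.close()
    if not search["IdList"]:
        return None  # accession not found

    # Step 2: esummary on that UID returns the document summary.
    handle = Entrez.esummary(db="assembly", id=search["IdList"][0])
    summary = Entrez.read(handle, validate=False)
    handle.close()

    doc = summary["DocumentSummarySet"]["DocumentSummary"][0]
    return describe_summary(doc)


# Example call (performs live requests to NCBI):
# print(lookup_assembly("GCF_002514765.1", "you@example.org"))
```

The parsing step is kept in its own function so it can be checked against a hand-built record without contacting NCBI.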

Hi, thanks, but it says "SyntaxError: invalid syntax" at Sub_value. Do you mean replacing

ID = open("assembly_ID").read()
handle = Entrez.efetch(db="assembly", id= ID, rettype="gb")

with your code? But here the -query is not just one ID, there are thousands of

written 2.5 years ago by horsedog

You will need to loop in your Python code and query each accession one at a time.

written 2.5 years ago by Joseph Hughes
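
A minimal sketch of such a loop follows, assuming Biopython is installed and that the assembly_ID file holds accessions separated by commas and/or newlines. The helper names parse_accessions, fetch_strain and lookup_all are hypothetical, the email address is a placeholder, and the pause between queries is a conservative guess rather than a documented requirement. Passing the whole file contents as a single id, as in the original code, sends one malformed identifier, so the text is split into individual accessions first.

```python
import re
import time


def parse_accessions(text):
    """Split file contents on commas and/or whitespace into clean accessions."""
    return [token for token in re.split(r"[,\s]+", text) if token]


def fetch_strain(accession):
    """esearch + esummary for one accession; needs Biopython and network access."""
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder, replace with your own address

    # Resolve the accession to an assembly UID.
    handle = Entrez.esearch(db="assembly", term=accession)
    uids = Entrez.read(handle)["IdList"]
    handle.close()
    if not uids:
        return None  # nothing found for this accession

    # Fetch the document summary for that UID.
    handle = Entrez.esummary(db="assembly", id=uids[0])
    summary = Entrez.read(handle, validate=False)
    handle.close()

    doc = summary["DocumentSummarySet"]["DocumentSummary"][0]
    infra = doc["Biosource"]["InfraspeciesList"]  # assumed record layout
    strain = infra[0]["Sub_value"] if infra else ""
    return doc["SpeciesName"], strain


def lookup_all(accessions, lookup=fetch_strain, delay=0.4):
    """Query each accession one at a time, pausing between queries."""
    results = {}
    for accession in accessions:
        results[accession] = lookup(accession)
        time.sleep(delay)  # be polite to the NCBI servers
    return results


# Example use (performs live requests to NCBI):
# with open("assembly_ID") as fh:
#     for acc, info in lookup_all(parse_accessions(fh.read())).items():
#         print(acc, info)
```

The lookup function is a parameter so the loop can be tried with a stand-in function before running thousands of real queries.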