Question: Retrieve Genbanks From Taxid
JJK (Netherlands) wrote, 7.3 years ago:

I know the manual way of doing it, but is there an automated way of retrieving the GenBank files of all species belonging to a taxid like 51291?

I imagine it works like this: first retrieve the taxonomy IDs of all strains below this parent (superclass) taxid, then find the GenBank file belonging to each of those IDs. So far I couldn't find a proper way of doing that.

Retrieving the NC_XXXX IDs would be sufficient as well, since I already have a GenBank download script.

With efetch I know I can retrieve the parent ID and the lineage. However, I have not yet found an option to list the children.

handle = Entrez.efetch(db="Taxonomy", id=taxId, retmode="xml")

Here is some extra code I am working on now. The == statements are only there to direct the flow of the program to a single organism and a single strain.

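from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder; NCBI asks for a contact address

def get_TaxonomyChild():
    # Taxonomy IDs of all species-rank taxa below Chlamydiales
    handle = Entrez.esearch(db="Taxonomy", term="Chlamydiales [subtree] AND species[rank]", retmax=100000)
    record = Entrez.read(handle)
    IdListOrganisms = record["IdList"]
    for organism in IdListOrganisms:
        if organism == "813":  # follow one organism only
            # Taxonomy IDs matching this organism's taxid
            handle = Entrez.esearch(db="Taxonomy", term="txid" + organism + "[Organism]", retmax=100000)
            record = Entrez.read(handle)
            StrainList = record["IdList"]
            for Strain in StrainList:
                if Strain == "471472":  # follow one strain only
                    print(Strain)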

Tags: taxonomy, biopython, genbank

highly similar: http://www.biostars.org/post/show/18706

(comment by Pierre Lindenbaum, 7.3 years ago)

Damian Kao (USA) wrote, 7.3 years ago:

According to the taxonomy FAQ (http://www.ncbi.nlm.nih.gov/books/NBK54428/), you can find all species belonging to a taxon as follows:

How do I find all of the species in GenBank that belong to a particular group?

You can use Entrez queries to find taxa of a particular rank in a given lineage, e.g.:

Amphibia[subtree] AND species[rank]

You can restrict the output of this list to species with formal Linnaean binomial names:

Amphibia[subtree] AND species[rank] AND specified[prop]

To download the list of species names (1) click ‘Send to’ (2) select ‘File’ (3) switch Format to ‘Taxon name’ and (4) click ‘Create File’. This will create a file named “taxonomy_result” in your download directory.

So for your example, you can search for: Chlamydiales [subtree] AND species[rank]

Download all the species names, then use that list to EFetch all sequences belonging to each species.
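The same Taxonomy search can be scripted instead of clicked through. Below is a minimal sketch, assuming Biopython is installed; the helper names `subtree_term` and `search_taxonomy_ids`, the `email` argument and the `retmax` default are illustrative choices, not part of the FAQ. Leaving out the rank should return every node below the taxon rather than only the species, though that is worth checking against the web interface for your group.

```python
def subtree_term(taxon, rank=None):
    """Build a Taxonomy query for everything below `taxon`.

    `taxon` is a scientific name or a numeric taxonomy ID (hypothetical helper).
    """
    taxon = str(taxon).strip()
    if taxon.isdigit():
        term = "txid{}[subtree]".format(taxon)
    else:
        term = "{}[subtree]".format(taxon)
    if rank:
        # Restrict the result to nodes of one rank, e.g. "species"
        term += " AND {}[rank]".format(rank)
    return term


def search_taxonomy_ids(taxon, email, rank=None, retmax=10000):
    """Return the taxonomy IDs matched by the subtree query (needs network access)."""
    # Imported here so subtree_term() can be used without Biopython
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="taxonomy", term=subtree_term(taxon, rank), retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return record["IdList"]
```

For the example in the question, `search_taxonomy_ids("Chlamydiales", "you@example.org", rank="species")` would run the query given above.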


With your method I can indeed get the taxonomy IDs of every species. But one species can consist of many strains, and somehow I am unable to retrieve the data for those.

(reply by JJK, 7.3 years ago)

Peter (Scotland, UK) wrote, 7.3 years ago:

What is your question? If you just want to get the GenBank files for a given taxonomy ID, then (more or less as you showed) a single ESearch term like txid12345[orgn] will find them; see also http://news.open-bio.org/news/2009/06/ncbi-einfo-biopython/
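As a rough illustration of that single-term search, here is a sketch that looks up nucleotide records by taxonomy ID and then fetches them in GenBank format. It assumes Biopython; the function names, the `email` argument, the `retmax` default and the optional `extra_filter` are illustrative assumptions. The `[Organism:exp]` form spells out that records filed under descendant taxa (for example strains) are included.

```python
def organism_term(tax_id, extra_filter=None):
    """Build an Entrez term matching records under tax_id and its descendants."""
    term = "txid{}[Organism:exp]".format(int(tax_id))
    if extra_filter:
        # Optional extra restriction supplied by the caller (an assumption)
        term += " AND " + extra_filter
    return term


def search_nucleotide_ids(tax_id, email, retmax=500, extra_filter=None):
    """Return nucleotide record IDs for a taxonomy ID (needs network access)."""
    # Imported here so organism_term() can be used without Biopython
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="nucleotide",
                            term=organism_term(tax_id, extra_filter),
                            retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return record["IdList"]


def fetch_genbank(ids, email):
    """Fetch the given nucleotide IDs as GenBank flat-file text."""
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="nucleotide", id=",".join(ids),
                           rettype="gb", retmode="text")
    text = handle.read()
    handle.close()
    return text
```

If only RefSeq entries such as the NC_ records are wanted, passing something like `extra_filter="refseq[filter]"` may narrow the search; check that filter against the web interface before relying on it.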

There can be complications when you have lots of records and want to download them all (e.g. network errors), see this thread: http://lists.open-bio.org/pipermail/biopython/2012-April/007943.html
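One common way to soften those complications is to fetch in small batches and retry a batch when the network fails. The sketch below is only one possible approach, not the method from the linked thread; `batches`, `with_retries`, `download_genbank`, the batch size and the retry counts are all illustrative assumptions, and Biopython is assumed for the fetch itself.

```python
import time


def batches(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def with_retries(func, attempts=3, delay=1.0):
    """Call func(), retrying on IOError/OSError with a growing pause."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except IOError:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay * attempt)


def download_genbank(ids, out_path, email, batch_size=200):
    """Append GenBank text for all ids to out_path, one batch at a time."""
    # Imported here so the helpers above can be used without Biopython
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    with open(out_path, "w") as out:
        for chunk in batches(ids, batch_size):
            def fetch(chunk=chunk):
                handle = Entrez.efetch(db="nucleotide", id=",".join(chunk),
                                       rettype="gb", retmode="text")
                try:
                    return handle.read()
                finally:
                    handle.close()
            out.write(with_retries(fetch))
```

Writing each batch as soon as it arrives means a failure late in the run does not lose the records already downloaded.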
