Counting Organisms Belonging To A Taxon
12.5 years ago
Igorfobia ▴ 20

Hi,

I'd like to get the following information about the genomes available at ftp://ftp.ncbi.nih.gov/genomes/Bacteria/

For each taxonomic rank r (species, genus, etc.) and for each taxid t at rank r, I'd like to know how many organisms have t as an ancestor. Actually, in the end I only need this aggregated information: for each feasible n, the number of NCBI taxids at rank r that have exactly n organisms as descendants in the tree.

By "organism" I mean a complete genome contained in the repository above, that is, one per subfolder. This way I also take into account cases where two or more organisms with an available genome are associated with the same species.

I hope my terminology was not too bad...

Thanks in advance!

taxonomy bacteria • 2.4k views
2
12.5 years ago
Science_Robot ★ 1.1k

You can get this information from the NCBI Taxonomy database: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz

The data is in a tree format. The files you will need are names.dmp and nodes.dmp. names.dmp is a flat text file containing each name and its corresponding taxid. nodes.dmp is a flat text file containing each taxid, its parent taxid and its rank. You can load all of this into an associative array:

{ 'id': [ child_1, child_i, child_n ] }

and count the members given a taxid.
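As a rough sketch of the counting step in Python: the version below stores the parent of each taxid instead of the child lists and walks upward from every genome, which gives the same counts without a recursive traversal. It assumes the usual nodes.dmp layout (fields separated by tab-pipe-tab, with taxid, parent taxid and rank as the first three fields) and assumes you already have one taxid per genome folder; the function names and the `genome_taxids` list are made up for illustration. A genome is counted for its own taxid as well as for every ancestor.

```python
from collections import Counter, defaultdict


def parse_nodes(lines):
    """Read nodes.dmp-style lines into {taxid: parent} and {taxid: rank}."""
    parent, rank = {}, {}
    for line in lines:
        # Drop the trailing "\t|\n" terminator, then split on the field separator.
        fields = line.rstrip("\t|\n").split("\t|\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        taxid = int(fields[0])
        parent[taxid] = int(fields[1])
        rank[taxid] = fields[2]
    return parent, rank


def count_genomes_per_taxon(parent, genome_taxids):
    """Count, for every taxid, the genomes at or below it in the tree."""
    counts = Counter()
    for taxid in genome_taxids:
        node, seen = taxid, set()
        # Walk up to the root; the root lists itself as parent, so stop on a repeat.
        while node in parent and node not in seen:
            counts[node] += 1
            seen.add(node)
            node = parent[node]
    return counts


def histogram_by_rank(counts, rank):
    """For each rank, map n -> number of taxids with exactly n genomes."""
    hist = defaultdict(Counter)
    for taxid, n in counts.items():
        hist[rank[taxid]][n] += 1
    return hist


if __name__ == "__main__":
    # Assumed inputs: an unpacked taxdump and a hypothetical list of genome taxids.
    with open("nodes.dmp") as handle:
        parent, rank = parse_nodes(handle)
    genome_taxids = []  # fill in: one taxid per genome subfolder
    counts = count_genomes_per_taxon(parent, genome_taxids)
    for r, by_n in histogram_by_rank(counts, rank).items():
        print(r, sorted(by_n.items()))
```

Taxids with no genome below them never appear in the result, so n = 0 would have to be derived separately from the total number of taxids at each rank.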

1
12.5 years ago
Pablacious ▴ 620

You could also use EBI's Ontology LookUp Service

http://www.ebi.ac.uk/ontology-lookup/

to access the NEWT Ontology, which is the NCBI Taxonomy plus a few additions from UniProt, but "ontologized". Using the Ontology Lookup Service's web service you can ask for the children or parents of a particular term. These are the signatures of the methods you could use (in Java):

public Map getTermParents(String termId, String ontologyName);
public Map getTermChildren(String termId, String ontologyName, int distance, int[] relationTypes);
public Map getTermRelations(String termId, String ontologyName);
public Map getChildrenFromRoot(String termId, String ontologyName, Vector childrenIds);

The web service is exposed through a WSDL file, so if you don't like Java you can consume it from your favourite language (I think they also provide a prebuilt Java client). And you avoid all the parsing :-). I use the OLS web service myself for other tasks, so I would definitely recommend it.
