Question: Classification of bacteria - bioinformatics
Mohak wrote (7 months ago):

I want to do an enrichment analysis across all bacterial phyla/classes. For this I first need to know how many of the genomes in my database are Gammaproteobacteria, Spirochaetes, etc. What I need is a list that gives the names of these bacteria, their taxonomic group, and their accession numbers. Is there a repository that can give me this information?

Lina F (Boston, MA) wrote (7 months ago):

I don't know if there is an easy ready-made answer for this question, but KEGG has a rudimentary "tree" of sorts:

http://www.genome.jp/kegg/catalog/org_list.html

You could download this HTML file and parse it to get the links to GenBank. Python's Beautiful Soup module may help there.
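A minimal sketch of the parsing step, in case it helps. It uses the standard library's `html.parser` as a dependency-free stand-in for Beautiful Soup (with Beautiful Soup, `soup.find_all("a")` would do the same job). The sample HTML, the `LinkCollector` class, the `extract_links` helper and the `must_contain` filter are all made up for illustration; the real KEGG page layout may differ, so check it before relying on this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, must_contain="ncbi.nlm.nih.gov"):
    """Return the (href, text) pairs whose href contains `must_contain`."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links if must_contain in href]


# Hypothetical rows for illustration only, NOT the real KEGG markup.
SAMPLE_HTML = """
<table>
  <tr><td>Gammaproteobacteria</td>
      <td><a href="/entry/org_a">org_a</a></td>
      <td><a href="https://www.ncbi.nlm.nih.gov/assembly/EXAMPLE_1/">GenBank</a></td></tr>
  <tr><td>Spirochaetes</td>
      <td><a href="/entry/org_b">org_b</a></td>
      <td><a href="https://www.ncbi.nlm.nih.gov/assembly/EXAMPLE_2/">GenBank</a></td></tr>
</table>
"""

ncbi_links = extract_links(SAMPLE_HTML)
```

To run it on the real page, you would read the downloaded file into a string and pass that to `extract_links` instead of `SAMPLE_HTML`.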

If you know your way around NCBI, you could use the eUtils API to access different databases (e.g. the Taxonomy database to get the taxonomic lineages).

Here are some notes on how to use eUtils with Python: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc110
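As a rough sketch of the Taxonomy route: the code below calls the E-utilities `efetch` endpoint directly with the standard library rather than through Biopython (Biopython's `Bio.Entrez.efetch` wraps the same service). The helper names and the hand-written, heavily abbreviated sample record are my own assumptions for illustration; real records contain many more fields, and the names and ranks NCBI returns may differ from the sample.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_taxonomy_xml(taxids):
    """Download Taxonomy records as XML (network call, not run in this sketch)."""
    query = urllib.parse.urlencode(
        {"db": "taxonomy", "id": ",".join(str(t) for t in taxids), "retmode": "xml"}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}") as response:
        return response.read().decode("utf-8")


def ranks_by_taxon(xml_text, wanted=("phylum", "class")):
    """Map each organism name to its lineage names at the wanted ranks."""
    result = {}
    for taxon in ET.fromstring(xml_text).findall("Taxon"):
        name = taxon.findtext("ScientificName")
        # LineageEx lists each ancestor with its own name and rank.
        ranks = {
            node.findtext("Rank"): node.findtext("ScientificName")
            for node in taxon.findall("LineageEx/Taxon")
        }
        result[name] = {rank: ranks.get(rank) for rank in wanted}
    return result


# Hand-written, abbreviated sample in the shape of an efetch Taxonomy record.
SAMPLE_XML = """<TaxaSet>
  <Taxon>
    <TaxId>562</TaxId>
    <ScientificName>Escherichia coli</ScientificName>
    <LineageEx>
      <Taxon><TaxId>1224</TaxId><ScientificName>Pseudomonadota</ScientificName><Rank>phylum</Rank></Taxon>
      <Taxon><TaxId>1236</TaxId><ScientificName>Gammaproteobacteria</ScientificName><Rank>class</Rank></Taxon>
    </LineageEx>
  </Taxon>
</TaxaSet>"""

lineages = ranks_by_taxon(SAMPLE_XML)
# Tally genomes per class, which is the count an enrichment analysis needs.
class_counts = Counter(info["class"] for info in lineages.values())
```

With real data you would pass the output of `fetch_taxonomy_xml([...])` for your own taxonomy IDs to `ranks_by_taxon`. NCBI asks users to keep request rates low, so batching IDs into one call is kinder than fetching them one by one.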

Both of these solutions will probably need a bit of finagling and are not super elegant. Maybe someone else has a more sophisticated solution :-)


Thanks Lina! I wasn't aware of the KEGG tree. Beautiful Soup solves the problem!

I have never used the eUtils API, but the Biopython link seems useful. I will definitely try that out. Thanks a ton! :)

Reply written 7 months ago by Mohak