Question: Database Of Host And Pathogen Pairs
7.4 years ago by
United States
Will (4.5k) wrote:

I'm looking to do a project on predicting which bacteria colonize a particular host and identifying the genomic features that determine these interactions.

Does anyone know of a good database that annotates known host-bacteria interactions? I know I could pull the cross-organism interactions from a PPI database like BIND, but that only gives a handful of examples and seems overly restrictive.

interaction • 3.9k views
written 7.4 years ago by Will (4.5k)
7.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum (121k) wrote:

See how @rdmpage built a database of host-pathogen pairs using GenBank:

Back in 2006 in a short post entitled "Building the encyclopedia of life" I wrote that GenBank is a potentially rich source of information on host-parasite relationships. Often sequences of parasites will include information on the name of the host (the example I used was sequence AF131710 from the platyhelminth Ligophorus mugilinus, which records the host as the Flathead mullet Mugil cephalus).

Edit: another one is the Pathogen-Host Interactions database.

This database contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions. Information is also given on the target sites of some anti-infective chemistries.

modified 4.3 years ago by Michael Dondrup (46k) • written 7.4 years ago by Pierre Lindenbaum (121k)

I've been meaning to take this project beyond the blog post stage. If there's enough interest I could look at creating a web site and services around the host-parasite data in GenBank.

written 7.4 years ago by Roderic Page (390)

Big (depending on taxonomic scope). I built the visualisations from a subset of GenBank (mainly eukaryote non-EST sequences). If you ask GenBank today how many sequences have the "host" field, it returns 3,466,914.

written 7.4 years ago by Roderic Page (390)

Excellent ... I wouldn't have thought to look in GenBank!

written 7.4 years ago by Will (4.5k)

@roderic: I've already got a quick-and-dirty Python script to extract the data. Do you remember roughly how many associations you found? I'm just trying to get an idea of how large this symbiome is going to be.

written 7.4 years ago by Will (4.5k)

I just discovered that the search in my previous comment matches "host" anywhere in the sequence record, so it also returns sequences that have no "host" field but mention host elsewhere, say, in the title of the article that published the sequence. The figure for the number of sequences with a "host" field is therefore an overestimate.

written 7.4 years ago by Roderic Page (390)
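To illustrate the caveat above, here is a minimal Python sketch contrasting a free-text match on "host" with a check for an actual host qualifier. The two records are hand-made and abridged (the DEMO identifiers and the second record are hypothetical), and the sketch assumes the field appears as a `/host="..."` qualifier line in the flat file. It is not the query used in this thread.

```python
import re

# Hand-made, abridged records in GenBank flat-file style (illustrative only).
RECORDS = [
    '''LOCUS       DEMO0001
DEFINITION  Parasite isolate 18S ribosomal RNA gene.
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Ligophorus mugilinus"
                     /host="Mugil cephalus"
''',
    '''LOCUS       DEMO0002
DEFINITION  Free-living species 18S ribosomal RNA gene.
  TITLE     Host range of marine flatworms
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Example species"
''',
]

# Assumption: a real host annotation is a qualifier line starting with /host="
HOST_QUALIFIER = re.compile(r'^\s*/host="', re.MULTILINE)


def mentions_host(record):
    """Free-text match: true if the word 'host' occurs anywhere in the record."""
    return "host" in record.lower()


def has_host_field(record):
    """Field match: true only if the record carries a /host qualifier."""
    return HOST_QUALIFIER.search(record) is not None


naive_count = sum(mentions_host(r) for r in RECORDS)
strict_count = sum(has_host_field(r) for r in RECORDS)
print(naive_count, strict_count)
```

The second record is counted by the free-text match only because of its article title, which is the kind of false positive described above.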

Yeah, parsing through it now ... looks to be ~1.7 million triples (GenBank record, host, symbiote).

written 7.4 years ago by Will (4.5k)
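For readers who want to try something similar, below is a rough sketch of how (GenBank record, host, symbiote) triples could be pulled out of flat-file text using only the standard library. It is not the script mentioned above. The sample text is hand-written and abridged: the first record reuses the accession, organism, and host named in the quoted blog post, the second record (DEMO0002) is hypothetical, and the `/organism` and `/host` qualifier layout is an assumption about the input.

```python
import re

# Hand-written, abridged sample (not copied from GenBank).
SAMPLE = '''LOCUS       AF131710
ACCESSION   AF131710
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Ligophorus mugilinus"
                     /host="Mugil cephalus"
//
LOCUS       DEMO0002
ACCESSION   DEMO0002
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Example species"
//
'''

ACCESSION = re.compile(r'^ACCESSION\s+(\S+)', re.MULTILINE)
ORGANISM = re.compile(r'/organism="([^"]+)"')
HOST = re.compile(r'/host="([^"]+)"')


def clean(value):
    """Collapse line wraps and repeated whitespace inside a qualifier value."""
    return " ".join(value.split())


def extract_triples(flatfile_text):
    """Return (accession, host, symbiote) for every record with a /host qualifier."""
    triples = []
    # Records in a flat file are terminated by a line containing only "//".
    for record in flatfile_text.split("\n//"):
        acc = ACCESSION.search(record)
        org = ORGANISM.search(record)
        host = HOST.search(record)
        if acc and org and host:
            triples.append((acc.group(1), clean(host.group(1)), clean(org.group(1))))
    return triples


triples = extract_triples(SAMPLE)
print(triples)
```

Records without a host qualifier are skipped, so only the first sample record produces a triple. For real data a dedicated GenBank parser would likely be more robust than regular expressions.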
7.4 years ago by
Hamish (3.1k) wrote:

The pathogen-specific resources might be a useful starting point. For example:

If you are including viruses then UniProtKB may be a useful source, since it details organism/host for viruses. Sadly they don't seem to have included other organism/host relationships.

Other possible sources that come to mind are:

  • Metabolic pathway databases. For example, KEGG PATHWAY has a set of pathways related to infectious disease that could be useful.
  • Microarray expression databases. For example, ArrayExpress contains details of experiments looking for changes in gene expression related to various disease states (try a search for terms like 'infected').

While I suspect that these will suffer from similar limitations to the PPI data, they are worth looking at. Additional pairings from the analysis of the INSDC databases, as suggested by Roderic, will extend coverage, as will text-mining of the literature.

modified 7.4 years ago • written 7.4 years ago by Hamish (3.1k)
7.4 years ago by
Blacksburg, VA USA
Behindtherabbit (60) wrote:

There are over 9000 experimentally confirmed host-pathogen PPIs for bacteria available from public PPI databases. If you include viruses as well, that number is much higher. Several public (and published) resources cull unique HP-PPI pairs from the broader databases, including PATRIC (our group), HPIdb, PHI-Base, and more. Also check out the PSICQUIC web interface at EBI, which lets you query many public databases yourself. Hope that helps!

written 7.4 years ago by Behindtherabbit (60)

I don't really need PPI-level information; I just need interactions at the organism level. When I looked through BIND and NCBI's repository, I only found ~500 unique host-pathogen associations.

written 7.4 years ago by Will (4.5k)
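Going from record-level data to organism-level associations is mostly a deduplication step. The sketch below collapses (record, host, symbiote) triples into unique (host, symbiote) pairs and counts the supporting records. All identifiers and names in the example input are hypothetical, and the lowercase-and-trim normalisation is an assumption; free-text host fields will usually need more careful name matching than this.

```python
from collections import Counter


def unique_pairs(triples):
    """Count supporting records for each unique (host, symbiote) pair."""
    counts = Counter()
    for _record_id, host, symbiote in triples:
        # Assumption: trimming and lowercasing is enough to merge spelling variants.
        key = (host.strip().lower(), symbiote.strip().lower())
        counts[key] += 1
    return counts


# Hypothetical triples, for illustration only.
example = [
    ("DEMO0001", "Mugil cephalus", "Ligophorus mugilinus"),
    ("DEMO0002", "mugil cephalus ", "Ligophorus mugilinus"),
    ("DEMO0003", "Homo sapiens", "Example bacterium"),
]

pairs = unique_pairs(example)
for (host, symbiote), n in pairs.most_common():
    print(host, symbiote, n)
```

Keeping the per-pair record count makes it easy to filter out associations supported by only a single record.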
7.4 years ago by
Athens, GA, USA
Casey Bergman (18k) wrote:

You could also try EnvDB, a "database that aims to provide the most complete census to-date of the environmental distribution of prokaryotes". Under "environments", select "host associated".

written 7.4 years ago by Casey Bergman (18k)