Database Of Host And Pathogen Pairs
12.2 years ago
Will 4.5k

I'm looking to do a project on predicting which bacteria colonize a particular host and identifying the genomic features that govern these interactions.

Does anyone know of a good database that annotates known interactions? I know I could pull the cross-organism interactions from a PPI database like BIND, but that only gives a handful of examples and seems overly restrictive.

12.2 years ago

See how @rdmpage built a database of host-pathogen associations using GenBank: http://iphylo.blogspot.com/2011/03/visualising-symbiome-hosts-parasites.html

Back in 2006 in a short post entitled "Building the encyclopedia of life" I wrote that GenBank is a potentially rich source of information on host-parasite relationships. Often sequences of parasites will include information on the name of the host (the example I used was sequence AF131710 from the platyhelminth Ligophorus mugilinus, which records the host as the Flathead mullet Mugil cephalus).
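A minimal sketch of pulling those host annotations out of GenBank flat files, using only the standard library. It assumes the standard flat-file layout (records terminated by "//", an ACCESSION line, and /organism="..." and /host="..." qualifiers in the source feature); host_pairs and _qualifier are hypothetical helper names, not part of any library. For production use, Biopython's SeqIO.parse(handle, "genbank") and the source feature's qualifiers dict are more robust than regular expressions.

```python
import re

def _qualifier(record, name):
    """Return the first /name="..." qualifier value, with line wraps collapsed, or None."""
    match = re.search(rf'/{name}="([^"]*)"', record)
    # Long qualifier values wrap onto indented continuation lines; rejoin them.
    return " ".join(match.group(1).split()) if match else None

def host_pairs(genbank_text):
    """Yield (accession, organism, host) for every record that carries a /host qualifier."""
    for record in genbank_text.split("\n//"):
        acc = re.search(r"^ACCESSION\s+(\S+)", record, re.M)
        organism = _qualifier(record, "organism")
        host = _qualifier(record, "host")
        # Records without a host annotation are skipped.
        if acc and organism and host:
            yield acc.group(1), organism, host
```

Each yielded tuple is one (record, host, symbiont) association; records lacking a /host qualifier are silently skipped.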

Edit: another one: PHI-base, the pathogen-host interactions database: http://www.phi-base.org/

This database contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions. Information is also given on the target sites of some anti-infective chemistries.

I've been meaning to take this project beyond the blog post stage. If there's enough interest I could look at creating a web site and services around the host-parasite data in GenBank.

Big (depending on taxonomic scope). I built the visualisations from a subset of GenBank (mainly eukaryote non-EST sequences). If you ask GenBank how many sequences have the "host" field (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=host&retmax=10), today the answer is 3,466,914.
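A small sketch of building that esearch query and reading the total hit count from the XML response. NCBI E-utilities are now served over https; retmax=0 asks for no IDs, only the count. Note that a bare term such as "host" matches the word anywhere in the record, so the count is an upper bound on records that actually carry a /host qualifier. The function names are hypothetical, and the network call itself is left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term, db="nucleotide", retmax=0):
    """Build an E-utilities esearch URL; retmax=0 returns the Count without any IDs."""
    query = urllib.parse.urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH}?{query}"

def hit_count(esearch_xml):
    """Read the total number of hits from the top-level <Count> of an esearch response."""
    root = ET.fromstring(esearch_xml)
    return int(root.findtext("Count"))
```

Fetching the URL (for example with urllib.request.urlopen) and passing the response body to hit_count gives the same kind of number quoted above.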

Excellent... I wouldn't have thought to look in GenBank!

@roderic: I've already got a quick-and-dirty Python script to extract the data. Do you remember roughly how many associations you found? I'm just trying to get an idea of how large this symbiome is going to be.

Just discovered that the search in my previous comment matches "host" anywhere in the sequence record, so it will return sequences without a "host" field but with "host" in, say, the title of the article that published the sequence. So that figure overestimates the number of records with a host field.

Yeah, parsing through now... it looks to be ~1.7 million triples (GenBank record, host, symbiote).
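Since the goal is organism-level pairs rather than individual records, one way to shrink those triples is to collapse them into unique (host, symbiont) pairs, counting how many distinct records support each. This sketch only trims whitespace and lowercases names; as an assumption worth flagging, real GenBank host strings are free text ("human" vs "Homo sapiens"), so mapping names to NCBI taxonomy IDs would be needed for a proper merge. unique_pairs is a hypothetical name.

```python
from collections import Counter

def unique_pairs(triples):
    """Collapse (record, host, symbiont) triples into organism-level pair counts.

    Returns a Counter mapping (host, symbiont) to the number of distinct
    records supporting that pair. Names are only trimmed and lowercased.
    """
    seen = set()
    counts = Counter()
    for record, host, symbiont in triples:
        key = (record, host.strip().lower(), symbiont.strip().lower())
        if key in seen:  # the same record listed twice adds no support
            continue
        seen.add(key)
        counts[(key[1], key[2])] += 1
    return counts
```

len(unique_pairs(triples)) then gives the number of distinct associations, which is the figure to compare with the ~500 found in BIND and NCBI.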

12.2 years ago
Hamish ★ 3.2k

Pathogen-specific resources might be a useful starting point.

If you are including viruses, then UniProtKB may be a useful source, since it records the host organism(s) for viral entries. Sadly, it doesn't seem to include other organism/host relationships.
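In the UniProtKB flat-file format the virus hosts appear on OH ("organism host") lines, e.g. "OH   NCBI_TaxID=9606; Homo sapiens (Human).", next to the OX line that gives the virus's own taxon. A minimal stdlib sketch that turns a downloaded flat file into virus-to-host taxon mappings, assuming that layout and records terminated by "//"; virus_hosts is a hypothetical name.

```python
import re

def virus_hosts(uniprot_text):
    """Map each entry's first accession to (organism taxid, [host taxids]).

    Assumes UniProtKB flat-file lines: AC (accessions), OX (organism
    NCBI_TaxID) and one OH line per host. Entries without OH are skipped.
    """
    result = {}
    for entry in uniprot_text.split("\n//"):
        ac = re.search(r"^AC   (\w+);", entry, re.M)
        ox = re.search(r"^OX   NCBI_TaxID=(\d+)", entry, re.M)
        hosts = re.findall(r"^OH   NCBI_TaxID=(\d+);", entry, re.M)
        if ac and ox and hosts:
            result[ac.group(1)] = (int(ox.group(1)), [int(h) for h in hosts])
    return result
```

Working with taxon IDs rather than names sidesteps the synonym problem that free-text host fields have.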

Other possible sources that come to mind are:

  • metabolic pathway databases. For example, KEGG PATHWAY has a set of pathways related to infectious disease that could be useful.
  • microarray expression databases. For example, ArrayExpress contains details of experiments looking for changes in gene expression related to various disease states (try a search for terms like 'infected').

While I suspect that these will suffer from limitations similar to the PPI data, they are worth looking at. Additional pairings from analysis of the INSDC databases, as Roderic suggested, will extend coverage, as will text-mining of the literature.

12.2 years ago

There are over 9,000 experimentally confirmed host-pathogen PPIs involving bacteria available from public PPI databases. If you include viruses as well, that number is much higher. Several public (and published) resources cull unique HP-PPI pairs from the broader databases, including PATRIC (our group), HPIdb, PHI-base, and more. Also check out the PSICQUIC web interface at EBI, which lets you query many public databases yourself. Hope that helps!
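PSICQUIC services can return results in the tab-delimited PSI-MITAB format, and if only organism-level pairs are needed, the protein-level rows can be collapsed to unique cross-species taxon pairs. This sketch assumes MITAB 2.5 column order, where columns 10 and 11 carry the interactors' taxon IDs (e.g. "taxid:9606(human)"); organism_pairs is a hypothetical name, and rows where both partners share a taxon are dropped.

```python
import re

TAXID = re.compile(r"taxid:(-?\d+)")

def organism_pairs(mitab_lines):
    """Collapse PSI-MITAB 2.5 rows into unordered cross-species taxon pairs.

    Columns 10 and 11 (0-based 9 and 10) hold interactor A/B taxa; only the
    first taxid in each column is used. Comment and blank lines are ignored.
    """
    pairs = set()
    for line in mitab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # not a MITAB 2.5 row
        a, b = TAXID.search(cols[9]), TAXID.search(cols[10])
        if a and b and a.group(1) != b.group(1):
            pairs.add(tuple(sorted((int(a.group(1)), int(b.group(1))))))
    return pairs
```

Sorting each pair makes (host, pathogen) and (pathogen, host) rows count as the same association.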


I don't really need the PPI level info, I just need the organism interaction level. When I looked through BIND and NCBI's repository I only found ~500 unique host-pathogen associations.

12.2 years ago

You could also try EnvDB, a "database that aims to provide the most complete census to-date of the environmental distribution of prokaryotes". Under "environments", select "host associated".
