Question: Associating Viral Sequence With Epidemiological Data
8.2 years ago by
London, UK
Agapow270 wrote:

The setup: I have a large number of sequences from a viral pathogen and the associated epidemiological data, collected during a major disease outbreak. I deeply suspect that the varied epidemiology seen across the outbreak (case severity and outcomes, transmission rate, etc.) is the result of changes in the viral sequence.

The question: So, how do I best correlate these epidemiological data with sequence data? In the crudest sense, how can I point at a SNP and say "this is associated with more severe cases"?


  • I'm concerned about phylogenetic inertia, i.e. spurious correlations caused by shared ancestry rather than by any causal effect. A given sequence change may correlate with increased fatality simply because it was fixed in the lineage that infected a weakened group of hosts.

  • Some characteristics which are technically non-heritable will behave as heritable, e.g. location.

Solutions I've considered:

  • Tools from GWAS or similar: apart from the possible overkill of using these on such a short genome, I don't know of any GWAS tools that deal with the inertia problem.

  • Comparative analysis with independent contrasts: would be the obvious choice if I were dealing solely with character data. I could hack a suitable dataset together, say by treating a SNP locus as a character, but it seems ugly. Also, the state of useful software here is not good.

  • Selection: will tell me what sites are being selected for but not what might be correlated with that selection.

  • Compare controls: is something I've done before, but in this case it seems that deciding what to control for is pre-emptively deciding what won't correlate.
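
The "SNP locus as a character" encoding mentioned in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `snp_characters`, the toy alignment, and the choice to keep only biallelic sites and code the minor allele as 1 are all hypothetical, not something taken from a particular tool.

```python
from collections import Counter

def snp_characters(alignment):
    """Turn an alignment {case_id: sequence} into binary SNP characters.

    Returns {site_index: {case_id: 0 or 1}}, where 1 marks the minor allele.
    Only biallelic sites are kept; gaps and ambiguity codes are skipped.
    """
    ids = list(alignment)
    length = len(alignment[ids[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to the same length")

    characters = {}
    for site in range(length):
        column = {i: alignment[i][site].upper() for i in ids}
        counts = Counter(b for b in column.values() if b in "ACGT")
        if len(counts) != 2:
            continue  # invariant or multi-allelic site
        major = counts.most_common(1)[0][0]
        characters[site] = {
            i: int(b != major) for i, b in column.items() if b in "ACGT"
        }
    return characters

# Hypothetical toy alignment: five cases, five aligned positions.
toy = {
    "case1": "ACGTA",
    "case2": "ACGTA",
    "case3": "ACGTA",
    "case4": "ATGCA",
    "case5": "ATGCA",
}
chars = snp_characters(toy)
```

In this toy alignment the two variable sites (indices 1 and 3) are carried by exactly the same cases, so they are indistinguishable as characters. That is a small-scale version of the inertia problem: changes fixed on the same lineage cannot be separated by association alone.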

evolution association snp • 1.5k views

Exactly what kind of epidemiological data do you have, e.g. is it already aggregated by viral sequence, or do you have individual case data at your disposal?

written 8.2 years ago by Meredith0

Individual case data, dates, outcomes, the whole paella.

written 8.2 years ago by Agapow270
8.2 years ago by
David Quigley11k
San Francisco
David Quigley11k wrote:

You should look at the literature on evolution of tumors, e.g. Navin Nature 2011, Bozic PNAS 2010, Wood Science 2007. Although tumors are often called clonal populations, a more sophisticated model of tumor evolution starts with a single aberrant cell producing a heterogeneous population of offspring which mutate independently within the overall tumor mass. The problem of determining which of many somatic mutations is a strong candidate to be causal (a.k.a. a driver mutation) and which is a bystander or passenger mutation is the same as your inertia problem.

I would guess that building a phylogenetic tree from the sequence alterations would be crucial for identifying candidate causal changes, but this isn't really my area of expertise. You might structure the question as a set of regressions, asking whether a given alteration is associated with your phenotype and then looking for the progenitor alterations highest up on the tree. With a regression-based statistic you can control for confounders such as location. I would consult a card-carrying molecular epidemiologist to avoid re-inventing the wheel.
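
To make the regression idea concrete, here is a minimal sketch of testing one alteration against a binary phenotype with and without location as a covariate. Everything in it is an assumption for illustration: the counts are invented, `fit_logistic` is a bare-bones gradient-ascent fit written out so the example has no dependencies, and a real analysis would use an established statistics package, report standard errors, correct for testing many sites, and bring in the tree.

```python
import math

def fit_logistic(rows, outcomes, steps=5000, rate=0.1):
    """Fit logistic regression by plain gradient ascent.

    rows: list of feature lists (without intercept); outcomes: list of 0/1.
    Returns coefficients [intercept, b1, b2, ...] on the log-odds scale.
    """
    n_features = len(rows[0]) + 1
    coef = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for x, y in zip(rows, outcomes):
            xs = [1.0] + list(x)
            z = sum(c * v for c, v in zip(coef, xs))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, v in enumerate(xs):
                grad[j] += (y - p) * v
        coef = [c + rate * g / len(rows) for c, g in zip(coef, grad)]
    return coef

def expand(counts):
    """Expand {(snp, location, severe): n} into per-case rows and outcomes."""
    rows, outcomes = [], []
    for (snp, loc, severe), n in counts.items():
        rows += [[snp, loc]] * n
        outcomes += [severe] * n
    return rows, outcomes

# Invented counts: location 1 has both more severe cases and more SNP carriers,
# but within each location carriers and non-carriers are equally often severe.
counts = {
    (1, 1, 1): 6, (1, 1, 0): 2, (0, 1, 1): 3, (0, 1, 0): 1,
    (1, 0, 1): 1, (1, 0, 0): 3, (0, 0, 1): 2, (0, 0, 0): 6,
}
rows, severe = expand(counts)

# SNP alone, then SNP plus location as a covariate.
snp_unadjusted = fit_logistic([[r[0]] for r in rows], severe)[1]
snp_adjusted = fit_logistic(rows, severe)[1]
```

With these made-up counts the unadjusted SNP coefficient is about 0.67 (a log odds ratio of log(49/25)), while the coefficient adjusted for location is essentially zero: the apparent effect is entirely explained by where the carriers were sampled. Controlling for a covariate like this only removes confounding you have measured; it does not by itself account for shared ancestry among the sequences.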
