Question

Prediction versus Association: how to evaluate associated traits?

9.3 years ago

Iryna Nikolayeva ▴ 30

Hi!
I wonder how to evaluate the performance of tools that associate a variable (for example a SNP or a gene) with a phenotype. A few papers [1,2] ask whether variables that have been significantly associated with a phenotype improve prediction of that phenotype. Is this a correct way to evaluate the associated traits? What does it mean if an associated trait doesn't improve prediction of the phenotype?

I also have a few related questions:
1) Is there a difference in the "properties" of variables that are good for prediction and those that are good for association (for example, variability between patients)?

2) Why would we sometimes go for techniques that associate a variable (example: SNP, gene) to an outcome variable (example: phenotype), rather than a technique that improves prediction?

Thank you very much in advance for your responses!

References:

[1] Dufresne, L. et al. (2014). Pathway analysis for genetic association studies: to do, or not to do? That is the question. BMC Proceedings. doi:10.1186/1753-6561-8-S1-S103

[2] Staiger, C., Cadot, S., Kooter, R., Dittrich, M., Müller, T., Klau, G. W., & Wessels, L. F. A. (2012). A critical evaluation of network and pathway-based classifiers for outcome prediction in breast cancer. PLoS ONE, 7(4), e34796. doi:10.1371/journal.pone.0034796

SNP gene pathway analysis network analysis GWAS • 4.1k views

updated 9.3 years ago by Devon Ryan 104k • written 9.3 years ago by Iryna Nikolayeva ▴ 30

score 0 · Answer 1 · 2015-01-05

If the association isn't predictive, then either (1) the effect is so small that you have to wonder how relevant it is, (2) the prediction method isn't appropriate for how the variant actually leads to the phenotype, or (3) it's just a spurious finding.
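To make case (1) concrete, here is a minimal simulation sketch. All numbers (cohort size, carrier frequency, the 10% versus 13% disease risks) are made-up assumptions chosen for illustration, and the helper names are hypothetical. It shows a single binary variant that is strongly associated with a phenotype in a large cohort, yet barely separates cases from controls when used as the only predictor.

```python
import math
import random


def simulate(n=50_000, carrier_freq=0.30, base_risk=0.10, carrier_risk=0.13, seed=1):
    """Simulate a cohort; return 2x2 counts
    (carrier cases, carrier controls, non-carrier cases, non-carrier controls)."""
    rng = random.Random(seed)
    a = b = c = d = 0
    for _ in range(n):
        carrier = rng.random() < carrier_freq
        case = rng.random() < (carrier_risk if carrier else base_risk)
        if carrier and case:
            a += 1
        elif carrier:
            b += 1
        elif case:
            c += 1
        else:
            d += 1
    return a, b, c, d


def association_p_value(a, b, c, d):
    """Two-sided p-value of a two-proportion z-test comparing
    disease risk in carriers versus non-carriers."""
    n1, n0 = a + b, c + d
    p1, p0 = a / n1, c / n0
    pooled = (a + c) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    z = (p1 - p0) / se
    return math.erfc(abs(z) / math.sqrt(2))


def auc_single_binary_marker(a, b, c, d):
    """AUC when carrier status is the only predictor.
    For one binary marker the ROC curve has a single corner,
    so AUC = 0.5 + (TPR - FPR) / 2."""
    tpr = a / (a + c)  # fraction of cases that are carriers
    fpr = b / (b + d)  # fraction of controls that are carriers
    return 0.5 + (tpr - fpr) / 2


counts = simulate()
p_value = association_p_value(*counts)
auc = auc_single_binary_marker(*counts)
print(f"p-value = {p_value:.2e}, AUC = {auc:.3f}")
```

Under these assumed settings the p-value is very small while the AUC stays close to 0.5, i.e. the association is real but the effect is too small to be useful for classifying individual patients.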

1) To my mind no, but I would defer to others here.

2) To test how predictive a finding is, you need a separate dataset (or an initial dataset large enough that you can subsample it and still have enough power). This can be a deal-breaker. There's also the fact that we usually don't care about the prediction part. In many disease cases, patients are already diagnosed, so there's nothing to predict. Rather, if we can find associated variants, we can try to develop a treatment that targets that change and will hopefully alter the patient's phenotype. Of course, if you want to do screening or to determine response to a treatment, then prediction is highly relevant.
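The need for a separate dataset can be illustrated with a small sketch, again on simulated data with assumed sizes (2000 samples, 500 binary markers) and hypothetical helper names. Every marker here is pure noise. The marker that looks best in the discovery half appears somewhat predictive there only because it was selected among many, and the held-out half exposes this.

```python
import random


def auc_binary(marker, phenotype):
    """AUC of a single binary marker: 0.5 + (TPR - FPR) / 2."""
    cases = [m for m, y in zip(marker, phenotype) if y]
    controls = [m for m, y in zip(marker, phenotype) if not y]
    tpr = sum(cases) / len(cases)
    fpr = sum(controls) / len(controls)
    return 0.5 + (tpr - fpr) / 2


rng = random.Random(0)
n_samples, n_markers = 2000, 500

# Phenotype and markers are independent coin flips: no true association exists.
phenotype = [rng.random() < 0.5 for _ in range(n_samples)]
markers = [[rng.random() < 0.5 for _ in range(n_samples)] for _ in range(n_markers)]

# Split samples into a discovery half and a held-out validation half.
half = n_samples // 2
disc, val = slice(0, half), slice(half, n_samples)

# Select the marker that deviates most from AUC = 0.5 in the discovery half.
best = max(
    range(n_markers),
    key=lambda j: abs(auc_binary(markers[j][disc], phenotype[disc]) - 0.5),
)

discovery_auc = auc_binary(markers[best][disc], phenotype[disc])
validation_auc = auc_binary(markers[best][val], phenotype[val])

# Fix the direction of effect using the discovery half only,
# then apply the same direction to the validation half.
if discovery_auc < 0.5:
    discovery_auc, validation_auc = 1 - discovery_auc, 1 - validation_auc

print(f"discovery AUC = {discovery_auc:.3f}, validation AUC = {validation_auc:.3f}")
```

The discovery AUC is inflated by the selection step, while the validation AUC falls back to roughly 0.5. Halving the data this way also halves the sample available for discovery, which is the power cost mentioned above.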