Bayesian Approach For Polymorphism And Phenotype
8.5 years ago
robjohn7000 ▴ 110


I have datasets of polymorphisms (runs of repeated Gs of varying length) in DNA sequences from a number of clones, grouped by phenotypic trait. The number of repeated Gs varies between reads and clones; the lengths are denoted L4, L5 and L6.

Data example for the wild-type (WT) phenotype:

            L4    L5    L6    
Clone_B1    2    2    3    WT phenotype
Clone_B2    1    4    5    WT phenotype
Clone_B3    2    2    4    WT phenotype
Clone_B4    4    3    3    WT phenotype
Clone_B5    2    2    2    WT phenotype

Data example for a phenotype under investigation:

            L4    L5    L6    
Clone_A1    2    3    3    Phenotype_M
Clone_A2    3    4    5    Phenotype_M
Clone_A3    1    2    4    Phenotype_M
Clone_A4    6    3    3    Phenotype_M
Clone_A5    4    1    2    Phenotype_M

Data explanation: in the WT phenotype data, 2 sequence reads of Clone_B1 (row 1, column 1 of the matrix) have 4 repeated Gs (L4), 2 sequence reads of Clone_B1 (row 1, column 2) have 5 repeated Gs (L5), and so on.
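For concreteness, here is one way the two tables could be encoded. This is only a sketch: the names `WT`, `PHENOTYPE_M`, `reads_with` and `column_totals` are made up for illustration, and the numbers are copied from the tables above.

```python
# Read counts copied from the two tables in the question.
# Each entry is one clone; the tuple holds the read counts for L4, L5, L6.
LENGTHS = ("L4", "L5", "L6")

WT = {
    "Clone_B1": (2, 2, 3),
    "Clone_B2": (1, 4, 5),
    "Clone_B3": (2, 2, 4),
    "Clone_B4": (4, 3, 3),
    "Clone_B5": (2, 2, 2),
}
PHENOTYPE_M = {
    "Clone_A1": (2, 3, 3),
    "Clone_A2": (3, 4, 5),
    "Clone_A3": (1, 2, 4),
    "Clone_A4": (6, 3, 3),
    "Clone_A5": (4, 1, 2),
}


def reads_with(table, clone, length):
    """Number of reads of `clone` whose G tract has the given length label."""
    return table[clone][LENGTHS.index(length)]


def column_totals(table):
    """Total reads per tract length, summed over all clones in the table."""
    return {name: sum(row[i] for row in table.values())
            for i, name in enumerate(LENGTHS)}


print(reads_with(WT, "Clone_B1", "L4"))
print(column_totals(WT))
print(column_totals(PHENOTYPE_M))
```

So `reads_with(WT, "Clone_B1", "L4")` is the "row 1, column 1" element described above, and the column totals give the pooled read counts per phenotype.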

My question is: is it a good idea to use a Bayesian method to determine which of the Ls might be responsible for the 'phenotype under investigation' as opposed to the 'wild-type phenotype'? Which Bayesian algorithms and R packages would be useful for this purpose?


8.5 years ago
ewre ▴ 240

You can use a Bayesian method for this, but there are two questions to answer first:

  • Can you explain the result? Is there evidence that the number of Gs has some biological relevance to the phenotype you are looking at?
  • Bayesian methods need training data to fit a model. Are there enough records for that training step?

Thanks, hanguangchun. I think both questions are covered. I would be glad of advice on appropriate tutorials and algorithms.

3.9 years ago
mmfansler ▴ 370

Nothing about this problem seems particularly suggestive of a Bayesian method, unless, perhaps, one has a prior model in mind (say, a preference for shorter motifs over longer ones?). What do come to mind instead are decision trees and linear discriminant analysis (LDA).
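To illustrate what a prior model could look like here, the sketch below puts a Dirichlet prior on the proportions of L4, L5 and L6 reads and computes the posterior mean for each phenotype. Everything beyond the counts is an assumption: the pseudo-counts in `PRIOR` are made up to express a mild preference for shorter tracts, and pooling reads across clones ignores clone-to-clone variation.

```python
# Counts copied from the question; each row is (L4, L5, L6) reads of one clone.
LENGTHS = ("L4", "L5", "L6")
WT_ROWS = [(2, 2, 3), (1, 4, 5), (2, 2, 4), (4, 3, 3), (2, 2, 2)]
M_ROWS = [(2, 3, 3), (3, 4, 5), (1, 2, 4), (6, 3, 3), (4, 1, 2)]

# Hypothetical Dirichlet pseudo-counts favouring shorter tracts (an assumption,
# not something given in the question).
PRIOR = (3.0, 2.0, 1.0)


def pooled_counts(rows):
    """Sum read counts over clones, one total per tract length."""
    return [sum(col) for col in zip(*rows)]


def posterior_mean(counts, prior):
    """Posterior mean of multinomial proportions under a Dirichlet prior."""
    total = sum(counts) + sum(prior)
    return [(c + a) / total for c, a in zip(counts, prior)]


for name, rows in (("WT", WT_ROWS), ("Phenotype_M", M_ROWS)):
    post = posterior_mean(pooled_counts(rows), PRIOR)
    print(name, {label: round(p, 3) for label, p in zip(LENGTHS, post)})
```

With so few reads, the posterior is noticeably pulled towards whatever prior is chosen, which is why the choice would need a biological justification.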

In a decision tree, one constructs splitting criteria to partition a set of multiclass objects using the features available. Typically, a metric such as expected information gain or Gini impurity is used at each branching point to determine which feature is most effective at separating the classes, which is essentially the question asked here. One could use such a metric to compare how well each feature (L4, L5, L6) distinguishes WT from Phenotype_M.
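As a rough sketch of that comparison, the code below scores each feature by the largest Gini impurity reduction any single threshold split can achieve, which is what one level of a decision tree would compute. It treats the raw per-clone read counts as features, which is an assumption, and the helper names `gini` and `best_split` are made up for illustration. With only ten clones the scores are indicative at best.

```python
# Counts copied from the question; each row is (L4, L5, L6) reads of one clone.
LENGTHS = ("L4", "L5", "L6")
WT_ROWS = [(2, 2, 3), (1, 4, 5), (2, 2, 4), (4, 3, 3), (2, 2, 2)]
M_ROWS = [(2, 3, 3), (3, 4, 5), (1, 2, 4), (6, 3, 3), (4, 1, 2)]

rows = WT_ROWS + M_ROWS
labels = ["WT"] * len(WT_ROWS) + ["M"] * len(M_ROWS)


def gini(group):
    """Gini impurity of a list of class labels (0 for an empty or pure group)."""
    n = len(group)
    if n == 0:
        return 0.0
    return 1.0 - sum((group.count(c) / n) ** 2 for c in set(group))


def best_split(values, classes):
    """Best (impurity reduction, threshold) over all 'value <= threshold' splits."""
    parent = gini(classes)
    best_gain, best_threshold = 0.0, None
    unique = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive observed values.
    for low, high in zip(unique, unique[1:]):
        threshold = (low + high) / 2
        left = [c for v, c in zip(values, classes) if v <= threshold]
        right = [c for v, c in zip(values, classes) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(classes)
        gain = parent - weighted
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_gain, best_threshold


results = {}
for j, name in enumerate(LENGTHS):
    results[name] = best_split([r[j] for r in rows], labels)
    print(name, results[name])
```

A feature whose best reduction is zero, or where no threshold is returned, does not separate the two groups at all under this criterion.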

If one doesn't need to narrow it down to a single feature, but instead is just looking for a good classifier, LDA will construct linear combinations of the features that best separate the groups. However, such combinations might be harder to interpret.
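For the two-group case, the LDA direction can be sketched by hand as Fisher's discriminant: solve S_w w = (mean of Phenotype_M) - (mean of WT), where S_w is the within-group scatter matrix. The code below is a minimal pure-Python illustration with made-up helper names, not a substitute for a library implementation, and it again treats per-clone read counts as the features.

```python
# Counts copied from the question; each row is (L4, L5, L6) reads of one clone.
LENGTHS = ("L4", "L5", "L6")
WT_ROWS = [(2, 2, 3), (1, 4, 5), (2, 2, 4), (4, 3, 3), (2, 2, 2)]
M_ROWS = [(2, 3, 3), (3, 4, 5), (1, 2, 4), (6, 3, 3), (4, 1, 2)]


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, mu):
    """Sum over rows of the outer product (row - mu)(row - mu)^T."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def lda_direction(group_a, group_b):
    """Fisher discriminant direction, defined up to scale; group_b scores higher."""
    mu_a, mu_b = mean_vector(group_a), mean_vector(group_b)
    s_a, s_b = scatter(group_a, mu_a), scatter(group_b, mu_b)
    within = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(s_a, s_b)]
    return solve(within, [b - a for a, b in zip(mu_a, mu_b)])


def score(w, row):
    return sum(wi * xi for wi, xi in zip(w, row))


w = lda_direction(WT_ROWS, M_ROWS)
print("weights:", {name: round(x, 4) for name, x in zip(LENGTHS, w)})
print("WT scores:", [round(score(w, r), 3) for r in WT_ROWS])
print("M scores:", [round(score(w, r), 3) for r in M_ROWS])
```

The weights show how each of L4, L5 and L6 contributes to the separating combination, which is where the interpretation difficulty mentioned above comes in: a non-zero weight does not by itself mean that feature differs between the groups.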

