I have datasets of polymorphisms (runs of repeated Gs of varying length) in DNA sequences from a number of clones, grouped by phenotypic trait. Within each clone, the sequence reads carry different numbers of repeated Gs; L4, L5 and L6 denote runs of 4, 5 and 6 Gs, respectively.
Data example for the wild-type (WT) phenotype:

Clone      L4  L5  L6  Phenotype
Clone_B1    2   2   3  WT
Clone_B2    1   4   5  WT
Clone_B3    2   2   4  WT
Clone_B4    4   3   3  WT
Clone_B5    2   2   2  WT
Data example for the phenotype under investigation:

Clone      L4  L5  L6  Phenotype
Clone_A1    2   3   3  Phenotype_M
Clone_A2    3   4   5  Phenotype_M
Clone_A3    1   2   4  Phenotype_M
Clone_A4    6   3   3  Phenotype_M
Clone_A5    4   1   2  Phenotype_M
Data explanation: each row is a clone, and each entry is the number of sequence reads from that clone with the given number of repeated Gs. For example, in the WT phenotype data, 2 sequence reads of Clone_B1 (row 1, column 1 of the matrix) have 4 repeated Gs (L4), 2 sequence reads of Clone_B1 (row 1, column 2) have 5 repeated Gs (L5), and so on.
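To make the layout concrete, here is a minimal Python sketch of how I read the two datasets: one read-count matrix per phenotype, with clones as rows and L4, L5, L6 as columns. The variable names (wt, pm) and the helper pooled_fractions are my own labels for illustration, not part of any package, and pooling reads across clones is only a descriptive summary, not the analysis I am asking about.

```python
import numpy as np

# Column order for both matrices: number of repeated Gs (4, 5, 6).
lengths = ["L4", "L5", "L6"]

# Rows are clones B1..B5; entries are read counts copied from the WT example data.
wt = np.array([
    [2, 2, 3],
    [1, 4, 5],
    [2, 2, 4],
    [4, 3, 3],
    [2, 2, 2],
])

# Rows are clones A1..A5; entries are read counts copied from the Phenotype_M example data.
pm = np.array([
    [2, 3, 3],
    [3, 4, 5],
    [1, 2, 4],
    [6, 3, 3],
    [4, 1, 2],
])

def pooled_fractions(counts):
    """Pool reads over clones and return the fraction of reads per G-run length."""
    totals = counts.sum(axis=0)      # total reads per column (L4, L5, L6)
    return totals / totals.sum()     # normalise so the fractions sum to 1

wt_frac = pooled_fractions(wt)
pm_frac = pooled_fractions(pm)

for name, w, p in zip(lengths, wt_frac, pm_frac):
    print(f"{name}: WT {w:.3f}  Phenotype_M {p:.3f}")
```

The question below is essentially whether the difference between these two sets of per-length read counts can be assessed with a Bayesian model, while still accounting for clone-to-clone variation that the pooled fractions ignore.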
My question is: is it a good idea to use a Bayesian approach to determine which of the G-run lengths (L4, L5, L6) might be responsible for the phenotype under investigation, compared with the wild-type phenotype? Which Bayesian algorithms and R packages may be useful for this purpose?