Hi, I am a master's student, and for my dissertation we are thinking about using Python to find all of the possible combinations of genotypes for a set of 24 SNPs. I am very new to coding, so I have no idea where to start or how to get the outcome we are looking for. Any help or advice would be greatly appreciated! :)
Our samples have already been sequenced and amplified, and we are trying to determine the hair and eye color of these individuals from the 24 SNPs used in the HIrisPlex system. I have already checked each SNP position manually. Because we are working with ancient DNA and genome coverage was low to begin with, most of the SNPs are missing. Of the SNPs that are not missing, almost all have low coverage, with only one of the two alleles observed.
This causes problems because we cannot be sure what the genotype at a particular SNP is, and that uncertainty could dramatically change the hair and eye color probabilities that HIrisPlex produces. For example, SNP 4 was missing completely in our second sample, so there are three possible genotypes: CC, CT, or TT. For a SNP where one allele was observed, say a C, the genotype could be either CC or CT, because we know for sure that it contains at least one C.
HIrisPlex specifies which allele to count at each SNP. For example, for SNP 2 the allele is A, so we need to enter the number of A alleles present: 2 if the genotype is AA, 1 if it's AT, and 0 if it's TT. Since most of the SNPs were missing, we aren't sure of the allele count for most of them, so there are a ton of possible combinations across all 24 SNPs.
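To make the rule above concrete, here is a minimal sketch of how the possible allele counts for one SNP could be derived from whatever alleles were observed. The function name `possible_counts` and its parameters are made up for illustration, and it assumes that a single observed allele is never treated as a confident homozygous call.

```python
from itertools import combinations_with_replacement

def possible_counts(target, other, observed):
    """Return the counts of `target` (0, 1 or 2) consistent with the observed alleles.

    target   -- the allele HIrisPlex asks us to count (e.g. "C")
    other    -- the alternative allele at this SNP (e.g. "T")
    observed -- the alleles actually seen in the reads (may be empty)
    """
    counts = set()
    # The three unordered genotypes: (target, target), (target, other), (other, other).
    for genotype in combinations_with_replacement((target, other), 2):
        # Keep a genotype only if it contains every allele that was observed.
        if set(observed) <= set(genotype):
            counts.add(genotype.count(target))
    return sorted(counts)

print(possible_counts("C", "T", set()))       # SNP missing completely
print(possible_counts("C", "T", {"C"}))       # only the counted allele seen
print(possible_counts("C", "T", {"T"}))       # only the other allele seen
print(possible_counts("C", "T", {"C", "T"}))  # both alleles seen
```

A SNP with good coverage and a confident genotype would simply be entered by hand as a single value instead of going through this function.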
Here is what I have so far for one of our samples (the possible number of alleles for each SNP):
SNP 1 - 0, 1, 2
SNP 2 - 1, 2
SNP 3 - 1, 2
SNP 4 - 0, 1
SNP 5 - 0, 1, 2
SNP 6 - 0, 1, 2
SNP 7 - 0, 1
SNP 8 - 0, 1, 2
SNP 9 - 1, 2
SNP 10 - 0, 1, 2
SNP 11 - 0, 1
SNP 12 - 0, 1, 2
SNP 13 - 0, 1, 2
SNP 14 - 0, 1
SNP 15 - 1, 2
SNP 16 - 0, 1, 2
SNP 17 - 1, 2
SNP 18 - 0, 1
SNP 19 - 0, 1, 2
SNP 20 - 0
SNP 21 - 1
SNP 22 - 0, 1, 2
SNP 23 - 0, 1
SNP 24 - 0, 1, 2
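As a rough starting point, the list above can be written as a Python list of lists and passed to `itertools.product`, which yields every combination one at a time. The variable names are just for illustration, and `math.prod` needs Python 3.8 or newer.

```python
from itertools import islice, product
from math import prod

# Possible counts of the HIrisPlex allele for SNPs 1-24, copied from the list above.
options = [
    [0, 1, 2],  # SNP 1
    [1, 2],     # SNP 2
    [1, 2],     # SNP 3
    [0, 1],     # SNP 4
    [0, 1, 2],  # SNP 5
    [0, 1, 2],  # SNP 6
    [0, 1],     # SNP 7
    [0, 1, 2],  # SNP 8
    [1, 2],     # SNP 9
    [0, 1, 2],  # SNP 10
    [0, 1],     # SNP 11
    [0, 1, 2],  # SNP 12
    [0, 1, 2],  # SNP 13
    [0, 1],     # SNP 14
    [1, 2],     # SNP 15
    [0, 1, 2],  # SNP 16
    [1, 2],     # SNP 17
    [0, 1],     # SNP 18
    [0, 1, 2],  # SNP 19
    [0],        # SNP 20
    [1],        # SNP 21
    [0, 1, 2],  # SNP 22
    [0, 1],     # SNP 23
    [0, 1, 2],  # SNP 24
]

# The number of combinations is the product of the number of options per SNP.
total = prod(len(counts) for counts in options)
print(total)  # 362797056

# product() is lazy, so combinations are generated one at a time
# rather than all being held in memory at once.
for combo in islice(product(*options), 3):
    print(combo)  # one tuple of 24 allele counts, in SNP order
```

With roughly 363 million combinations for this one sample, it is only realistic to loop over them in code and not to store them all or enter them by hand.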
We need to find all of the possible combinations across the 24 SNPs so that we can obtain a range of probabilities for each hair and eye color and estimate the predicted phenotype for each individual. Any help or advice on how to go about this would be greatly appreciated! I hope this post makes sense, LOL. Please let me know if you have any questions. Thank you in advance! :)
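For the range of probabilities described above, one possible shape for the code is to run every combination through a prediction function and keep the lowest and highest probability seen for each phenotype. This is only a sketch: `probability_ranges`, `toy_predict`, and the trait names are hypothetical, and `toy_predict` is a stand-in that has nothing to do with the real HIrisPlex model, which would have to be supplied separately.

```python
from itertools import product

def probability_ranges(options, predict):
    """Return {phenotype: (lowest, highest)} over every combination of allele counts.

    options -- one list of possible allele counts per SNP
    predict -- any function mapping a tuple of counts to {phenotype: probability}
    """
    ranges = {}
    for combo in product(*options):
        for phenotype, p in predict(combo).items():
            low, high = ranges.get(phenotype, (p, p))
            ranges[phenotype] = (min(low, p), max(high, p))
    return ranges

# PLACEHOLDER ONLY: an invented formula so the example runs.
# It is NOT the HIrisPlex model and its numbers mean nothing biologically.
def toy_predict(counts):
    a = sum(counts) / (2 * len(counts))
    return {"trait_a": a, "trait_b": 1 - a}

# A deliberately tiny example with three SNPs instead of 24.
toy_options = [[0, 1, 2], [1, 2], [0]]
ranges = probability_ranges(toy_options, toy_predict)
for phenotype, (low, high) in ranges.items():
    print(phenotype, round(low, 3), round(high, 3))
```

The same function would accept the full 24-SNP list, but the running time then depends entirely on how fast the real prediction step is.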