Imagine 10 alleles (A-J), each with a 10% population frequency at one locus.
- If you know that your sibling has alleles A,B, what are the probabilities of observing any allele at your position in the pedigree?
- If your nephew (on your mother's side) has alleles C,D, what are the probabilities of observing any allele at your position in the pedigree?
- If your nephew (on your mother's side) has alleles C,D, and your aunt (on your father's side) has alleles C,E, what are the probabilities of observing any allele at your position in the pedigree?
Please suggest the fastest method of solving this problem for any arbitrary combination of individuals in a pedigree. One approach would be to catalog all of the observed alleles in the family and renormalize the set so that the probabilities sum to 1, using degree of separation to weight the contribution of each observation.
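For a single typed relative, the weighting idea can be sketched as follows. This is only an illustration under stated assumptions, not a full pedigree solver: it assumes no inbreeding, Hardy-Weinberg proportions, no mutation or genotyping error, and that the relationship is summarized by a kinship coefficient (1/4 for a full sibling, 1/8 for the child of a full sibling). The function name `allele_probs` and its parameters are hypothetical. It returns the probability that one of your alleles, picked at random, is each allele; it does not return genotype probabilities.

```python
from collections import Counter


def allele_probs(pop_freqs, relative_genotype, kinship):
    """Marginal probability of each allele at one of my allele slots.

    pop_freqs:         dict allele -> population frequency (sums to 1)
    relative_genotype: the typed relative's two alleles, e.g. ("A", "B")
    kinship:           kinship coefficient between me and the relative
                       (assumed: 0.25 full sibling, 0.125 full sibling's child)
    """
    # Chance that my allele is identical by descent with one of the
    # relative's two alleles (non-inbred case): twice the kinship coefficient.
    share = 2.0 * kinship
    counts = Counter(relative_genotype)
    n = len(relative_genotype)
    # Mix the relative's observed alleles with the population frequencies.
    return {
        allele: share * counts[allele] / n + (1.0 - share) * freq
        for allele, freq in pop_freqs.items()
    }


pop = {allele: 0.1 for allele in "ABCDEFGHIJ"}
sib = allele_probs(pop, ("A", "B"), kinship=0.25)
nephew = allele_probs(pop, ("C", "D"), kinship=0.125)
```

Under these assumptions the sibling case gives 0.30 each for A and B and 0.05 for every other allele, and the nephew case gives 0.20 each for C and D and 0.075 for the rest. Both sets already sum to 1, so no separate renormalization step is needed. This shortcut does not extend cleanly to several typed relatives at once (the third scenario), because their observations are not independent given the pedigree; an exact answer there generally calls for a pedigree likelihood calculation such as Elston-Stewart peeling.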
I think the point of homework is that you do it yourself, not post on Biostars for people to do it for you.
Thanks for the snide and entirely unhelpful remark, but this is not homework. I am building this algorithm and seeking input from others with experience. Or is that also not the point of Biostars?
You were lucky I didn't close the question, as it looks like population genetics to me rather than bioinformatics. Biostars gets a number of posts with phrasing that is clearly lifted from people's homework assignments, and there was little here to differentiate it from one of those posts.
If you're building the algorithm, why ask for the fastest solution? That assumes the problem is already solved. Perhaps you would get more benefit from the exercise by sharing your work so far.
Then I suppose that is a compliment to my meticulous question phrasing, which appears to be from a textbook :)
I am seeking a fast solution (it need not be the fastest), since this algorithm must be run against a large number of scenarios. See the original post above, which I have edited to suggest one route.
So what part of the algorithm did you get stuck on? How long does it take to run? If you need a faster computer, perhaps you can run part of the samples on the Amazon Cloud.
The current implementation runs for close to 20 hours (and the runtime is growing with incoming data). I am unable to send the data outside of the organization.