Hi, I'm trying to calculate the genotype frequencies of two CYP2C19 variants (rs4244285 and rs12248560) using the 1000 Genomes data.
However, I'm confused by the "forward strand" naming convention.
I want to determine the % of people with *2/*17, but every list I generate seems to have the same person listed twice.
For example, I get this in the list for rs12248560:
- HG00099 (F) A|A
- HG00099 (F) T|C
It seems to be the same individual, so which genotype is it: A|T, A|C or A|A? How can I tell what the "typical" genotype would be?
I want to simply note the diplotype.
Many thanks for your help!
A description of how you are generating your list (even a code example) would help a lot in trying to figure out what might be going on.
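
For comparison, here is a minimal sketch of one way to go from phased genotypes at the two sites to a diplotype and its frequency. It is not tied to any particular tool. It assumes the genotypes are phased (`a|b`), reported on the forward strand, and that there is exactly one row per sample per site, so the duplicated rows above would need to be resolved first. It also assumes the A allele of rs4244285 marks *2 and the T allele of rs12248560 marks *17, and it labels any haplotype carrying neither as *1. The sample IDs and genotypes are made up for illustration.

```python
from collections import Counter

# Assumption: star-defining bases as reported on the forward strand.
STAR_ALLELES = {
    "rs4244285": ("A", "*2"),
    "rs12248560": ("T", "*17"),
}


def star_of_haplotype(bases):
    """Name one haplotype from its base at each site (dict rsID -> base)."""
    stars = [
        star
        for rsid, (alt, star) in STAR_ALLELES.items()
        if bases.get(rsid) == alt
    ]
    # "*1" here only means "neither of the two variants was seen".
    return "+".join(stars) if stars else "*1"


def _star_number(star):
    """Sort key so that *2 comes before *17."""
    return int(star.split("+")[0].lstrip("*"))


def diplotype(phased):
    """Combine phased genotypes (dict rsID -> 'a|b') into a diplotype."""
    haplotypes = []
    for i in (0, 1):
        # Take the i-th allele at every site: that is one chromosome copy.
        bases = {rsid: gt.split("|")[i] for rsid, gt in phased.items()}
        haplotypes.append(star_of_haplotype(bases))
    return "/".join(sorted(haplotypes, key=_star_number))


def diplotype_frequencies(samples):
    """Fraction of samples carrying each diplotype."""
    counts = Counter(diplotype(gt) for gt in samples.values())
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical samples, not real 1000 Genomes calls.
samples = {
    "sample_1": {"rs4244285": "A|G", "rs12248560": "C|T"},
    "sample_2": {"rs4244285": "G|G", "rs12248560": "C|C"},
    "sample_3": {"rs4244285": "G|A", "rs12248560": "C|C"},
    "sample_4": {"rs4244285": "G|G", "rs12248560": "T|C"},
}

for name, freq in diplotype_frequencies(samples).items():
    print(name, freq)
```

The key point is that the allele before the `|` at each site belongs to one chromosome copy and the allele after it to the other, so the diplotype is read column-wise across sites rather than by mixing alleles from different rows.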