I have genotypes for two inbred mouse strains on the original MUGA array from GeneSeek.
The data come back with a genotype of A, H, or B at each of the roughly 8,000 markers on the array.
The genotype is labeled A or B according to the Illumina Top/Bot strand system (https://www.illumina.com/documents/products/technotes/technote_topbot.pdf) and NOT according to strain. That is, Strain P is not always labeled A and Strain Q is not always labeled B; the labels depend on the actual alleles present at each individual SNP.
The genotype is labeled H if the individual is heterozygous at that marker.
I have these files:
- File 1: A/H/B genotype calls for each marker from my data set.
- File 2: the allele carried by each of the strains used at each marker (SNP).
My mouse strains are P and Q, and I want to automate the assignment of P/PQ/Q in place of A/H/B in the genotype file. Is there a program available to do this?
That is, I need to process this information:
- From File 2: for Marker X, Strain P allele = T and Strain Q allele = G, which means that in File 1 at Marker X, A = Strain P and B = Strain Q. Conversely, for Marker Y, Strain P allele = C and Strain Q allele = A, which means that in File 1 at Marker Y, A = Strain Q and B = Strain P.
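To make the rule concrete, here is a minimal Python sketch of the per-marker logic I have in mind. It assumes the Illumina convention that, for SNPs pairing A or T with C or G, the A-or-T allele is called A and the C-or-G allele is called B. A/T and C/G SNPs cannot be resolved from the alleles alone (they need the flanking sequence), so the sketch returns NA for them. The function names and the "NA" code are my own placeholders, not part of any existing tool.

```python
# Assumption: for unambiguous SNPs (A/C, A/G, T/C, T/G), Illumina labels
# the A-or-T allele "A" and the C-or-G allele "B".
ALLELE_A = {"A", "T"}
ALLELE_B = {"C", "G"}


def ab_to_strain(p_allele, q_allele):
    """Return the A/H/B -> P/PQ/Q mapping for one marker, or None if unresolved."""
    p, q = p_allele.upper(), q_allele.upper()
    if p in ALLELE_A and q in ALLELE_B:
        return {"A": "P", "H": "PQ", "B": "Q"}
    if p in ALLELE_B and q in ALLELE_A:
        return {"A": "Q", "H": "PQ", "B": "P"}
    # Strains share an allele, the SNP is A/T or C/G, or an allele is missing.
    return None


def recode(call, p_allele, q_allele):
    """Translate one A/H/B call into P/PQ/Q; anything else becomes 'NA'."""
    mapping = ab_to_strain(p_allele, q_allele)
    if mapping is None:
        return "NA"
    return mapping.get(call, "NA")


# Marker X: P = T, Q = G, so A means Strain P.
print(recode("A", "T", "G"))
# Marker Y: P = C, Q = A, so A means Strain Q.
print(recode("A", "C", "A"))
```

In practice this would be applied row by row after joining File 1 and File 2 on the marker name.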
Thanks!