Hi all, first of all I apologise if this has been asked in previous posts, but I have not been able to find a solution to my problem.
I have a data frame with a list of SNPs, their positions, and the reference (REF) and alternate (ALT) alleles. In addition, it contains the phased genotypes for a number of individuals.
SNP        CHR  POS     REF  ALT  ID1  ID2  ID3
rs2754554  1    8656    A    C    0|1  0|0  1|1
rs1111786  16   975544  T    A    0|0  0|1  1|0
rs986355   7    75987   G    T    1|1  0|1  1|1
rs2256743  21   442324  G    C    1|0  0|1  0|1
In the example I have only 4 SNPs and 3 individuals, but the real list is much larger. I would like to replace each genotype with the corresponding alleles, based on the REF and ALT columns:
SNP        CHR  POS     REF  ALT  ID1  ID2  ID3
rs2754554  1    8656    A    C    A|C  A|A  C|C
rs1111786  16   975544  T    A    T|T  T|A  A|T
rs986355   7    75987   G    T    T|T  G|T  T|T
rs2256743  21   442324  G    C    C|G  G|C  G|C
The output is based on my understanding that 0 means the reference allele and 1 means the alternate allele. Any help is highly appreciated.
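To make the intended mapping concrete, here is a minimal sketch in plain Python. It is not a definitive solution. It assumes the table is whitespace-delimited text with the five annotation columns (SNP, CHR, POS, REF, ALT) first and one column per individual after them. The names `decode_genotype` and `decode_table` are made up for illustration. Codes other than 0 and 1 (for example `.` for a missing call) are left unchanged, which is also an assumption, and multi-allelic sites are not handled.

```python
def decode_genotype(gt, ref, alt):
    """Translate a phased genotype such as '0|1' into alleles such as 'A|C'."""
    alleles = {"0": ref, "1": alt}
    # Assumption: any code other than 0 or 1 (e.g. '.') is kept as it is.
    return "|".join(alleles.get(code, code) for code in gt.split("|"))


def decode_table(text, n_fixed=5):
    """Decode every genotype column of a whitespace-delimited table.

    n_fixed is the number of leading annotation columns
    (SNP, CHR, POS, REF, ALT in the example).
    """
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    ref_idx, alt_idx = header.index("REF"), header.index("ALT")
    decoded = [header]
    for row in rows:
        ref, alt = row[ref_idx], row[alt_idx]
        genotypes = [decode_genotype(gt, ref, alt) for gt in row[n_fixed:]]
        decoded.append(row[:n_fixed] + genotypes)
    return decoded


example = """
SNP        CHR  POS     REF  ALT  ID1  ID2  ID3
rs2754554  1    8656    A    C    0|1  0|0  1|1
rs1111786  16   975544  T    A    0|0  0|1  1|0
rs986355   7    75987   G    T    1|1  0|1  1|1
rs2256743  21   442324  G    C    1|0  0|1  0|1
"""

decoded = decode_table(example)
for row in decoded:
    print("\t".join(row))
```

Because each row is decoded independently, the same loop could read a large file line by line instead of holding everything in memory.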