I am extracting SNP data from various published datasets. Those provided in PLINK format, or as matrices with genotypes coded as 0, 1, 2, are absolutely fine. However, some of the matrices have genotypes coded as nucleotides (A, G, T, C), and I am struggling to find a conversion tool for these that works. In many cases it is not viable for me to manually convert these datasets into PLINK format, because I don't always have all the necessary data (e.g., often there is just a matrix with the genotypes and no other information).
Has anyone had success with a package for this, or does anyone know of a function that I could use to recode the A, G, C, T matrix as 0, 1, 2?
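To make the question concrete, here is a minimal Python sketch of the kind of conversion I have in mind. It rests on several assumptions that may not hold for every dataset: samples are diploid, each SNP is biallelic, each genotype is a two-letter string such as "AG", and anything else (e.g. "NN") is treated as missing. Coding each genotype as the number of copies of the minor allele is one common convention and is also an assumption here. The function names `encode_snp` and `encode_matrix` are made up for illustration and do not come from any package.

```python
from collections import Counter

VALID = set("ACGT")


def is_called(call):
    """True if the call is a two-letter genotype made only of A, C, G, T."""
    return len(call) == 2 and set(call) <= VALID


def encode_snp(calls):
    """Encode one SNP's calls (e.g. ["AA", "AG", "GG"]) as minor-allele counts.

    Missing or malformed calls become None.
    """
    # Count every allele observed across all called genotypes.
    counts = Counter()
    for call in calls:
        if is_called(call):
            counts.update(call)
    if len(counts) > 2:
        raise ValueError("more than two alleles observed; SNP is not biallelic")

    # Most frequent allele first; ties are broken alphabetically.
    alleles = sorted(counts, key=lambda a: (-counts[a], a))
    minor = alleles[1] if len(alleles) == 2 else None

    coded = []
    for call in calls:
        if not is_called(call):
            coded.append(None)
        else:
            # A monomorphic SNP has no minor allele, so every call is 0.
            coded.append(call.count(minor) if minor else 0)
    return coded


def encode_matrix(matrix):
    """Encode a samples-by-SNPs matrix of nucleotide genotypes as 0/1/2."""
    columns = [encode_snp(list(col)) for col in zip(*matrix)]
    # Transpose back so rows are samples again.
    return [list(row) for row in zip(*columns)]


genotypes = [
    ["AA", "CT"],
    ["AG", "CC"],
    ["GG", "NN"],
    ["AA", "CC"],
]
encoded = encode_matrix(genotypes)
```

In this toy example the rows are samples and the columns are SNPs. If a dataset stores SNPs in rows instead, or separates the two alleles with a space or slash, the input would need to be transposed or cleaned first.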
Thanks in advance!