I have a genotype matrix (nearly 3,000 animals, with 50,000 SNPs in columns), coded as 0/1/2 or NA. I want to convert it into PLINK's allelic format, for example 0 to 0 0, 1 to 1 1 and 2 to 2 2, so that I can run quality control on my data in PLINK. What's the best way to do this in R?
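Why should 1 be `1 1` and 2 be `2 2`? (Note also that in a PED file `0` is the missing-allele code, so `0 0` would be read as a missing genotype.) You currently have the data in 012 format, which relates to: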
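- 0: zero copies of the minor allele (homozygous for the other allele, "ref")
- 1: one copy of the minor allele (heterozygous, "het")
- 2: two copies of the minor allele (homozygous for the minor allele, "hom")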
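To make this concrete, here is a minimal Python sketch of how one PED-style allele pair becomes a 0/1/2 code, assuming you already know which allele is the minor one for that SNP. The helper `to_012` is hypothetical and not part of PLINK; the same logic translates directly to R. Note that an `A G` heterozygote and a `C T` heterozygote both become `1`: the code keeps the count, not the letters.

```python
from typing import Optional


def to_012(a1: str, a2: str, minor: str) -> Optional[int]:
    """Count copies of the minor allele in one PED-style genotype (a1, a2).

    "0" is PLINK's default missing-allele code; a missing genotype is
    returned as None (the NA in your matrix).
    """
    if a1 == "0" or a2 == "0":
        return None
    # bool + bool gives an int in Python: 0, 1 or 2 copies of the minor allele
    return (a1 == minor) + (a2 == minor)
```

For example, `to_012("A", "G", "G")` and `to_012("C", "T", "C")` both return `1`, so the 012 value alone cannot tell you which nucleotides were present.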
Data in 012 format is normally produced from PLINK data by recoding it within PLINK itself (e.g., with `--recode A`, which writes counts of the A1 allele, by default the minor allele, to a `.raw` file). So where did you get the file? You, or whoever supplied it, should already have the data in the format you need.
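In PLINK PED format, each genotype is written as two alleles, which can be encoded either as characters or numerically:
- as characters: A, C, G, T (e.g., `A G`)
- numerically: 1, 2, 3, 4, where 1 = A, 2 = C, 3 = G, 4 = T (e.g., `1 3`)
- missing: 0 (a missing genotype is `0 0`)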
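Because each letter corresponds to exactly one digit, switching between the two PED encodings is lossless, which is not true of the 012 coding. A minimal Python sketch of that lookup, assuming the A = 1, C = 2, G = 3, T = 4 convention above (the names `ACGT_TO_NUM`, `NUM_TO_ACGT` and `recode_genotype` are illustrative only):

```python
# Letter <-> digit allele codes used in PED files; "0" is the missing code.
ACGT_TO_NUM = {"A": "1", "C": "2", "G": "3", "T": "4", "0": "0"}
# Reverse table, built by swapping keys and values.
NUM_TO_ACGT = {num: base for base, num in ACGT_TO_NUM.items()}


def recode_genotype(genotype: str, table: dict) -> str:
    """Recode a space-separated allele pair such as "A G" with a lookup table."""
    return " ".join(table[allele] for allele in genotype.split())
```

For example, `recode_genotype("A G", ACGT_TO_NUM)` gives `"1 3"`, and applying `NUM_TO_ACGT` to that result gives back `"A G"`.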
So, as you can see, to connect the 012 format back to the original PED format you need mapping information that tells you, for each SNP, which allele (ACGT or 1234) was the minor allele and which was the major. The 012 code only records how many copies of the minor allele an animal carries, not which nucleotide that allele is, so without this mapping you cannot convert back.
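If you do recover that per-SNP mapping (e.g., from the original `.bim`/`.map` files or the SNP chip's annotation), the conversion itself is mechanical. Below is a minimal Python sketch; `dosage_to_ped` is a hypothetical helper, and the `(major, minor)` pairs it takes are an assumption: they must come from the original data, because they cannot be derived from the 0/1/2 values themselves.

```python
from typing import List, Optional, Sequence, Tuple

MISSING = "0"  # PLINK's default missing-allele code in .ped files


def dosage_to_ped(
    rows: Sequence[Sequence[Optional[int]]],
    alleles: Sequence[Tuple[str, str]],
) -> List[List[str]]:
    """Convert 0/1/2 minor-allele counts to PED allele-pair columns.

    rows[i][j] is the count for animal i at SNP j (None = NA).
    alleles[j] is the (major, minor) allele pair for SNP j, taken from
    the original data; the counts alone do not contain it.
    """
    out = []
    for row in rows:
        if len(row) != len(alleles):
            raise ValueError("each row needs one allele pair per SNP")
        cols: List[str] = []
        for g, (major, minor) in zip(row, alleles):
            if g is None:
                cols += [MISSING, MISSING]  # NA -> missing genotype "0 0"
            elif g == 0:
                cols += [major, major]  # no minor allele: homozygous major
            elif g == 1:
                cols += [major, minor]  # one minor allele: heterozygous
            elif g == 2:
                cols += [minor, minor]  # two minor alleles: homozygous minor
            else:
                raise ValueError(f"unexpected genotype code {g!r}")
        out.append(cols)
    return out
```

For example, with `alleles = [("A", "G"), ("C", "T"), ("G", "A")]`, the rows `[0, 1, None]` and `[2, 1, 0]` become `A A C T 0 0` and `G G C T G G`. A complete `.ped` line also needs the six leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) plus a matching `.map` file, and those also have to come from your records rather than from the 012 matrix.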
And, as I mentioned above, the 012 format is produced from PED (or BED) files within PLINK itself, so either you or your source should have the original files that you need.