Could anyone help me out with the following question?
I want to perform Principal Component Analysis (PCA) on genotype input data for SNPTEST.
I know how to perform PCA on genotype data where each SNP is simply a genotype call (coded as 0, 1 or 2).
However, in the SNPTEST file format, each SNP is represented as a set of three probabilities, which correspond to the allele pairs AA, AB and BB. How can I perform PCA on this data?
I was thinking of applying a threshold, for example 0.9, and selecting the genotype that has a probability >= 0.9. I would drop the SNPs that do not have any genotype with a probability of at least 0.9. I am not sure whether this approach is valid!
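To make the idea concrete, here is a minimal Python sketch of the thresholding I have in mind. It rests on several assumptions of mine: the function names `hard_call` and `call_genotypes` are made up for illustration, the 0/1/2 coding is assumed to count B alleles (AA=0, AB=1, BB=2), a SNP is dropped as soon as any sample lacks a confident call, and the toy probabilities are invented rather than real data.

```python
def hard_call(probs, threshold=0.9):
    """Convert one (P(AA), P(AB), P(BB)) triplet to 0, 1 or 2.

    Assumed coding: AA=0, AB=1, BB=2. Returns None when no genotype
    reaches the threshold.
    """
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def call_genotypes(snp_rows, threshold=0.9):
    """Hard-call every SNP and keep only fully called SNPs.

    snp_rows: one list per SNP, holding one probability triplet per sample.
    Assumption: a SNP is dropped if any sample has no confident call.
    """
    kept = []
    for row in snp_rows:
        calls = [hard_call(p, threshold) for p in row]
        if all(c is not None for c in calls):
            kept.append(calls)
    return kept


# Invented toy data: 2 SNPs x 3 samples.
toy = [
    [(0.95, 0.05, 0.00), (0.02, 0.97, 0.01), (0.00, 0.08, 0.92)],
    [(0.95, 0.05, 0.00), (0.50, 0.45, 0.05), (0.00, 0.00, 1.00)],
]

genotypes = call_genotypes(toy)
print(genotypes)  # the second SNP is dropped because of its uncertain sample
```

The resulting 0/1/2 matrix could then go into the same PCA procedure I already use for called genotypes.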
I would appreciate any suggestions on this! Thank you!
Best regards, Krishna