Hi,

Could anyone help me out with the following question?

I want to perform Principal Component Analysis (PCA) on genotype input data for SNPTEST.

I know how to perform PCA on the type of genotype data where SNPs are just the genotypes (coded as 0, 1 or 2).

However, in the file format for SNPTEST, each SNP is represented as a set of three probabilities, which correspond to the allele pairs AA, AB, and BB. How can I perform PCA on this data?

I was thinking of applying a threshold, for example 0.9, and keeping only the genotypes that have a probability >= 0.9. I would drop the SNPs that do not have any genotype with a probability of at least 0.9. I am not sure if this approach is valid!
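To make the thresholding idea concrete, here is a rough sketch in Python. The names hard_call and keep_snp are made up for illustration, and the rule for dropping a SNP (drop it if any sample has no genotype reaching the threshold) is only one possible reading of the idea, so treat it as an assumption.

```python
from typing import List, Optional, Sequence


def hard_call(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Turn (P(AA), P(AB), P(BB)) into a 0/1/2 genotype code.

    Returns the index of the most probable genotype (0 = AA, 1 = AB, 2 = BB)
    if its probability reaches the threshold, otherwise None (no call).
    """
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def keep_snp(calls: List[Optional[int]]) -> bool:
    """Assumed rule: keep a SNP only if every sample got a confident call."""
    return all(c is not None for c in calls)


# One SNP, three samples; each tuple is (P(AA), P(AB), P(BB)). Toy numbers only.
snp = [(0.95, 0.04, 0.01), (0.10, 0.85, 0.05), (0.00, 0.02, 0.98)]
calls = [hard_call(p) for p in snp]
print(calls)            # the second sample has no genotype >= 0.9
print(keep_snp(calls))  # so this SNP would be dropped under the assumed rule
```

SNPs that pass this filter end up coded as 0, 1 or 2, so the usual PCA workflow for hard genotype calls could be applied to them.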

I would appreciate any suggestions on this! Thank you!

Best regards, Krishna

I've never tried this and I won't pretend to be a GWAS expert, but I would try to just run the PCA with the data as it is. You might need to "tidy" the data into the following format:

I would presume that this would produce reasonable PCA results.
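For what it's worth, running PCA on the probabilities as they are could look roughly like the sketch below. The layout is an assumption on my part (one row per sample, three probability columns per SNP), the function name pca_on_probabilities is made up, and the data are random numbers generated purely for illustration, not real genotypes.

```python
import numpy as np


def pca_on_probabilities(probs: np.ndarray, n_components: int = 2):
    """PCA via SVD on genotype probabilities.

    probs has shape (n_samples, n_snps, 3), holding P(AA), P(AB), P(BB).
    Assumed layout: each SNP contributes three feature columns per sample.
    """
    n_samples = probs.shape[0]
    X = probs.reshape(n_samples, -1)      # (n_samples, 3 * n_snps)
    X = X - X.mean(axis=0)                # center each column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the PCs
    explained = (S ** 2) / np.sum(S ** 2)            # fraction of variance per PC
    return scores, explained[:n_components]


# Toy input: 6 samples, 10 SNPs, each probability triple sums to 1.
rng = np.random.default_rng(0)
probs = rng.dirichlet([1.0, 1.0, 1.0], size=(6, 10))
scores, explained = pca_on_probabilities(probs)
print(scores.shape)
```

Because the three probabilities for a SNP always sum to 1, the three columns are linearly dependent. PCA still runs fine in that case, the redundancy just shows up as components with zero variance.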

Thank you Devon!

I am going to try that!