The genotype data for my GWAS are coded as 0, 1, and 2, with individuals as rows and SNPs as columns. I calculated the minor allele frequency (MAF) as
MAF <- apply(geno, 2, function(x) sum(x, na.rm = TRUE) / (2 * sum(!is.na(x))))
where geno is the genotype matrix. The calculated MAF values ranged up to 0.94912791. I want to remove SNPs with a MAF below 5%. Is this R code correct?
geno_filtered <- geno[, which(MAF > 0.05), drop = FALSE]
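The apply() call above returns the frequency of whichever allele the 0/1/2 code counts, so it can exceed 0.5 (as the maximum of 0.949 does) and is not necessarily the minor allele frequency. A minimal sketch, assuming each entry of geno counts copies of one fixed allele and missing calls are NA: fold the frequency with pmin(p, 1 - p) so the MAF never exceeds 0.5, flag SNPs where the counted allele is actually the major one, and filter on the folded value. The toy matrix and the names p, maf, flip and geno_minor are hypothetical; replace the toy data with your own geno.

```r
# Toy genotype matrix (hypothetical data): rows = individuals, columns = SNPs,
# each entry = number of copies of the counted allele (0, 1 or 2), NA = missing
geno <- matrix(c(0, 1, 2, 0, 1,
                 2, 2, 2, 2, 1,
                 0, 0, 0, 0, 0,
                 1, 1, 0, NA, 2),
               nrow = 5,
               dimnames = list(paste0("ind", 1:5), paste0("snp", 1:4)))

# Frequency of the counted allele per SNP, ignoring missing calls
p <- colSums(geno, na.rm = TRUE) / (2 * colSums(!is.na(geno)))

# Fold to the minor allele frequency (always <= 0.5)
maf <- pmin(p, 1 - p)

# flip == TRUE: the counted allele is the MAJOR allele, so the other allele is minor
# (at p == 0.5 neither allele is minor and nothing is flipped)
flip <- !is.na(p) & p > 0.5

# Optional: recode so every column counts copies of the minor allele
geno_minor <- geno
geno_minor[, flip] <- 2 - geno[, flip]

# Keep SNPs whose folded MAF exceeds 5% (SNPs with no called genotypes are dropped)
keep <- !is.na(maf) & maf > 0.05
geno_filtered <- geno[, keep, drop = FALSE]
```

With the folded value, a SNP whose counted-allele frequency is 0.96 is correctly treated as having a MAF of 0.04 and is removed, which a filter on the unfolded frequency would miss.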
2) Based on the calculation above, is it possible to know which allele is the minor one, given that each SNP is coded as 0, 1, and 2?
3) How do I test whether each SNP is in Hardy-Weinberg equilibrium (HWE)? If a SNP is not in HWE, is it advisable to filter it out as well?
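One way to test HWE per SNP is a 1-df chi-square goodness-of-fit test comparing observed genotype counts with those expected from the allele frequency (n(1-p)^2, 2np(1-p), np^2). The sketch below uses base R only and assumes the same 0/1/2 coding with NA for missing calls; the function name hwe_chisq, the toy data, and the threshold of 1e-6 are hypothetical choices, not fixed standards. For small samples or rare alleles an exact test (for example the one used by PLINK's --hwe filter) is usually preferred over the chi-square approximation.

```r
# Chi-square test for Hardy-Weinberg equilibrium at one SNP.
# x: genotypes coded 0/1/2 (copies of the counted allele), NA = missing call.
hwe_chisq <- function(x) {
  x <- x[!is.na(x)]
  obs <- c(sum(x == 0), sum(x == 1), sum(x == 2))   # observed genotype counts
  n <- sum(obs)
  p <- (obs[2] + 2 * obs[3]) / (2 * n)              # counted-allele frequency
  # Test is undefined for empty or monomorphic SNPs
  if (n == 0 || p == 0 || p == 1) return(NA_real_)
  expected <- n * c((1 - p)^2, 2 * p * (1 - p), p^2)
  stat <- sum((obs - expected)^2 / expected)
  pchisq(stat, df = 1, lower.tail = FALSE)           # 1 degree of freedom
}

# Toy data (hypothetical): snpA is in perfect HWE, snpB has no heterozygotes
geno <- cbind(snpA = rep(0:2, times = c(25, 50, 25)),
              snpB = rep(c(0, 2), each = 50))

hwe_p <- apply(geno, 2, hwe_chisq)

# Assumed threshold; NA p-values (monomorphic SNPs) are left to the MAF filter
hwe_threshold <- 1e-6
keep_hwe <- is.na(hwe_p) | hwe_p >= hwe_threshold
geno_hwe_ok <- geno[, keep_hwe, drop = FALSE]
```

In a case-control study this test is often run in controls only, because a departure from HWE in cases can reflect a genuine association rather than a genotyping error.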