Filter out genetic markers by minor allele frequency
Genotype data for GWAS are provided in 0, 1, 2 format, where rows are individuals and each column is a SNP. The minor allele frequency (MAF) is calculated as

MAF <- apply(geno, 2, function(x) sum(x) / (length(x) *2))


where geno is the genotype matrix. The calculated MAF ranges from 0.05087209 to 0.94912791. I want to remove SNPs with MAF below 5%. Is this R script correct?

geno_filtered <- geno[, which(MAF > 0.05)]


2) Is it possible to tell from the above calculation which allele of a SNP coded 0, 1, and 2 is the minor one?

3) How do I test whether a SNP is in Hardy-Weinberg equilibrium (HWE)? If a SNP is not in HWE, is it advisable to filter it out as well?

Thanks

A simpler version of the MAF function would be:

getMAF <- function(m) colMeans(m) / 2
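As a quick check on made-up data (not the asker's genotypes), the two versions give the same numbers. The na.rm = TRUE variant is an added assumption for data where missing calls are coded as NA, which neither original version handles; geno_toy and getMAF_na are hypothetical names.

```r
set.seed(1)
# toy genotype matrix: 20 individuals x 4 SNPs coded 0/1/2 (hypothetical data)
geno_toy <- matrix(sample(0:2, 80, replace = TRUE), ncol = 4)

# the apply() version from the question and the colMeans() version
maf_apply <- apply(geno_toy, 2, function(x) sum(x) / (length(x) * 2))
maf_means <- colMeans(geno_toy) / 2
all.equal(maf_apply, maf_means)

# variant that tolerates missing genotypes coded as NA
getMAF_na <- function(m) colMeans(m, na.rm = TRUE) / 2
geno_toy[1, 1] <- NA
getMAF_na(geno_toy)
```

Note that both versions return the frequency of whichever allele the 0/1/2 code counts, which is not necessarily the minor one.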


Yes, it is possible to find the minor allele by matching the frequency against the genotype counts, but this does not always work:

# dummy genotype with 3 SNPs
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
rep(0, 10), rep(1, 140), rep(2, 150),
rep(0, 150), rep(1, 140), rep(2, 10)
), ncol = 3)

# get maf
getMAF(geno1)
# [1] 0.5000000 0.7333333 0.2666667

# get counts
lapply(data.frame(geno1), table)
# $X1
#
#   0   1   2
# 100 100 100
#
# $X2
#
#   0   1   2
#  10 140 150
#
# $X3
#
#   0   1   2
# 150 140  10


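From this example, the minor allele of SNP1 is impossible to determine because the frequency is exactly 50%. For SNP2 the computed value is 0.73, so the counted allele is actually the major one: the true MAF is 1 - 0.73 = 0.27, and the counts show that the minor-allele homozygote is coded 0 (only 10 individuals). For SNP3 the computed value is 0.27, so the counted allele is the minor one, and the counts confirm that the rare homozygote is coded 2.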
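The flipping step can be sketched as follows, reusing the dummy genotypes. The names freq, maf and minor_hom are illustrative, and the 0.05 threshold is the one from the question. Because the unfolded value can exceed 0.5, a filter of the form MAF > 0.05 applied to it would also keep SNPs with a frequency of, say, 0.99, so folding first seems safer.

```r
# same dummy genotypes as above
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10),  rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)),
                ncol = 3)

# frequency of the allele counted by the 0/1/2 code (may be above 0.5)
freq <- colMeans(geno1, na.rm = TRUE) / 2

# folded frequency: always <= 0.5, i.e. the actual MAF
maf <- pmin(freq, 1 - freq)

# genotype code that is homozygous for the minor allele
# (NA when the frequency is exactly 0.5 and no allele is minor)
minor_hom <- ifelse(freq < 0.5, 2, ifelse(freq > 0.5, 0, NA))

# keep SNPs whose folded MAF exceeds the threshold
geno_filtered <- geno1[, maf > 0.05, drop = FALSE]
```

The drop = FALSE argument keeps the result a matrix even if only one SNP passes the filter.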

Also, instead of reinventing the wheel, try converting your data into the input format of existing tools, for example the R HardyWeinberg package.

Or convert to PLINK format, etc.
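To illustrate what such tools compute, here is a bare-bones chi-square goodness-of-fit test for HWE in base R. This is only a sketch: hweChisq is a hypothetical helper name, there is no continuity correction, and monomorphic SNPs would give NaN. For real analyses an exact test from a dedicated package is likely the better choice.

```r
# chi-square test of HWE for one SNP coded 0/1/2; returns the p-value
hweChisq <- function(g) {
  g <- g[!is.na(g)]
  n <- length(g)
  obs <- c(sum(g == 0), sum(g == 1), sum(g == 2))  # observed genotype counts
  p <- (2 * obs[3] + obs[2]) / (2 * n)             # frequency of the counted allele
  q <- 1 - p
  expd <- n * c(q^2, 2 * p * q, p^2)               # expected counts under HWE
  stat <- sum((obs - expd)^2 / expd)
  # 1 degree of freedom: 3 genotype classes - 1 - 1 estimated allele frequency
  pchisq(stat, df = 1, lower.tail = FALSE)
}

# dummy genotypes from the example above
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10),  rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)),
                ncol = 3)

# one p-value per SNP (column)
hwe_p <- apply(geno1, 2, hweChisq)
hwe_p
```

Small p-values indicate departure from HWE; the dummy SNPs were not simulated under HWE, so low values are expected here.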

