Genotype data for a GWAS are provided in 0/1/2 format, where rows are individuals and each column is a SNP. I calculated the minor allele frequency (MAF) as
MAF <- apply(geno, 2, function(x) sum(x) / (length(x) *2))
where geno is a matrix holding the genotype data. The calculated MAF ranged from 0.05087209 to 0.94912791. I want to remove SNPs with a MAF below 5%. 1) Is this R script correct:
geno_filtered <- geno[, which(MAF > 0.05)]
2) Is it possible to tell from the above calculation which allele is the minor one under the 0, 1, 2 coding?
3) How do I test whether each SNP is in Hardy-Weinberg equilibrium (HWE)? If a SNP is not in HWE, is it advisable to filter it out as well?
Thanks
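A rough sketch of how the three points could be handled in base R, under some assumptions. sum(x) / (2 * length(x)) gives the frequency of whichever allele the 0/1/2 coding counts, which is why it can exceed 0.5 (a range of roughly 0.05 to 0.95 is what an unfolded frequency looks like). Folding it with pmin() gives a MAF, and comparing the unfolded frequency with 0.5 tells you which genotype code is the minor-allele homozygote. For HWE, the sketch uses a 1-df chi-square goodness-of-fit test. The toy matrix, the helper name hwe_pvalue and the 1e-6 cutoff are illustrative choices, not something from the original post. An exact test is generally preferred when genotype counts are small, and in case/control data the test is commonly run in controls only.

```r
# Toy data (made up for illustration): rows = individuals, columns = SNPs,
# entries = copies of the counted allele (0, 1 or 2).
geno <- cbind(
  snp1 = c(rep(0, 80), rep(1, 20)),               # counted allele is rare
  snp2 = c(rep(1, 20), rep(2, 80)),               # counted allele is common
  snp3 = c(rep(0, 25), rep(1, 50), rep(2, 25)),   # exact HWE proportions
  snp4 = c(rep(0, 50), rep(2, 50)),               # no heterozygotes at all
  snp5 = rep(0, 100)                              # monomorphic
)

# Frequency of the counted allele; this is NOT yet a MAF and can exceed 0.5.
freq <- colMeans(geno, na.rm = TRUE) / 2

# Fold around 0.5 to get the minor allele frequency.
maf <- pmin(freq, 1 - freq)

# Genotype code of the minor-allele homozygote: 2 if the counted allele is
# the minor one, 0 if it is the major one, NA when both alleles are at 0.5.
minor_code <- ifelse(freq < 0.5, 2, ifelse(freq > 0.5, 0, NA))

# MAF filter on the folded frequency (drops the monomorphic snp5 here).
geno_filtered <- geno[, maf > 0.05, drop = FALSE]

# Hypothetical helper: chi-square (1 df) test of HWE for one SNP.
hwe_pvalue <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n == 0) return(NA_real_)
  obs <- c(sum(x == 0), sum(x == 1), sum(x == 2))   # observed genotype counts
  p <- (obs[2] + 2 * obs[3]) / (2 * n)              # counted-allele frequency
  q <- 1 - p
  expd <- n * c(q^2, 2 * p * q, p^2)                # expected counts under HWE
  if (any(expd == 0)) return(NA_real_)              # monomorphic: not testable
  chi2 <- sum((obs - expd)^2 / expd)
  pchisq(chi2, df = 1, lower.tail = FALSE)
}

hwe_p <- apply(geno, 2, hwe_pvalue)

# Flag SNPs with strong deviation; 1e-6 is an assumed, adjustable cutoff.
hwe_fail <- !is.na(hwe_p) & hwe_p < 1e-6
```

On the toy data only snp4 (no heterozygotes) is flagged by the HWE step, and only snp5 is removed by the MAF step. Strong HWE deviation is often a sign of genotyping error, so filtering such SNPs is common QC practice, but the cutoff and whether to filter at all depend on the study design.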
Forum discussion and not a Question?
Thanks for calling my attention to this. Do you mean I should have posted this as a Question instead? Maybe that is why I don't get many responses.
Your question was posted as a Forum type post, not a Question type post. See Biostar Forum Posting Guidelines for a primer on the types of posts. You did get a response - how actively you follow up also determines the quality and frequency of help you'll get. Please read http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202 to better understand how open science forums work.