Question: Filter out genetic markers by Minor Allele Frequency
11 weeks ago by mab658 wrote:

Genotype data for GWAS are provided in 0, 1, 2 format, where individuals are rows and each column is a SNP. The minor allele frequency (MAF) is calculated as

MAF <- apply(geno, 2, function(x) sum(x) / (length(x) *2))

where geno is a matrix object holding the genotype data. The calculated MAF ranged from 0.05087209 to 0.94912791. My threshold is to remove SNPs with a MAF below 5%. Is this R script correct:

geno_filtered <- geno[, which(MAF > 0.05)].

2) Is it possible to know which of the genotype codes 0, 1, and 2 is the minor one based on the above calculation?

3) How do I test whether each SNP is in Hardy-Weinberg Equilibrium (HWE)? If a SNP is not in HWE, is it advisable to filter it out as well?


  1. Why is this a Forum discussion and not a Question?
  2. Is this an assignment question?
written 11 weeks ago by Ram

Thanks for calling my attention to this. You mean I should post my question as a Question rather than in the Forum, right? Maybe that is why I don't get much response.

written 11 weeks ago by mab658

Your question was posted as a Forum type post, not a Question type post. See Biostar Forum Posting Guidelines for a primer on the types of posts.

You did get a response - how actively you follow up also determines the quality and frequency of help you'll get. Please read to better understand how open science forums work.

written 11 weeks ago by Ram
11 weeks ago by zx8754 wrote:

A simpler version of your MAF function would be:

getMAF <- function(m) colMeans(m) / 2

Yes, it is possible to find out which genotype code is the minor one: we need to match the frequency with the genotype counts, but this doesn't always work:

# dummy genotype with 3 SNPs
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10), rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)
                  ), ncol = 3)

# get maf
getMAF(geno1)
# [1] 0.5000000 0.7333333 0.2666667

# get counts
lapply(data.frame(geno1), table)
# $X1
# 0   1   2 
# 100 100 100 
# $X2
# 0   1   2 
# 10 140 150 
# $X3
# 0   1   2 
# 150 140  10

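From this example, the minor allele of SNP1 is impossible to determine because the frequency is 50%. For SNP2 the computed frequency is 0.73, which is not really a minor allele frequency (so we need to flip it: 1 - 0.73 = 0.27), and from the counts we can see that the minor homozygote is 0. For SNP3 the frequency is 0.27, and from the counts we can see that the minor homozygote is 2.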
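
The flip described above can be applied to all SNPs at once. Below is a minimal sketch, assuming each column counts copies of one fixed allele per SNP and reusing the dummy matrix from this answer. The names fold_maf and minor_code, and the choice to apply the 5% cutoff to the folded value, are illustrative assumptions and not part of the original posts:

```r
# Dummy genotype matrix from the answer: 300 individuals x 3 SNPs
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10), rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)),
                ncol = 3)

# Frequency of the counted allele (the allele whose copies are coded 0/1/2);
# na.rm = TRUE skips missing genotypes
freq <- colMeans(geno1, na.rm = TRUE) / 2

# Hypothetical helper: fold the frequency so that it never exceeds 0.5
fold_maf <- function(p) pmin(p, 1 - p)
maf <- fold_maf(freq)

# Genotype code of the minor-allele homozygote:
# 2 if the counted allele is minor, 0 if it is major, NA at exactly 0.5
minor_code <- ifelse(freq < 0.5, 2, ifelse(freq > 0.5, 0, NA))

# Keep SNPs whose folded MAF is at least 5% (assumed cutoff)
geno_filtered <- geno1[, maf >= 0.05, drop = FALSE]
```

Filtering on the unfolded frequency with > 0.05 alone would also keep SNPs where the counted allele has a frequency above 0.95, which is why the sketch folds first.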

Also, instead of reinventing the wheel, try to convert your data into the input format of existing tools, for example the R HardyWeinberg package.

Or convert to plink format, etc.
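
To illustrate what an HWE test computes on 0/1/2 data, here is a base R sketch of the chi-square version (1 degree of freedom, no continuity correction). The helper name hwe_chisq is hypothetical, and for real analyses a dedicated tool such as the HardyWeinberg package or PLINK mentioned above is the safer route:

```r
# Dummy genotype matrix from the answer: 300 individuals x 3 SNPs
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10), rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)),
                ncol = 3)

# Hypothetical helper: chi-square HWE test for one SNP coded 0/1/2
hwe_chisq <- function(x) {
  x <- x[!is.na(x)]                                # drop missing genotypes
  obs <- c(sum(x == 0), sum(x == 1), sum(x == 2))  # observed genotype counts
  n <- sum(obs)
  p <- (obs[2] + 2 * obs[3]) / (2 * n)             # frequency of the counted allele
  # The test is undefined for empty or monomorphic SNPs
  if (n == 0 || p == 0 || p == 1) {
    return(c(chisq = NA_real_, p.value = NA_real_))
  }
  # Expected counts under HWE: n*(1-p)^2, 2*n*p*(1-p), n*p^2
  expd <- n * c((1 - p)^2, 2 * p * (1 - p), p^2)
  chisq <- sum((obs - expd)^2 / expd)
  c(chisq = chisq, p.value = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# One row per SNP, with columns chisq and p.value
hwe <- t(apply(geno1, 2, hwe_chisq))
```

A small p-value indicates that the genotype counts deviate from HWE proportions; which cutoff to use for filtering is left open here, as the thread does not specify one.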
