Question: Filter out genetic marker with Minor Allele Frequency
mab658 wrote:

Genotype data for a GWAS are provided in 0/1/2 format, where individuals are rows and each column is a SNP. The minor allele frequency (MAF) is calculated as

MAF <- apply(geno, 2, function(x) sum(x) / (length(x) *2))

where geno is a matrix object holding the genotype data. The calculated MAF ranged from 0.05087209 to 0.94912791. My threshold is to remove SNPs with MAF below 5%. Is this R script correct:

geno_filtered <- geno[, which(MAF > 0.05)]

2) Is it possible to tell which of the SNP codes 0, 1, and 2 is the minor one, based on the above calculation?

3) How do I test whether each SNP is in Hardy-Weinberg Equilibrium (HWE)? If any of them is not in HWE, is it advisable to filter it out as well?

Thanks

Tags: sequencing, snp, R, gene
written 5 months ago by mab658 • modified 4 months ago by zx8754
  1. Why is this a Forum discussion and not a Question?
  2. Is this an assignment question?
written 5 months ago by RamRS

Thanks for calling my attention to this. You mean I should have posted this as a Question type post, right? Maybe that is why I don't get much response.

written 4 months ago by mab658

Your question was posted as a Forum type post, not a Question type post. See Biostar Forum Posting Guidelines for a primer on the types of posts.

You did get a response - how actively you follow up also determines the quality and frequency of help you'll get. Please read http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202 to better understand how open science forums work.

written 4 months ago by RamRS
zx8754 wrote:

A simpler version of MAF function would be:

getMAF <- function(m) colMeans(m) / 2

Yes, it is possible to find out which one is minor by matching the frequency against the genotype counts, but this doesn't always work:

# dummy genotype with 3 SNPs
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10), rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)
                  ), ncol = 3)

# get maf
getMAF(geno1)
# [1] 0.5000000 0.7333333 0.2666667


# get counts
lapply(data.frame(geno1), table)
# $X1
# 
# 0   1   2 
# 100 100 100 
# 
# $X2
# 
# 0   1   2 
# 10 140 150 
# 
# $X3
# 
# 0   1   2 
# 150 140  10

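From this example, for SNP1 it is impossible to know, as the frequency is 50%. For SNP2 the computed value is 0.73, which is not really a minor allele frequency (so we need to flip it: 1 - 0.73 = 0.27), and from the counts we can see that the rare genotype is 0. For SNP3 the value is 0.27, and from the counts we can see that the rare genotype is 2.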
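The flipping step can be sketched in a few lines of R. This is only a minimal sketch, assuming the 0/1/2 codes count copies of one fixed allele per SNP; the helpers `alleleFreq`, `foldMAF` and `filterByMAF` are hypothetical names introduced here, and the fourth SNP is made-up data added to show a marker being dropped. The point is that a 5% filter should be applied to the folded frequency: an unfolded value of 0.98 would pass `MAF > 0.05` even though the minor allele frequency is about 0.02.

```r
# Frequency of the allele counted by the 0/1/2 coding (can exceed 0.5).
alleleFreq <- function(m) colMeans(m, na.rm = TRUE) / 2

# Fold to the minor allele frequency, which always lies in [0, 0.5].
foldMAF <- function(p) pmin(p, 1 - p)

# Keep SNPs whose folded MAF is above the threshold (5% assumed here,
# mirroring the `> 0.05` comparison used in the question).
filterByMAF <- function(m, threshold = 0.05) {
  maf <- foldMAF(alleleFreq(m))
  m[, maf > threshold, drop = FALSE]
}

# Same dummy genotypes as above, plus one made-up rare-variant SNP.
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10), rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10),
                  rep(0, 2), rep(1, 6), rep(2, 292)),
                ncol = 4)

p <- alleleFreq(geno1)
maf <- foldMAF(p)

# Which allele is minor: the counted one (p < 0.5), the other one
# (p > 0.5), or undetermined when the frequency is exactly 0.5.
minorAllele <- ifelse(p < 0.5, "counted", ifelse(p > 0.5, "other", "tie"))

genoFiltered <- filterByMAF(geno1)
```

Using `drop = FALSE` keeps the result a matrix even if only one SNP survives the filter.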

Also, instead of reinventing the wheel, try converting your data into the input format of existing tools, for example the R HardyWeinberg package.
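As a rough illustration of the kind of test such tools perform, here is a minimal base-R sketch of a one-degree-of-freedom chi-square test for HWE on 0/1/2 genotypes. It is not the HardyWeinberg package's interface: `hweChisq` is a hypothetical helper, it applies no continuity correction, it assumes a biallelic polymorphic SNP, and the chi-square approximation is unreliable when expected genotype counts are small.

```r
# Chi-square goodness-of-fit test for HWE on one SNP coded 0/1/2.
# Hypothetical helper: no continuity correction, polymorphic SNP assumed.
hweChisq <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  obs <- c(sum(x == 0), sum(x == 1), sum(x == 2))  # observed genotype counts
  p <- sum(x) / (2 * n)                            # frequency of the counted allele
  expd <- n * c((1 - p)^2, 2 * p * (1 - p), p^2)   # counts expected under HWE
  stat <- sum((obs - expd)^2 / expd)
  # 3 genotype classes - 1 - 1 estimated allele frequency = 1 degree of freedom
  c(chisq = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Same dummy genotypes as above.
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10), rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)),
                ncol = 3)

# One column of results per SNP; rows are "chisq" and "p.value".
hwe <- apply(geno1, 2, hweChisq)

# A SNP whose counts match HWE proportions exactly (25/50/25).
inHWE <- hweChisq(c(rep(0, 25), rep(1, 50), rep(2, 25)))
```

All three dummy SNPs deviate from HWE proportions, which is expected because they were constructed by hand rather than sampled from a population.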

Or convert to plink format, etc.

written 4 months ago by zx8754 • modified 4 months ago