Filter out genetic markers by Minor Allele Frequency
6.0 years ago
mab658 ▴ 20

My genotype data for GWAS are coded as 0, 1, and 2, with individuals in rows and one SNP per column. I calculate the minor allele frequency (MAF) as

MAF <- apply(geno, 2, function(x) sum(x) / (length(x) *2))

where geno is a matrix object holding the genotype data. The calculated MAF ranged from 0.05087209 to 0.94912791. I want to remove SNPs with MAF below 5%. Is this R script correct:

geno_filtered <- geno[, which(MAF > 0.05)]

2) Is it possible to tell from the above calculation which of the codes 0, 1, and 2 corresponds to the minor allele?

3) How do I test whether the SNPs are in Hardy-Weinberg Equilibrium (HWE)? If any of them is not in HWE, is it advisable to filter it out as well?

Thanks

SNP sequencing R gene • 4.2k views
  1. Why is this a Forum discussion and not a Question?
  2. Is this an assignment question?

Thanks for calling my attention to this. You mean I should post this as a Question rather than a Forum post, right? Maybe that is why I don't get much response.


Your question was posted as a Forum type post, not a Question type post. See Biostar Forum Posting Guidelines for a primer on the types of posts.

You did get a response - how actively you follow up also determines the quality and frequency of help you'll get. Please read http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202 to better understand how open science forums work.

6.0 years ago
zx8754 11k

A simpler version of the MAF function would be:

getMAF <- function(m) colMeans(m) / 2

Yes, it is possible to find out which allele is minor by matching the frequency against the genotype counts, but this doesn't always work:

# dummy genotype with 3 SNPs
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10), rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)
                  ), ncol = 3)

# get maf
getMAF(geno1)
# [1] 0.5000000 0.7333333 0.2666667


# get counts
lapply(data.frame(geno1), table)
# $X1
# 
#   0   1   2 
# 100 100 100 
# 
# $X2
# 
#   0   1   2 
#  10 140 150 
# 
# $X3
# 
#   0   1   2 
# 150 140  10 

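In this example, the minor allele of SNP1 is impossible to determine, as the frequency is exactly 50%. For SNP2 the computed frequency is 0.73, which is not really the minor allele frequency (so we need to flip it: 1 - 0.73 = 0.27), and the counts show that the rare homozygote is 0. For SNP3 the frequency is 0.27, and the counts show that the rare homozygote is 2.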
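Because `getMAF` returns the frequency of whichever allele the 0/1/2 coding counts, a filter on that value alone would keep SNPs with a frequency above 0.95. Here is a minimal sketch of folding the frequency before filtering, assuming the codes count copies of one allele and there are no missing values. `foldMAF`, `geno2`, and the 0.05 cutoff are illustrative choices, not part of the original question:

```r
# Frequency of the counted allele per SNP (column), as in getMAF above
getMAF <- function(m) colMeans(m) / 2

# Fold to the minor allele frequency, which is always <= 0.5
foldMAF <- function(m) {
  p <- getMAF(m)
  pmin(p, 1 - p)
}

# Toy data: 3 SNPs, 300 individuals each; the third SNP is rare
geno2 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10),  rep(1, 140), rep(2, 150),
                  rep(0, 290), rep(1, 10)),
                ncol = 3)

p   <- getMAF(geno2)
maf <- foldMAF(geno2)

# Keep SNPs whose minor allele frequency is at least 5%
keep <- maf >= 0.05
geno_filtered <- geno2[, keep, drop = FALSE]

# Which allele is minor: the counted one if p < 0.5, the other one if p > 0.5
minor <- ifelse(p < 0.5, "counted", ifelse(p > 0.5, "other", "tie"))
```

`drop = FALSE` keeps the result a matrix even if only one SNP passes the filter.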

Also, instead of re-inventing the wheel, try converting your data to the input format of existing tools, for example the R HardyWeinberg package.

Or convert to plink format, etc.
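For a quick sanity check before moving to a dedicated tool, the HWE test can be sketched in base R as a plain chi-square test with 1 degree of freedom. This is a hand-rolled illustration, not the API of the HardyWeinberg package or plink. `hweChisq` is a hypothetical helper name, and it applies no continuity correction:

```r
# Chi-square test for Hardy-Weinberg equilibrium on one SNP coded 0/1/2.
# Returns NaN for a monomorphic SNP (expected counts of zero).
hweChisq <- function(g) {
  g   <- g[!is.na(g)]                      # drop missing genotypes
  n   <- length(g)
  obs <- c(sum(g == 0), sum(g == 1), sum(g == 2))
  p   <- (2 * obs[3] + obs[2]) / (2 * n)   # frequency of the counted allele
  q   <- 1 - p
  expd <- n * c(q^2, 2 * p * q, p^2)       # expected counts under HWE
  stat <- sum((obs - expd)^2 / expd)
  pval <- pchisq(stat, df = 1, lower.tail = FALSE)
  c(chisq = stat, p.value = pval)
}

# Same dummy genotype matrix as above: 3 SNPs, 300 individuals
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10),  rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)),
                ncol = 3)

# One row per SNP, columns chisq and p.value
res <- t(apply(geno1, 2, hweChisq))
```

With small genotype counts an exact test is usually preferred over the chi-square approximation, which is one more reason to use an existing package for the real analysis.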
