Filter out genetic marker with Minor Allele Frequency
3.1 years ago
mab658 ▴ 20

Genotype data for a GWAS is provided in 0/1/2 format, where individuals are rows and each column is a SNP. The minor allele frequency (MAF) is calculated as

MAF <- apply(geno, 2, function(x) sum(x) / (length(x) *2))


where geno is a matrix holding the genotype data. The calculated MAF ranged from 0.05087209 to 0.94912791. I want to remove SNPs with a MAF below 5%. Is this R script correct?

geno_filtered <- geno[, which(MAF > 0.05)]


2) Is it possible to know which of the SNP codes 0, 1, and 2 is the minor one based on the above calculation?

3) How do I test whether each SNP is in Hardy-Weinberg Equilibrium (HWE)? If any of them is not in HWE, is it advisable to filter it out as well?

Thanks

SNP sequencing R gene • 2.4k views
1. Why is this a Forum discussion and not a Question?
2. Is this an assignment question?

Thanks for calling my attention to this. You mean it is better to post this as a Question rather than as a Forum post, right? Maybe that is why I don't get much response.


Your question was posted as a Forum type post, not a Question type post. See Biostar Forum Posting Guidelines for a primer on the types of posts.

You did get a response - how actively you follow up also determines the quality and frequency of help you'll get. Please read http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202 to better understand how open science forums work.

3.1 years ago
zx8754 10k

A simpler version of the MAF function would be:

getMAF <- function(m) colMeans(m) / 2


Yes, it is possible to find out which allele is minor by matching the frequency against the genotype counts, although this does not always work:

# dummy genotype with 3 SNPs
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10),  rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)),
                ncol = 3)

# get maf
getMAF(geno1)
# [1] 0.5000000 0.7333333 0.2666667

# get counts
lapply(data.frame(geno1), table)
# $X1
#
#   0   1   2
# 100 100 100
#
# $X2
#
#   0   1   2
#  10 140 150
#
# $X3
#
#   0   1   2
# 150 140  10


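In this example, the minor allele of SNP1 cannot be determined because the frequency is exactly 50%. For SNP2 the calculated frequency is 0.73, so the counted allele is really the major one: the true MAF is 1 - 0.73 = 0.27, and the counts confirm that genotype 0 is the rare homozygote. For SNP3 the frequency is 0.27, so the counted allele is already the minor one, and the counts show that genotype 2 is the rare homozygote.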
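The flipping step can be sketched in a few lines of base R. This is only an illustration on the dummy matrix from this answer: the names `freq`, `maf`, `counted_is_major` and `maf_threshold` are made up here, and the 5% cut-off is simply the one mentioned in the question.

```r
# Dummy genotype matrix from this answer (300 individuals x 3 SNPs)
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10),  rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)),
                ncol = 3)

# Frequency of the allele counted by the 0/1/2 coding
# (not necessarily the minor allele); NAs are ignored
freq <- colMeans(geno1, na.rm = TRUE) / 2

# Fold onto [0, 0.5]: the MAF is the smaller of p and 1 - p
maf <- pmin(freq, 1 - freq)

# TRUE where the counted allele is actually the major one
# (a frequency of exactly 0.5 stays ambiguous and is reported as FALSE)
counted_is_major <- freq > 0.5

# Keep SNPs at or above the assumed threshold; drop = FALSE keeps
# a matrix even when a single column survives
maf_threshold <- 0.05
geno_filtered <- geno1[, maf >= maf_threshold, drop = FALSE]
```

Filtering on the folded `maf` rather than on the raw frequency matters for data like that in the question, where raw values run up to 0.95: a SNP with a raw frequency of 0.97 would pass `MAF > 0.05` even though its minor allele frequency is only 0.03.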

Also, instead of reinventing the wheel, try to convert your data into the input format of existing tools, for example the R HardyWeinberg package.

Alternatively, convert to PLINK format, etc.
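Just to illustrate what such tools compute for the HWE question, here is a rough base R sketch of a one-degree-of-freedom chi-square test on 0/1/2 genotypes. `hwe_chisq` is a hypothetical helper written for this post, not a function from the HardyWeinberg package or PLINK, and it applies no continuity correction or exact test, so a dedicated tool is still the safer choice for real data.

```r
# Hypothetical helper (not from any package): chi-square HWE test for one SNP
hwe_chisq <- function(g) {
  g <- g[!is.na(g)]
  n <- length(g)
  # Observed genotype counts for codes 0, 1 and 2
  obs <- c(sum(g == 0), sum(g == 1), sum(g == 2))
  # Frequency of the allele counted by the coding
  p <- (obs[2] + 2 * obs[3]) / (2 * n)
  # Monomorphic SNPs cannot be tested
  if (p == 0 || p == 1) return(NA_real_)
  # Expected counts under HWE: n * (q^2, 2pq, p^2)
  expd <- n * c((1 - p)^2, 2 * p * (1 - p), p^2)
  stat <- sum((obs - expd)^2 / expd)
  # 3 genotype classes - 1 - 1 estimated allele frequency = 1 df
  pchisq(stat, df = 1, lower.tail = FALSE)
}

# Dummy genotype matrix from this answer
geno1 <- matrix(c(rep(0, 100), rep(1, 100), rep(2, 100),
                  rep(0, 10),  rep(1, 140), rep(2, 150),
                  rep(0, 150), rep(1, 140), rep(2, 10)),
                ncol = 3)

# One p-value per SNP (column)
hwe_p <- apply(geno1, 2, hwe_chisq)
```

A small p-value suggests the genotype counts deviate from HWE proportions. The dummy SNPs here were built by hand rather than sampled, so they are not expected to look like real data.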
