My data looks as follows:
1 rs987435 0 2 1 2 2 1 1
2 rs345783 0 0 0 0 0 0 0
I removed the SNPs with MAF < 5% using the following code:
data <- datasnp[rowSums(datasnp==2)/ncol(datasnp) > 0.05, ]
Is this correct?
Now I want to test for HWE. The data for the HWE exact test should look as follows:
0 1 2
rs987435 # of 0s # of 1s # of 2s
and so on for the rest of the SNPs. I would like some R code to transform the data into this form. Can anyone help?
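One possible way to build that table, as a minimal sketch. It assumes the genotypes sit in a numeric matrix coded 0/1/2 with the SNP IDs as row names; the toy `geno` matrix below is made up from the two example rows and is not the real data.

```r
# Toy genotype matrix built from the two example rows above.
# Assumption: SNP IDs are stored as row names, not as a data column.
geno <- rbind(
  rs987435 = c(0, 2, 1, 2, 2, 1, 1),
  rs345783 = c(0, 0, 0, 0, 0, 0, 0)
)

# Count each genotype class per SNP (row); na.rm = TRUE skips missing calls.
counts <- cbind(
  "0" = rowSums(geno == 0, na.rm = TRUE),
  "1" = rowSums(geno == 1, na.rm = TRUE),
  "2" = rowSums(geno == 2, na.rm = TRUE)
)
print(counts)
```

The result has one row per SNP and one column per genotype class. Before passing it to an exact-test function, check which genotype order that function expects.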
So what is the correct code for keeping the SNPs with MAF > 5%?
What do you think it is?
I'm not sure what you meant by adding the heterozygous counts. Do you mean replacing rowSums(datasnp==2) with 0.5*rowSums(datasnp==1)?
My presumption is that 0 indicates homozygous for the reference, 1 heterozygous, and 2 homozygous for the alternate allele. So assuming the reference is the major allele, the MAF is the number of 2s plus half the number of 1s, divided by the number of columns.
Thank you. My question is: after I calculate the MAF the correct way, I will remove the rows (SNPs) according to the code below, right?
data <- datasnp[rowSums(datasnp==2)+0.5*rowSums(datasnp==1) /ncol(datasnp) > 0.05, ]
Note the extra set of parentheses.
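The corrected line this note refers to does not appear in the thread as shown. A sketch of what the parenthesised filter would presumably look like, run on toy data (the `datasnp` matrix here is made up from the example rows, with IDs as row names so that ncol() counts only samples):

```r
# Toy data; assumption: only genotype columns, SNP IDs as row names.
datasnp <- rbind(
  rs987435 = c(0, 2, 1, 2, 2, 1, 1),
  rs345783 = c(0, 0, 0, 0, 0, 0, 0)
)

# Division binds tighter than addition, so without the outer parentheses
# only the heterozygote term would be divided by ncol(datasnp).
alt_freq <- (rowSums(datasnp == 2) + 0.5 * rowSums(datasnp == 1)) / ncol(datasnp)

# Keep SNPs whose alternate-allele frequency exceeds 5%;
# drop = FALSE keeps a matrix even if only one SNP passes.
data <- datasnp[alt_freq > 0.05, , drop = FALSE]
print(alt_freq)
print(rownames(data))
```

On the two example SNPs this keeps rs987435 and drops the monomorphic rs345783.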
Thank you for the answer. I need to understand why the MAF is calculated this way; could you please send me a link that explains why?
Thank you again.
I guess you could just google around for "allele frequency". But frankly this is simply the definition.
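To spell the definition out: each individual carries two allele copies, a 1 contributes one copy of the alternate allele, and a 2 contributes two. The alternate-allele frequency is therefore the number of alternate copies divided by twice the number of individuals, which equals the "2s plus half the 1s" formula. A small sketch that checks this on the first example SNP (variable names are illustrative only):

```r
# Genotypes of the first example SNP, coded 0/1/2.
g <- c(0, 2, 1, 2, 2, 1, 1)
n <- length(g)  # number of individuals

# Count alternate-allele copies directly: sum(g) copies out of 2 * n alleles.
by_counting <- sum(g) / (2 * n)

# Same quantity via genotype counts: 2s plus half the 1s, over n individuals.
by_formula <- (sum(g == 2) + 0.5 * sum(g == 1)) / n

# If the alternate allele may be the more common one, the minor allele
# frequency is the smaller of the two allele frequencies.
maf <- min(by_counting, 1 - by_counting)

print(c(by_counting = by_counting, by_formula = by_formula, maf = maf))
```

Taking the minimum matters only when the reference allele is not guaranteed to be the major one, which is the assumption made earlier in the thread.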