prepare the data for package (HardyWeinberg) to test the HWExactMat
1
0
Entering edit mode
7.0 years ago
mms140130 ▴ 60

My data looks as follows:

1 rs987435  0  2  1  2  2  1   1
2 rs345783  0  0  0  0  0  0   0


I removed the snps with MAF < 5% using the following code

data <- datasnp[rowSums(datasnp==2)/ncol(datasnp) > 0.05, ] , is this correct ??

now I want to test for HWE

The data for the HWE exact should be as follows

               0          1            2
rs987435    #of zero     #of 1        #of 2


and so on for the rest of snps, I would like to have a code in R to transform the data as I mentioned can any one help

R snp • 1.4k views
2
Entering edit mode
7.0 years ago

Your MAF calculation is wrong, you need to add in half the heterozygous counts (i.e., 0.5*rowSums(datasnp==1), which assumes that the alternate allele is the minor allele).

Anyway, your MAF filter method should get you most of the way to getting the appropriate input for HWE exact:

cbind(rowSums(datasnp==0), rowSums(datasnp==1), rowSums(datasnp==2))

0
Entering edit mode

so what is the correct code for keeping the snps with MAF > 5% ?

0
Entering edit mode

What do you think it is?

0
Entering edit mode

I'm not sure what did you mean by adding the heterozygous counts? do you mean replace the rowSum(datasnp==2) with 0.5*rowSums(datasnp==1)

0
Entering edit mode

My presumption is that 0 indicates homozygous for the reference, 1 heterozygous, and 2 homozygous for the alternate allele. So assuming the reference is the major allele, then the MAF is the number of 2s plus have the number of 1s divided by the number of columns.

0
Entering edit mode

Thank you, my question is after I calculate the MAF the correct way , I will remove the row (SNP) according to the code below, right??

data <- datasnp[rowSums(datasnp==2)+0.5*rowSums(datasnp==1) /ncol(datasnp) > 0.05, ]

0
Entering edit mode
data <- datasnp[(rowSums(datasnp==2)+0.5*rowSums(datasnp==1)) /ncol(datasnp) > 0.05, ]


Note the extra set of parentheses.

0
Entering edit mode

Thank you for answer, I need to understand why the MAF is calculate this way could you please send me a link that explains why

Thank you again

0
Entering edit mode

I guess you could just google around for "allele frequency". But frankly this is simply the definition.