How to calculate MAF for a variant in cases and controls
3.3 years ago
MAPK ★ 2.1k

I have a number of individuals from projects A to L who carry one particular variant (rsXXX). My cohorts contain only homozygous reference and heterozygous individuals; nobody is homozygous alternate for this variant. I would like to calculate the MAF for each cohort as a whole, and also separately for cases and controls, but I am not very familiar with the calculation methods. Could someone please help me calculate this?

    cohort <- structure(list(
      Project = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L"),
      Homo_Ref_Total_Individuals = c(836L, 1666L, 209L, 16L, 929L, 841L, 252L, 1493L, 568L, 44L, 190L, 2L),
      Homo_Ref_CASES = c(527, 993, 0, 0, 471, 226, 201, 1036, 0, 0, 0, 0),
      Homo_Ref_CONTROLS = c(191, 671, 209, 0, 295, 615, 17, 326, 161, 0, 94, 0),
      Hetero_Total_Individuals = c(5, 10, 2, 0, 12, 8, 6, 23, 1, 0, 1, 0),
      Hetero_CASES = c(2, 6, 0, 0, 5, 1, 4, 21, 0, 0, 0, 0),
      Hetero_CONTROLS = c(3, 4, 2, 0, 5, 7, 0, 2, 1, 0, 0, 0)
    ), class = "data.frame", row.names = c(NA, -12L))

MAF genetics variant • 1.1k views

You know you could print(cohort) and paste the output as a table instead of using dput; it is easier to eyeball that way. It would look like this:

>print(cohort)

   Project Hom_Ref_Total_Individuals Hom_Ref_CASES Hom_Ref_CONTROLS Het_Total_Individuals Het_CASES Het_CONTROLS
1        A                       836           527              191                     5         2            3
2        B                      1666           993              671                    10         6            4
3        C                       209             0              209                     2         0            2
4        D                        16             0                0                     0         0            0
5        E                       929           471              295                    12         5            5
6        F                       841           226              615                     8         1            7
7        G                       252           201               17                     6         4            0
8        H                      1493          1036              326                    23        21            2
9        I                       568             0              161                     1         0            1
10       J                        44             0                0                     0         0            0
11       K                       190             0               94                     1         0            0
12       L                         2             0                0                     0         0            0


I think that you should first resolve why Hom_Ref_CASES + Hom_Ref_CONTROLS does not equal Hom_Ref_Total_Individuals. For example, how do you explain the data for D, which lists 16 homozygous reference individuals but zero cases and zero controls? Perhaps 'Hom_Ref_Total_Individuals' is an incorrect label.

Otherwise, once you resolve the discrepancy, the minor allele frequency (MAF) calculation follows literally from the term: it is the frequency of the less frequent [minor] allele, i.e., the number of copies of that allele divided by the total number of alleles (two per individual). We usually quote this frequency separately for cases and controls (i.e., X% in cases; Y% in controls).
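
As a minimal sketch of that definition in R, assuming a biallelic site and diploid individuals: the function name `maf_from_counts` and its arguments are made up for illustration, and the Project A counts are copied from the table above.

    # Minor allele frequency from genotype counts at a biallelic site.
    # hom_alt defaults to 0 because no homozygous-alternate carriers were reported.
    maf_from_counts <- function(hom_ref, het, hom_alt = 0) {
      n_alleles <- 2 * (hom_ref + het + hom_alt)    # two alleles per individual
      alt_freq  <- (het + 2 * hom_alt) / n_alleles  # alternate allele frequency
      pmin(alt_freq, 1 - alt_freq)                  # minor = the rarer of the two alleles
    }

    # Project A, cases and controls taken separately
    maf_cases_A    <- maf_from_counts(hom_ref = 527, het = 2)
    maf_controls_A <- maf_from_counts(hom_ref = 191, het = 3)

A group with no individuals at all (e.g. cases in D) gives 0/0, which R reports as NaN rather than a frequency.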

Ultimately, your data, as presented, makes no sense.


Hi Kevin, it is because the totals also include individuals of unknown status: cases (status == 2) + controls (status == 1) + unknown (status == -9) = Hom_Ref_Total. The same applies to Het_Total. So now the question is: do I have to sum the cases (Hom_Ref_CASES + Het_CASES) and treat that as my cohort, or should I just use (Hom_Ref_Total + Het_Total) as my cohort? I am really confused about how to calculate the MAF with this data.
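
One possible way to lay this out in R, offered as a sketch rather than a definitive answer: the denominator simply follows the group being described, so case MAF uses only case genotypes, control MAF uses only control genotypes, and a whole-cohort MAF uses the totals (which then include the unknown-status individuals). The helper `maf`, the `*_UNKNOWN` and `MAF_*` columns, and the `pooled` vector are names introduced here for illustration. The sketch assumes the alternate allele is the minor one, which holds for these counts because heterozygotes are rare.

    cohort <- data.frame(
      Project = LETTERS[1:12],
      Homo_Ref_Total_Individuals = c(836, 1666, 209, 16, 929, 841, 252, 1493, 568, 44, 190, 2),
      Homo_Ref_CASES    = c(527, 993, 0, 0, 471, 226, 201, 1036, 0, 0, 0, 0),
      Homo_Ref_CONTROLS = c(191, 671, 209, 0, 295, 615, 17, 326, 161, 0, 94, 0),
      Hetero_Total_Individuals = c(5, 10, 2, 0, 12, 8, 6, 23, 1, 0, 1, 0),
      Hetero_CASES      = c(2, 6, 0, 0, 5, 1, 4, 21, 0, 0, 0, 0),
      Hetero_CONTROLS   = c(3, 4, 2, 0, 5, 7, 0, 2, 1, 0, 0, 0)
    )

    # No homozygous-alternate carriers, so each heterozygote adds one alternate allele
    maf <- function(hom_ref, het) het / (2 * (hom_ref + het))

    # Unknown-status (-9) individuals are whatever the totals have left over
    cohort$Homo_Ref_UNKNOWN <- with(cohort, Homo_Ref_Total_Individuals - Homo_Ref_CASES - Homo_Ref_CONTROLS)
    cohort$Hetero_UNKNOWN   <- with(cohort, Hetero_Total_Individuals - Hetero_CASES - Hetero_CONTROLS)

    # Per-project MAF; projects with an empty group give NaN (0/0)
    cohort$MAF_all      <- with(cohort, maf(Homo_Ref_Total_Individuals, Hetero_Total_Individuals))
    cohort$MAF_cases    <- with(cohort, maf(Homo_Ref_CASES, Hetero_CASES))
    cohort$MAF_controls <- with(cohort, maf(Homo_Ref_CONTROLS, Hetero_CONTROLS))

    # MAF across all projects combined, by summing counts before dividing
    pooled <- c(
      cases    = maf(sum(cohort$Homo_Ref_CASES), sum(cohort$Hetero_CASES)),
      controls = maf(sum(cohort$Homo_Ref_CONTROLS), sum(cohort$Hetero_CONTROLS))
    )

Summing counts across projects treats them as one sample and ignores any differences between cohorts, so the pooled figures are only a rough summary.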