I have a question, actually a two part question.
I have whole genome data on 120 cases and 20 controls. They are all unrelated individuals. I have a LoF variant that appears 5 times in cases and 0 in controls. This variant is also absent from all population databases (ExAC, gnomAD, 1000GP etc.). In fact, the gene has just 1 LoF variant in ExAC and the pLI is 0.97 indicating LoF intolerant. The functional genetic evidence also indicates that this gene is related to the phenotype that I'm studying.
I need to actually associate this variant/gene with the phenotype but given my small number of controls this is a problem. I was simply going to use Fisher's exact test. My question is:
What test should I use? How might the ExAC or gnomAD data be used in a case/ control analysis? For example, I was going to obtain all the European samples (my samples are mostly of European ancestry) and use all the high quality calls at this site as part of my control. I downloaded the gnomAD whole genome VCFs (they don't include the genotype field). These were created using a similar variant calling, annotation, and filtering pipeline to my own. In fact I followed these practices as mich as I could when processing my own data.
Anyways, my idea was simply to use the European samples from ExAC or gnomAD with high quality calls at the site where my variant is located. So if there's 6000 of such calls then I can use this figure in Fisher Exact Test along with the 20 controls.
Does anyone have anything to suggest about this? I mean with 20 controls you're pretty much screwed I don't what else you can do.