Question: gene set enrichment in an exome cohort study
4.9 years ago by
Quak300
United States
Quak300 wrote:

I should start by saying that I don't know much about statistical genetics, although I do understand statistics.

There is a cohort of 400 patients with a disease. A set of genes is hypothesized to be causative, or to carry significantly more variants than usual.

Based on my understanding, it is necessary to have both a case cohort and a control cohort, and to show that this set of genes is enriched only in the case set and not in the control set.

The NHLBI cohort contains almost 4000 individuals who are ethnically matched with the disease cohort.

I saw in this forum that people have mentioned using a "burden test" or, say, a "Fisher test". To my understanding, these methods compare the frequency of variants between the cohort populations.

say,

variant_name      1KG    NHLBI   Disease
snp1_from_gene1   0.03   0.01    0.1
snp2_from_gene1   0.7    0.02    0.2
snp3_from_gene2   0.3    0.01    0.1

and then we compare the distribution of these frequencies between NHLBI and Disease, or between 1KG and Disease, to show with a certain p-value that the two distributions are not the same. 1) Is this correct? Can you explain this part in more detail? Is this what a burden test is? (I don't think so.)

2) As a second question: if I have 3 sets of controls, say NHLBI, 1KG, and another disease cohort (say, autism), these 3 sets don't necessarily agree with each other, and a variant in NHLBI could be statistically significant compared to 1KG. In the above example, one can compare 1KG vs NHLBI and see that the distributions are significantly different. One obvious reason is different variant-calling methods. So I wonder: what is the best strategy for making such a comparison?

4.9 years ago by
Katie D'Aco1000
Massachusetts
Katie D'Aco1000 wrote:

Burden tests generally aggregate variants within a genomic feature (usually a gene) and do the statistical analysis per gene instead of per SNP. The introduction to the SKAT paper has a nice description of burden tests.
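To make the "per gene instead of per SNP" idea concrete, here is a minimal sketch of one simple flavor of burden test: collapse each gene to a count of individuals carrying at least one qualifying variant, then compare cases and controls with a one-sided Fisher's exact test. This is only an illustration and is not the SKAT method. The function names, the gene labels, and the carrier counts are made up; only the cohort sizes (400 cases, roughly 4000 controls) come from the question.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the number
    of carriers that fall in the first row (cases).
    """
    row1 = a + b          # total cases
    col1 = a + c          # total carriers (cases + controls)
    n = a + b + c + d     # total individuals
    denom = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / denom


def gene_burden_pvalue(case_carriers, control_carriers, n_cases, n_controls):
    """Collapsing burden test for one gene.

    case_carriers / control_carriers: number of individuals with at least
    one qualifying variant in the gene (the per-gene aggregation step).
    """
    return fisher_exact_greater(
        case_carriers, n_cases - case_carriers,
        control_carriers, n_controls - control_carriers,
    )


# Hypothetical carrier counts per gene: (cases, controls). Illustrative only.
N_CASES, N_CONTROLS = 400, 4000
carrier_counts = {
    "GENE1": (40, 100),   # 10% of cases vs 2.5% of controls
    "GENE2": (4, 40),     # 1% in both groups
}

results = {
    gene: gene_burden_pvalue(cases, controls, N_CASES, N_CONTROLS)
    for gene, (cases, controls) in carrier_counts.items()
}

for gene, p in results.items():
    print(f"{gene}: p = {p:.3g}")
```

With many genes you would still need to correct for multiple testing, and this sketch assumes cases and controls were called and filtered the same way, which is exactly the concern raised in the second question.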

You might want to open a new thread for your second question, since it is really a different topic.
