I'm new to genomics and I'm trying to get my head around the findings of typical pathogenicity studies reported in ClinVar and those reported in other SNP sources such as GWAS, and how to relate them. I'm interested in understanding the general case; I imagine there are exceptions in both.
ClinVar. I feel that the majority of the reported studies are of the form "we looked at 50 people with rare condition X and found a high prevalence of rare variation Y, therefore variation Y is very likely to be pathogenic". Odds ratios are rarely reported, I imagine because the variation is so rare that it would be impossible to find enough controls to calculate one with any sort of statistical significance.
GWAS. The majority of studies are of the form "we looked at the genomes of 2000 people for any genomic variations that could explain the prevalence of condition Y. We found that people with variation X had a 1.3x greater chance of developing condition Y than those without variation X". Most of these findings report variations that are relatively common, presumably because that is what yields statistically significant results at this sample size.
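To make sure I understand how a figure like 1.3x comes about, here is a minimal sketch of an odds ratio with an approximate 95% confidence interval (Woolf's log method) from a 2x2 table. I'm assuming the reported effect is an odds ratio, and the counts are made up for illustration (1000 cases and 1000 controls with a common variant), not taken from any real study.

```python
import math

def odds_ratio_ci(case_carriers, case_noncarriers,
                  ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf's method) from a 2x2 table."""
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR): each cell contributes 1/count,
    # so small cells dominate the uncertainty.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts: 358/1000 cases and 300/1000 controls carry the variant.
or_est, ci_low, ci_high = odds_ratio_ci(358, 642, 300, 700)
print(f"OR = {or_est:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

With a common variant every cell of the table is large, so the interval is fairly tight even at this sample size. Note that a real GWAS applies a much stricter genome-wide threshold than a single 95% interval, because so many variants are tested at once.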
So some premises it would be great to get your thoughts on:
Is it fair to say that variants reported as pathogenic in ClinVar are typically high-penetrance (i.e. the variation leads to disease in almost all carriers) and those in GWAS studies are low-penetrance (e.g. because environmental or multigenic factors are at play)?
Hypothetically, with a big enough population that includes enough people suffering from the rare disease, a top-down GWAS would be able to confirm the typical bottom-up pathogenic variation reported in ClinVar. So ClinVar studies and GWAS studies sit on the same spectrum; we just don't have enough information to compute the presumably very high OR for ClinVar studies of rare diseases.
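Here is a rough sketch of what I mean by "not enough information". It uses hypothetical numbers throughout: 10 of 50 cases carry the variant, and the carrier frequency in the general population is assumed to be 1 in 100,000. I plug the expected (possibly fractional) number of control carriers into the same 2x2 odds ratio formula, with a Haldane-Anscombe 0.5 correction so that near-empty cells don't cause a division by zero.

```python
import math

def or_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI, adding 0.5 to every cell
    (Haldane-Anscombe correction) to cope with near-zero counts."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical inputs, chosen only for illustration.
CASE_CARRIERS, CASE_NONCARRIERS = 10, 40
CARRIER_FREQ = 1e-5  # assumed carrier frequency among controls

results = {}
for n_controls in (1_000, 100_000, 10_000_000):
    # Expected carrier count among controls; fractional for small cohorts.
    expected_carriers = n_controls * CARRIER_FREQ
    results[n_controls] = or_with_ci(
        CASE_CARRIERS, CASE_NONCARRIERS,
        expected_carriers, n_controls - expected_carriers,
    )
    est, lo, hi = results[n_controls]
    print(f"{n_controls:>10,} controls: OR ~ {est:,.0f} "
          f"(95% CI {lo:,.0f} to {hi:,.0f})")
```

Under these assumptions the estimate is very large at every cohort size, but the interval only becomes reasonably narrow once the control cohort is big enough to contain a meaningful number of carriers, which is the sense in which I think the two kinds of study might sit on one spectrum.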