I am thinking of designing an algorithm to rank variants from most to least likely to have a functional effect in an autoimmune disease.
The problem is, if I wanted to train such an algorithm, it would be helpful to have a list of variants that are causative, or almost certainly causative, of an autoimmune disease.
For instance, in the rheumatoid arthritis literature there are a lot of associated SNPs, but we are reasonably sure that amino acid positions 71 and 74 of HLA-DRB1 actually CAUSE the disease, rather than merely being in linkage disequilibrium (LD) with a variant that actually increases or decreases risk of the disease.
It can be any autoimmune disease, but Mendelian diseases etc. will not help. The goal is to mine characteristics of these causative SNPs.
I am aware of algorithms like CADD, PICS, etc.; I am more in need of a list of validated causal SNPs than anything else.
Does anyone know of such a source?
-------------ADDENDUM - someone posted the NHGRI catalog below-------------------
The problem is that the NHGRI catalog lists associations, not (necessarily) causal variants.
To determine a causal variant, we need follow-up experiments in the wet lab to show necessity and sufficiency. It is thought that the lead SNP is actually causal in 10% or fewer of cases (Farh et al. 2015).
I am looking for a resource that lists only the latter type of SNP, i.e. validated causal variants rather than associations.