Are there any tools that calculate LD between CNVs (detected by NGS) and SNPs?


This question brings to mind an instructive result at the IRGM locus, involving Crohn's disease and a CNV of about 20 kbp that maps to the promoter region of this gene. After its discovery (GWAS found SNPs linked to this CNV), the deleted segment was thought to be causal for the disease phenotype. IRGM encodes a GTPase that functions in innate immunity.

Earlier this year, Brest et al. published a paper showing this not to be the case. That group reported allele-specific interactions between the miR-196 family and the IRGM mRNA as causal for Crohn's. The IRGM exonic SNP c.313C>T (rs10065172) is in perfect linkage disequilibrium (r² = 1.0) with a 20-kbp deletion polymorphism mapping upstream of the IRGM gene, and that deletion had been strongly associated with Crohn's disease in several European populations or those with European ancestry (see refs below). Even so, it is the functional consequences of this SNP that provide the details of causation. Because the c.313C>T variant encodes leucine at codon 105 irrespective of allele (the change is synonymous), any allele-specific consequences were more likely to involve protein expression than protein function. It was observed that the predicted binding between miR-196 and IRGM mRNA was affected by the variation at SNP c.313C>T (18). Importantly, it was demonstrated not only that the miR-196-IRGM interaction was real but also that expression of miR-196 was elevated in inflammatory epithelia from Crohn's sufferers.

This was found without any bioinformatics tools - just old-fashioned attention to detail. It does show that knowing the LD between SNPs and CNVs is important, but at times may be irrelevant to causation.

In essence, LD between two genetic markers is a measure that can be expressed as the correlation coefficient of the alleles. A SNP is typically biallelic, whereas a CNV can take more than two copy-number states, so the results will look different from SNP-SNP LD. Conceptually, though, the calculation is the same. We use SAS for this, coding the CNV either numerically or categorically (e.g., 0 copies, 1 copy, 2 copies, 3+ copies). Other statistical packages comparable to SAS will also work.
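To make the "correlation coefficient" idea concrete, here is a minimal Python sketch (not our SAS code). It assumes unphased data, with the SNP coded as alternate-allele dosage (0/1/2) and the CNV coded as integer copy number per sample, and it reports the squared Pearson correlation as a genotype-level r². The function names `pearson_r` and `genotype_r2` and the toy data are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A marker with no variation in the sample has no defined correlation.
        raise ValueError("a monomorphic marker has no defined correlation")
    return cov / sqrt(var_x * var_y)


def genotype_r2(snp_dosage, cnv_copies):
    """Squared correlation between SNP dosage and CNV copy number (unphased)."""
    return pearson_r(snp_dosage, cnv_copies) ** 2


# Hypothetical toy data: one entry per sample, same sample order in both lists.
# Here every copy of the SNP alternate allele travels with a deleted copy.
snp = [0, 1, 2, 1, 0, 2, 1, 0]   # alternate-allele dosage
cnv = [2, 1, 0, 1, 2, 0, 1, 2]   # diploid copy number of the CNV segment

print(round(genotype_r2(snp, cnv), 3))
```

Treating copy number as a numeric variable assumes a roughly linear relationship with SNP dosage. If you code the CNV categorically instead, a plain correlation no longer applies directly and you would need a measure suited to categorical data.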

Crohn's GWAS references:

Parkes, M., Barrett, J. C., et al. (2007) Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility. Nat. Genet. 39, 830-832.

Wellcome Trust Case Control Consortium. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-678.

Barrett, J. C., Hansoul, S., et al. (2008) Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nat. Genet. 40, 955-962.

Franke, A., McGovern, D. P., et al. (2010) Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nat. Genet. 42, 1118-1125.

Thank you so much.

Hi, has anyone been able to do this yet??

