Question: How Much Of The Genome Is Captured By A GWAS?
K_Star wrote, 6.0 years ago:

How much of the genome is 'captured' in a GWAS with 300k, 500k or 1,000k SNPs? And where are most of the tagging SNPs located? Are they mostly in the exome?


Which genome?

Reply written 6.0 years ago by Pasta

Yes, an important point, as GWAS are conducted in both human and non-human species. Plant GWAS are really cool, for example.

Reply written 6.0 years ago by Larry_Parnell

Sorry, the human genome.

Reply written 6.0 years ago by K_Star
Khader Shameer (Manhattan, NY) wrote, 6.0 years ago:

The experimental basis of GWAS is genotyping. SNP genotyping arrays enable rapid scanning of 0.3M, 0.5M or 1M genetic markers (SNPs) to find genetic variants associated with complex diseases or traits. A GWAS needs a large number of markers and a large number of subjects to get a reliable signal, and reported associations should reach high statistical significance. For a detailed overview of recent advances in GWAS refer to another discussion here.

How much of the genome is 'captured' in a GWAS with 300k, 500k or 1,000k SNPs?

The human genome carries roughly 1 SNP per 100-300 bp; at the lower density that comes to ~10 million SNPs across ~3 Gb of sequence. Genotyping all of them in every subject is impractical because of several limiting factors. To deal with this we can use linkage disequilibrium (LD) mapping (see section on D', recombination rate), haplotypes, haplotype blocks and haplotype tag SNPs (tagSNPs). (Read about the HapMap project here.) Instead of genotyping all 10M SNPs we can genotype tagSNPs: a tagSNP is a representative SNP for a region of the genome in high LD (a haplotype block). This makes it possible to detect genetic variation across the block without genotyping every SNP in it. Previous studies indicated that genotyping chips with 0.5M-1M SNPs are sufficient for a good GWAS.
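To make the tagSNP idea concrete, here is a minimal sketch in Python. It assumes phased haplotypes coded 0/1 and uses a simple greedy pick at an r^2 threshold of 0.8. The function names (`r_squared`, `greedy_tag_snps`), the SNP IDs and the toy haplotypes are all hypothetical, and this is not how any array vendor or the HapMap project actually chose their tagSNPs.

```python
from itertools import combinations

def r_squared(a, b):
    """LD r^2 between two biallelic SNPs, given phased haplotypes coded 0/1."""
    n = len(a)
    p_a = sum(a) / n                              # allele frequency at SNP a
    p_b = sum(b) / n                              # allele frequency at SNP b
    p_ab = sum(x & y for x, y in zip(a, b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom

def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedily pick the SNP that tags the most still-untagged SNPs."""
    snps = list(haplotypes)
    covers = {s: {s} for s in snps}               # every SNP tags itself
    for s, t in combinations(snps, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            covers[s].add(t)
            covers[t].add(s)
    untagged = set(snps)
    tags = []
    while untagged:
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags

# Toy data (hypothetical): 4 SNPs observed on 8 haplotypes.
toy_haplotypes = {
    "rs_a": [0, 0, 1, 1, 0, 1, 0, 1],
    "rs_b": [0, 0, 1, 1, 0, 1, 0, 1],   # identical to rs_a
    "rs_c": [1, 1, 0, 0, 1, 0, 1, 0],   # complement of rs_a, still r^2 = 1
    "rs_d": [0, 1, 0, 1, 0, 1, 0, 1],   # only weakly correlated with the rest
}
tags = greedy_tag_snps(toy_haplotypes)
```

In this toy case two tags stand in for four SNPs, because rs_a, rs_b and rs_c sit in one perfect-LD block. Real tag selection works on windows of nearby SNPs rather than on all pairs.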

And where are most of the tagging SNPs located?

The basic assumption here is that the genotyped SNPs cover all LD blocks. You can get further details from the documentation for the Illumina Human 660W-Quadv1_A or the Affymetrix 500K Gene Chip.

Are they mostly in the exome?

No. Tag SNP selection is not biased towards the exome. Most GWAS hits are in intergenic regions, promoters, or other regions distal to exons.


Thank you Larry, and in particular, Khader for the informative responses.

The answer that I am looking for, then, is how many of the estimated 10 million SNPs are captured by each of the aforementioned SNP arrays, for example in a CEU cohort.

Reply written 6.0 years ago by K_Star
Larry_Parnell (Boston, MA USA) wrote, 6.0 years ago:

Excellent overview provided by Khader. I'll add a couple points:

Some platforms capture LD SNPs better than others. Illumina is better in this regard, but the new version from Affy makes up for this deficiency. Size matters too: more SNPs give better LD coverage.

Population differences. Some populations will not be as well interrogated by available arrays as others. This is because many sites that are polymorphic in one population may not be variable in another, or may occur at too low a frequency to be included on the array. This is not a huge problem, but it can be important for some genomic regions. Think of the extreme: SNPs private to my family are not likely to be on any array because they have not been seen before.

Another way to word your question: Of all LD blocks defined by r^2 = 1.0 (or 0.9 or 0.8, etc.) and containing n SNPs (where n > 0, or n > 1, or ...), how many of those LD blocks are represented on an array? That's a tough question and the answer depends on the population under study. We do GWAS and study several different populations and have not put the effort into this calculation. To us, it is not a high priority because we use the platforms and data we have, engage in careful analysis, and report our findings. If a more complete array or analysis comes along later, so be it.
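For anyone who does want to attempt that calculation, a rough sketch of the bookkeeping in Python follows. It assumes you already have pairwise r^2 values for a reference panel from the population of interest (for example, exported from an LD tool). The function `array_coverage`, the `frozenset`-keyed lookup and the toy values are hypothetical illustrations, not real array content or real LD data.

```python
def array_coverage(all_snps, array_snps, r2, threshold=0.8):
    """Fraction of reference SNPs captured by an array.

    A SNP counts as captured if it is on the array itself, or if at least
    one array SNP has r^2 >= threshold with it. `r2` maps
    frozenset({snp1, snp2}) -> r^2; missing pairs are treated as 0.
    """
    array_set = set(array_snps)
    captured = 0
    for snp in all_snps:
        if snp in array_set or any(
            r2.get(frozenset((snp, tag)), 0.0) >= threshold for tag in array_set
        ):
            captured += 1
    return captured / len(all_snps)

# Toy reference panel (hypothetical SNP IDs and r^2 values).
reference_snps = ["rs_a", "rs_b", "rs_c", "rs_d"]
pairwise_r2 = {
    frozenset(("rs_a", "rs_b")): 1.0,
    frozenset(("rs_a", "rs_c")): 0.9,
    frozenset(("rs_a", "rs_d")): 0.25,
}

coverage_08 = array_coverage(reference_snps, ["rs_a"], pairwise_r2, threshold=0.8)
coverage_095 = array_coverage(reference_snps, ["rs_a"], pairwise_r2, threshold=0.95)
```

With a single array SNP, the toy panel is 75% covered at r^2 >= 0.8 but only 50% covered at r^2 >= 0.95. The answer therefore depends on the threshold as well as on the population the r^2 values were estimated in.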


Thanks for these important points Larry.

Reply written 6.0 years ago by Khader Shameer