LD pruning and other QC before association analysis?
8.0 years ago
Picasa ▴ 640

Hello,

Usually we perform QC steps, such as removing SNPs with a low MAF and LD pruning, before a PCA.

But do we have to apply the same QC filtering before an association analysis? By association analysis, I mean classical tests such as case/control association with SNPs, as explained here:

http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml

gwas qc • 11k views
7.9 years ago
Shab86 ▴ 310

The answer is yes, and more! I am posting links to two tutorial papers on QC before GWAS. Hope it helps. Refs:

  1. http://www.nature.com/nprot/journal/v5/n9/full/nprot.2010.116.html
  2. http://onlinelibrary.wiley.com/wol1/doi/10.1002/sim.6605/full

Here is a link to slides that present the Nature Protocols paper more accessibly: http://www.bioinf.wits.ac.za/courses/gwas/Qc_combined_final.pdf

7.9 years ago
Nick ▴ 70

These steps are necessary before PCA in order to identify the principal dimensions of genetic variation between samples, without over-weighting the contribution of groups of correlated SNPs.
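To make the "correlated SNPs" point concrete, here is a minimal sketch of greedy LD pruning on genotype vectors coded 0/1/2. It is only an illustration: the function names, the `r2_threshold` and `window` defaults, and the index-based window are my own assumptions, and this is a simplification, not the algorithm a tool such as PLINK implements.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP has no variance; treat it as uncorrelated.
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def greedy_ld_prune(snps, r2_threshold=0.2, window=50):
    """Return indices of SNPs to keep (hypothetical helper).

    snps: list of genotype vectors, one per SNP, in genomic order.
    A SNP is kept only if its r^2 with every already-kept SNP that lies
    within `window` positions (by index) is below `r2_threshold`.
    """
    kept = []
    for i, genotypes in enumerate(snps):
        nearby = [j for j in kept if i - j <= window]
        if all(r_squared(genotypes, snps[j]) < r2_threshold for j in nearby):
            kept.append(i)
    return kept
```

With two identical SNPs followed by a weakly correlated one, the duplicate is dropped, so a block of SNPs in strong LD contributes one representative to the PCA instead of many.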

PCA is just one of the QC steps you should perform to prepare data for case/control association testing. It can be used to establish whether samples share a common ancestry, and you might want to exclude outlier samples. Other QC steps include removing SNPs with a low genotype calling score (e.g. GenTrain score and cluster separation score in GenomeStudio); removing SNPs and samples with a low call rate; removing SNPs that fail the HWE test; checking inferred gender against recorded gender; removing one of each pair of related samples (for an unrelated case-control design); and removing samples that are outliers in the heterozygosity/inbreeding test. See for example this protocol for exome chip QC.
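As a rough illustration of the per-SNP filters in that list (call rate, MAF, HWE), here is a small sketch. Genotypes are assumed to be coded 0/1/2 with `None` for a missing call; the function names and the threshold defaults are hypothetical and should be chosen per study. The HWE check uses a 1-df chi-square approximation, not the exact test that dedicated tools often provide.

```python
import math


def snp_qc_metrics(genotypes):
    """Call rate, MAF and an approximate HWE p-value for one SNP.

    genotypes: list of 0/1/2 allele counts, None = missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    call_rate = n / len(genotypes)
    if n == 0:
        return {"call_rate": 0.0, "maf": 0.0, "hwe_p": 1.0}

    observed = [called.count(0), called.count(1), called.count(2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of counted allele
    maf = min(p, 1 - p)
    if maf == 0:
        # Monomorphic SNP: HWE is trivially satisfied.
        return {"call_rate": call_rate, "maf": 0.0, "hwe_p": 1.0}

    expected = [n * (1 - p) ** 2, n * 2 * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p}


def passes_qc(genotypes, min_call_rate=0.98, min_maf=0.0, min_hwe_p=1e-6):
    """Thresholds are illustrative assumptions, not recommendations."""
    m = snp_qc_metrics(genotypes)
    return (m["call_rate"] >= min_call_rate
            and m["maf"] >= min_maf
            and m["hwe_p"] >= min_hwe_p)
```

In a case/control study the HWE filter is commonly applied to controls only, since a true association can itself distort genotype frequencies in cases.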

Assuming you have performed these QC steps and are left with a clean dataset, you should perform case-control analysis without LD pruning. You can also include SNPs with low MAF, but your analysis may have low power to detect significant rare SNPs. Afterwards you can draw a QQ plot of the association p-values to check that the assumptions of your statistical association test are satisfied. It is important to check cluster plots for any significant SNPs you find (genotypes can be particularly difficult to call correctly for rare SNPs).
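For the QQ check, a sketch like the following computes the points of a QQ plot and the genomic inflation factor (lambda) from a list of association p-values. It assumes 1-df tests and p-values strictly greater than 0; the function names are my own. A lambda close to 1 suggests the test statistics behave as expected under the null, while a clearly larger value hints at residual stratification or other QC problems.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(pvalues):
    """Lambda = median observed 1-df chi-square / expected median.

    Assumes every p-value is in (0, 1]; p == 0 would raise an error.
    """
    nd = NormalDist()
    # For a 1-df test, chi2 = z^2 where z is the two-sided normal quantile.
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs, most significant first."""
    n = len(pvalues)
    observed = sorted(pvalues)
    # Expected quantiles of a uniform distribution on (0, 1).
    expected = [(i + 0.5) / n for i in range(n)]
    return [(-math.log10(e), -math.log10(o))
            for e, o in zip(expected, observed)]
```

Plotting the pairs from `qq_points` against the diagonal gives the usual QQ plot; points that leave the diagonal only in the extreme tail are what you would hope to see if a few SNPs are truly associated.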
