Question: Combining GWAS data from an Illumina OmniExpress array and an Illumina 660W-Quad array
Asked 3.1 years ago by Sheila (United States):

Is it possible to run a GWAS analysis where half of the subjects have data from an Illumina OmniExpress array and the other half have data from an Illumina 660W-Quad array?

What steps are required to combine these two groups and include both datasets in a single analysis?

Thanks!

Answer by Vincent Laufer (United States), 3.1 years ago:

This is a very large question with no simple answer.

Here is what you should do:

1. Google "GWAS quality control"

2. Start reading papers like this one from Stephen Turner: "Quality Control Procedures for GWAS"  http://www.ncbi.nlm.nih.gov/pubmed/21234875 

3. As you read these papers (there are a couple dozen that will help you) start to take notes on what kinds of things they recommend. For instance, you will want to do QC by variant, by sample (individual person), by batch or plate, and by chip. Take notes on each of those. 

Once you have a command of the literature, construct something like this:

I. Initial processing of new data

  1. Genotype Calling (Illuminus)
  2. X and Y probe intensity, Structural Variation (Illumina BeadStudio)
  3. Conversion to bed, bim, fam (Custom, PLINK)

II. Sample QC

  1. Sex Check (PLINK)
  2. Missingness Outliers (PLINK)
  3. Heterozygosity Rate Outliers (PLINK)
     i. Calculate observed heterozygosity per individual
     ii. Plot missingness on the X axis and heterozygosity on the Y axis. Decide reasonable thresholds for exclusion
  4. Relatedness Checks
     i. Prune out high-LD regions (e.g., HLA)
     ii. Prune down to 50,000 high-quality, LD-independent SNPs
     iii. Check for IBD > 0.185, visualize (PLINK, R (Turner))
     iv. Mark or exclude
  5. Ancestry Checks (PLINK, smartPCA, R scripts)
     i. Extract SNPs not featured in HapMap 3 Rel. 2 four ancestral populations
     ii. Merge with HapMap data, flipping HapMap strand
     iii. PCA on merged file
     iv. Plot PC loadings
     v. Determine all PCs having significant correlation to ancestry (R)
     vi. Exclude ancestry outliers (R)
  6. Per-chip comparisons on items 1-4 (Custom)
  7. Exclude or mark all sample outliers
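
To make the missingness and heterozygosity steps in Sample QC concrete, here is a minimal Python sketch. It assumes you have already pulled per-sample values out of PLINK's `--missing` and `--het` output (the `F_MISS`, `N(NM)` and `O(HOM)` columns). The function names, the 3% missingness cutoff and the 3-SD heterozygosity cutoff are illustrative assumptions, not values from this answer; pick your own thresholds from the plot.

```python
import statistics


def het_rate(n_nonmissing, n_hom_observed):
    """Observed heterozygosity rate: (N(NM) - O(HOM)) / N(NM)."""
    return (n_nonmissing - n_hom_observed) / n_nonmissing


def flag_sample_outliers(samples, max_missing=0.03, n_sd=3.0):
    """Return the IDs of samples failing either check.

    samples maps sample ID -> (f_miss, n_nonmissing, n_hom_observed).
    max_missing and n_sd are assumed, illustrative thresholds.
    """
    rates = {sid: het_rate(nnm, ohom) for sid, (_, nnm, ohom) in samples.items()}
    values = list(rates.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)

    flagged = set()
    for sid, (f_miss, _, _) in samples.items():
        too_missing = f_miss > max_missing
        het_outlier = abs(rates[sid] - mean) > n_sd * sd
        if too_missing or het_outlier:
            flagged.add(sid)
    return flagged
```

With two arrays it is probably worth running this once per chip as well as on the combined set, since the two chips carry different marker content and their heterozygosity distributions need not match.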

III. Marker QC

  1. Excessive Missingness (PLINK)
     i. Select threshold based on visual inspection of histogram
  2. HWE (PLINK)
     i. If a higher threshold is chosen, manually inspect cluster plot
  3. Differential Missingness Check (PLINK)
     i. Informative Missingness: CNV
     ii. Consecutive Missingness in a stretch
  4. Low MAF (PLINK)
  5. Internal Sample Reproducibility (Between Chips) (PLINK)
  6. External Sample Reproducibility (HapMap Concordance) (PLINK)
  7. Per-chip call rate, AF, GF comparisons on items 1-4 (Custom)
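
As a rough illustration of the HWE check in Marker QC, the sketch below computes a 1-df chi-square goodness-of-fit p-value from genotype counts. Note that this is a simplification I am swapping in for readability: PLINK's `--hwe` uses an exact test, which behaves better at low counts. The function name is hypothetical.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg equilibrium.

    Takes the three genotype counts for one biallelic SNP.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: nothing to test

    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```

A SNP with counts (50, 0, 50) gives a vanishingly small p-value and would be a candidate for cluster-plot inspection, while (25, 50, 25) is exactly at equilibrium.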

IV. Batch Effects

  1. Average MAF (PLINK, Custom)
  2. Average call rates (PLINK, Custom)
  3. Association testing by plate (remove MAF < 5%) (Custom, PLINK)
  4. Correction via population stratification techniques if necessary

V. Dataset Merging and Harmonization

  1. Sample Checks
     i. Must perform the same checks as before on the merged set.
     ii. Results should confirm previous relationships and may find new related pairs.
  2. HWE: after merging, a high number of SNPs will be out of HWE due to differences in ancestry.
     i. Need to stratify by ethnicity, then look for HWE outliers (p < 0.0001).
  3. Population Stratification
     i. Use AIMs from Dumitrescu 2010
  4. Marker Checks
     i. After filtering markers at a 95% call rate within each single study, run a second check at 99% overall.
  5. Batch Effects
     i. Test independence of AF with plate membership, and compare the distribution of chi-square statistics to the null distribution.
  6. Merging
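
For the batch-effect item in the merging section, the same idea applies directly to your two arrays: for each SNP present on both chips, test whether allele frequency is independent of chip membership. Below is a minimal sketch of a 2x2 chi-square test on allele counts. The function name and the input layout are assumptions for illustration, and ideally you would run it within one ancestry group (controls only, if possible) so that real biological differences are not mistaken for chip effects.

```python
import math


def allele_freq_by_chip_chisq(a1_chip1, a2_chip1, a1_chip2, a2_chip2):
    """2x2 chi-square test of allele counts (rows: chips, columns: alleles).

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    table = ((a1_chip1, a2_chip1), (a1_chip2, a2_chip2))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    # An empty row or column means the test is undefined; report no evidence.
    if total == 0 or 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

Collecting the chi-square statistics over all shared SNPs and comparing them to the 1-df null (for example with a QQ plot) shows whether there is a systematic shift between chips, and the individual SNPs with extreme values are candidates for removal before merging.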

VI. Integrated imputation, phasing, and strand flipping

  1. Genotype Harmonizer
     i. Across Study-Side HapMap sample Concordance (GH)
     ii. Inspect original source file designation (GH)
     iii. MAF comparisons (GH)
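
To show what the strand-flipping problem looks like when two different chips are combined, here is a small sketch that compares the allele pair reported for the same SNP in each dataset. It is not how Genotype Harmonizer works internally; it is only an assumed, simplified classification that labels A/T and C/G SNPs as ambiguous, because their strand cannot be determined from the allele codes alone.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
VALID = frozenset(COMPLEMENT)
AMBIGUOUS = (frozenset("AT"), frozenset("CG"))


def classify_alleles(alleles_a, alleles_b):
    """Compare the two alleles of one SNP as coded in datasets A and B.

    Returns one of: 'match', 'flip', 'ambiguous', 'mismatch'.
    """
    a = frozenset(x.upper() for x in alleles_a)
    b = frozenset(x.upper() for x in alleles_b)

    # Only biallelic SNPs coded with A/C/G/T are handled here.
    if len(a) != 2 or len(b) != 2 or not (a <= VALID and b <= VALID):
        return "mismatch"

    if a in AMBIGUOUS:
        # A/T and C/G look identical on both strands.
        return "ambiguous" if a == b else "mismatch"
    if a == b:
        return "match"
    if frozenset(COMPLEMENT[x] for x in a) == b:
        return "flip"
    return "mismatch"
```

For example, `classify_alleles(("A", "G"), ("T", "C"))` returns `"flip"`, so that SNP needs flipping in one dataset before merging, while an A/T SNP comes back as `"ambiguous"` and is better excluded or left to a dedicated tool.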

VII. Association Testing

  1. Post-QC PCA
  2. Decide between Logistic Regression and Mixed Modelling
     i. Degree of Relatedness

VIII. Evaluation of QC Quality after Association Analysis

  1. Calculation of Lambda

  2. Examination of Intensity Plots

  3. Replicate SNPs of interest on a DIFFERENT Technology 
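
For the lambda calculation in the last section, a minimal sketch: the genomic inflation factor is the median of the observed 1-df chi-square association statistics divided by the median of the null chi-square distribution (about 0.4549). The function name is hypothetical, and the input is assumed to be one chi-square statistic per SNP from your association run.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic inflation factor (lambda) from 1-df chi-square statistics."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value close to 1 suggests that stratification and batch effects are under control. A clearly inflated value after merging two chips is a hint to go back to the batch-effect and ancestry steps.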
