Question: Multiple testing for eQTL data with correlated SNPs
Krisr (United States) wrote, 5.2 years ago:

Hi,

I have downloaded the GTEx eQTL raw p-value data for one tissue of interest. I have about 100 candidate SNPs (some in LD with one another) for which I'd like to look for eQTL evidence.

I wrote a script to extract all eQTLs reported for the 100 SNPs of interest. Using this data, I'd like to apply an FDR correction (or another method). However, I'm concerned this approach may be too conservative because it does not account for the correlation among the 100 SNPs.

Does anyone have an idea of how I could address this issue? Or are there any programs/software out there that offer a solution?

I was thinking of extracting the 1000 Genomes genotype data (~85 individuals from one of the populations of similar background) for the 100 SNPs and using this program to adjust the p-values: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000456#pgen-1000456-g009

Any thoughts are appreciated.

David Quigley (San Francisco) wrote, 5.2 years ago:

SNPs that are in strong LD with each other are not independent signals; they're tagging one or more causal variants and will have a complicated correlation structure. If one were starting with the raw genotype calls, a common statistical approach would be to use a linear model to assess the association between each variant and your phenotype. Pick the variant with the strongest single association. Now add that variant to the model, and ask whether another variant is still significantly associated with the phenotype after conditioning on the first variant you selected. If so, that's evidence that more than one SNP may be independently linked to the phenotype.
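
For anyone who does have genotypes, here is a rough sketch of that conditional (forward stepwise) procedure in plain Python. Everything named here is made up for illustration: the function `conditional_scan`, the SNP ids, the toy dosages, and the `t_threshold=3.0` stopping rule, which stands in for a proper p-value cut-off. It also ignores covariates, so treat it as an outline of the idea rather than an analysis pipeline.

```python
import math

def _center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _residualize(v, basis):
    """Remove from v its projection onto each (mutually orthogonal) basis vector."""
    for b in basis:
        coef = _dot(v, b) / _dot(b, b)
        v = [x - coef * y for x, y in zip(v, b)]
    return v

def conditional_scan(genotypes, phenotype, t_threshold=3.0, max_steps=10):
    """Forward stepwise conditional analysis (illustrative only).

    genotypes: dict of SNP id -> list of dosages (0/1/2), one per individual
    phenotype: list of expression values, same order of individuals
    Returns [(snp_id, abs_t_statistic), ...] in order of selection.
    """
    n = len(phenotype)
    y_res = _center(phenotype)   # phenotype residual given the SNPs selected so far
    basis = []                   # residualized genotypes of the selected SNPs
    selected = []
    remaining = list(genotypes)
    for _ in range(max_steps):
        df = n - 2 - len(basis)  # intercept + selected SNPs + the SNP under test
        yy = _dot(y_res, y_res)
        if df < 1 or yy <= 1e-12:
            break
        best = None
        for snp in remaining:
            g_res = _residualize(_center(genotypes[snp]), basis)
            gg = _dot(g_res, g_res)
            if gg < 1e-9:
                continue         # perfect proxy of an already selected SNP
            r = _dot(g_res, y_res) / math.sqrt(gg * yy)   # partial correlation
            denom = 1.0 - r * r
            t = math.inf if denom <= 0 else abs(r) * math.sqrt(df / denom)
            if best is None or t > best[1]:
                best = (snp, t, g_res)
        if best is None or best[1] < t_threshold:
            break
        snp, t, g_res = best
        selected.append((snp, t))
        basis.append(g_res)
        remaining.remove(snp)
        y_res = _residualize(y_res, [g_res])
    return selected

# Toy data: snpB is a perfect proxy of snpA; snpC carries a second, weaker signal.
snp_a = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
snp_c = [0, 1, 2] * 4
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.1, -0.1, 0.2, -0.15, 0.05, -0.1, 0.0]
expression = [2 * a + c + e for a, c, e in zip(snp_a, snp_c, noise)]
genotypes = {"snpA": snp_a, "snpB": list(snp_a), "snpC": snp_c}

hits = conditional_scan(genotypes, expression)
print([snp for snp, _ in hits])
```

In the toy data, snpB can never be picked after snpA because nothing is left of it once snpA is in the model, which is the point about SNPs in strong LD not being independent signals.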

With only the P values (and, presumably, the physical locations of the variants, which you can get from the annotations), you can't use this approach, since you can't build the model. A practical approach would be to choose a conservative (low) cut-off for LD and keep only one SNP within each LD block. You would still need to correct for the complete set of SNPs that you considered, since you tested them all for association. Adjusting for multiple testing using more complex methods would be tricky, since you don't have the genotypes and can't use permutation. The simplest and most defensible approach would be to use Holm's correction for your 100 SNPs: it controls the family-wise error rate at the same level as Bonferroni, with no assumption about the dependence between tests, and is never less powerful.
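
A small sketch of those two steps, in case it helps. The names `holm_adjust` and `ld_prune`, the SNP ids, the p-values, and the `max_r2=0.2` cut-off are all placeholders, and the pairwise r² values are assumed to come from some reference panel (they are not in the p-value file). Pairs missing from the r² table are treated as unlinked, which is itself an assumption.

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values.

    pvals: dict of SNP id -> raw p-value (all SNPs that were tested).
    Returns dict of SNP id -> adjusted p-value.
    """
    m = len(pvals)
    adjusted = {}
    running_max = 0.0
    for rank, snp in enumerate(sorted(pvals, key=pvals.get)):
        # The smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[snp]))
        adjusted[snp] = running_max
    return adjusted

def ld_prune(pvals, r2, max_r2=0.2):
    """Greedily keep the most significant SNP of each group of linked SNPs.

    r2: dict of frozenset({snp1, snp2}) -> pairwise r^2; missing pairs count as 0.
    """
    kept = []
    for snp in sorted(pvals, key=pvals.get):
        if all(r2.get(frozenset((snp, k)), 0.0) < max_r2 for k in kept):
            kept.append(snp)
    return kept

# Hypothetical inputs
pvals = {"snp1": 0.005, "snp2": 0.01, "snp3": 0.03, "snp4": 0.04}
r2 = {frozenset(("snp1", "snp2")): 0.9}

adjusted = holm_adjust(pvals)   # corrected over every SNP that was tested
representatives = ld_prune(pvals, r2)
for snp in representatives:
    print(snp, adjusted[snp])
```

Note that the correction is computed over all tested SNPs and the pruning only decides which ones to report, in line with the point above about correcting for everything you tested.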
