Question: SNP enrichment analysis - alternatives to BROAD SNPsnap?
Asked 4.4 years ago by epaminonda (United Kingdom):


I am trying to carry out a SNP genomic enrichment analysis and I was hoping you could help.

Basically, I have the following two sets of SNPs:

-set_A: 1,695 foreground SNPs. These are 1000 Genomes (1000g) variants which, in addition, are QTLs for a trait I'm interested in. They are all within ChIP-seq peak intervals for a TF.

-set_B: 116,000 background SNPs. These are a superset of set_A, and all are within ChIP-seq peaks for the same TF. They represent all the SNPs I tested for the QTL property above.

I want to determine whether set_A is enriched for some particular annotation compared to set_B. In other words, I want to know whether, compared to all SNPs tested for QTL status in my ChIP-seq peaks, set_A is enriched for some annotation. For example, this annotation might be strong-LD intervals around genome-wide significant SNPs from the GWAS catalog. Therefore I want to ask:

"Are my set_A variants more likely to be in GWAS LD blocks for some disease/trait compared to the background set of SNPs?"

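As a baseline for the question above: since set_A is a subset of set_B, the simplest formulation is a one-sided hypergeometric test on the number of set_A SNPs that fall in the annotation. The sketch below is only an illustration using the standard library; the function names and the toy rsIDs are hypothetical, and the test assumes SNPs are independent draws, which is not true when SNPs are in LD.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n SNPs are drawn without replacement from N SNPs,
    K of which carry the annotation. Assumes independent SNPs (no LD)."""
    lo = max(k, 0, n - (N - K))  # smallest count that is both >= k and possible
    hi = min(n, K)               # largest possible count
    if lo > hi:
        return 0.0
    denom = log_comb(N, n)
    logs = [log_comb(K, i) + log_comb(N - K, n - i) - denom
            for i in range(lo, hi + 1)]
    m = max(logs)  # factor out the largest term for numerical stability
    return min(1.0, exp(m) * sum(exp(x - m) for x in logs))


def enrichment(foreground, background, annotated):
    """Return (overlap count, fold enrichment, one-sided p-value)."""
    fg = set(foreground)
    bg = set(background) | fg      # background is a superset of foreground
    ann = set(annotated) & bg      # only annotated SNPs that were tested
    k, n, K, N = len(fg & ann), len(fg), len(ann), len(bg)
    fold = (k / n) / (K / N) if n and K else float("nan")
    return k, fold, hypergeom_upper_tail(k, N, K, n)


# Toy example with made-up IDs: 10 background SNPs, 3 foreground, 4 annotated.
background = [f"rs{i}" for i in range(1, 11)]
foreground = ["rs1", "rs2", "rs3"]
annotated = ["rs1", "rs2", "rs7", "rs8"]
print(enrichment(foreground, background, annotated))
```

With real data, the p-value from this baseline would be too optimistic wherever several foreground SNPs tag the same LD block, which is exactly the concern raised further down.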
I have already ascertained that set_A is MAF-matched to set_B (bootstrapped KS test of the two MAF distributions), so this should not be a problem.

I ran the GAT simulation-based enrichment tool, which works fine and has returned enrichment results. However, I believe my foreground and background sets need more pre-processing: there is LD structure within both set_A and set_B, so some SNPs in A are in LD with each other, as are some SNPs in B. I believe I need to correct for this too, to avoid inflated enrichment estimates. I would probably need to LD-match set_A and set_B, or perhaps pool or subsample only independent SNPs from set_A and set_B. GAT, which is designed to compute simple interval enrichments, cannot do this.
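One way to picture the "subsample independent SNPs" option is greedy LD pruning: walk through the SNPs in some priority order and keep a SNP only if it is not in strong LD with any SNP already kept. The sketch below assumes pairwise r2 values are already available in a dictionary; the function name, the 0.8 threshold and the toy values are all hypothetical. Established tools such as PLINK provide LD-based pruning and clumping and would be the practical choice at this scale.

```python
def greedy_ld_prune(snps, r2, threshold=0.8):
    """Keep a SNP only if its r2 with every already-kept SNP is below threshold.

    snps: SNP IDs in priority order (earlier SNPs win ties).
    r2:   dict mapping frozenset({snp_a, snp_b}) -> r2; missing pairs count as 0.
    Note: this compares against all kept SNPs, so it is quadratic in the worst
    case; a real implementation would only compare SNPs within a genomic window.
    """
    kept = []
    for s in snps:
        if all(r2.get(frozenset((s, t)), 0.0) < threshold for t in kept):
            kept.append(s)
    return kept


# Toy example with made-up values: rs1 and rs2 are in strong LD, as are rs2 and rs3.
snps = ["rs1", "rs2", "rs3", "rs4"]
r2 = {
    frozenset(("rs1", "rs2")): 0.95,
    frozenset(("rs2", "rs3")): 0.85,
    frozenset(("rs3", "rs4")): 0.10,
}
print(greedy_ld_prune(snps, r2))  # rs2 is dropped; rs3 survives because rs2 was not kept
```

If something like this were used, applying the same threshold to set_A and set_B, and listing set_A SNPs first so they are retained preferentially, would be one possible design; whether that biases the comparison is a judgment call.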

There is a tool from the Broad Institute, called SNPsnap, that might be able to help.

Interestingly, SNPsnap should be able to carry out LD clumping of the foreground SNPs, so it can correct for LD-derived inflation of enrichments. However, SNPsnap only returns a frequency-matched background of (at most) 20,000 SNPs. I don't need this, because I believe I already have the most suitable background set (set_B), and in any case I need my background SNPs to be in the ChIP-seq peaks.

Additionally, SNPsnap seems quite experimental (about 80% of my runs have failed) and emails to the authors go unanswered, so I believe the program is not really supported.

Therefore I was hoping someone here might have ideas on how to do this:

LD clumping: what if I mapped my set_A SNPs to strong-LD intervals and, instead of the enrichment of set_A SNPs in GWAS LD blocks, computed the enrichment of set_A LD blocks in GWAS LD blocks?
Alternatively, for each LD block containing more than one set_A SNP, I could select the "best" SNP according to some metric. Any other ideas or suitable tools?
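Both ideas can be sketched in a few lines, assuming each SNP has already been assigned to an LD block. Everything here is hypothetical: the block IDs, the score (for example, -log10 of the QTL p-value) and the function names are placeholders, and how blocks are defined is left open.

```python
def best_snp_per_block(snp_to_block, score):
    """Return {block_id: snp}, keeping the highest-scoring SNP in each LD block.

    snp_to_block: dict mapping SNP ID -> LD block ID.
    score:        dict mapping SNP ID -> metric where higher is "better".
    Ties are resolved in favour of the SNP seen first.
    """
    best = {}
    for snp, block in snp_to_block.items():
        if block not in best or score[snp] > score[best[block]]:
            best[block] = snp
    return best


def block_level_counts(fg_blocks, bg_blocks, annotated_blocks):
    """Counts for a block-level enrichment test, treating each LD block as one unit.

    Returns (k, n, K, N): annotated foreground blocks, foreground blocks,
    annotated background blocks, and background blocks.
    """
    fg = set(fg_blocks)
    bg = set(bg_blocks) | fg           # background blocks include foreground blocks
    ann = set(annotated_blocks) & bg   # ignore annotated blocks that were never tested
    return len(fg & ann), len(fg), len(ann), len(bg)


# Toy example with made-up IDs and scores.
snp_to_block = {"rs1": "b1", "rs2": "b1", "rs3": "b2"}
score = {"rs1": 2.0, "rs2": 5.0, "rs3": 1.0}
print(best_snp_per_block(snp_to_block, score))

print(block_level_counts(["b1", "b2"], ["b1", "b2", "b3", "b4"], ["b1", "b3", "b9"]))
```

The four counts are the usual inputs to a hypergeometric or Fisher's exact test, so the block-level version changes the unit of analysis rather than the test itself.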

Thanks for any suggestions you might be willing to share.

snp chip-seq • 2.2k views

