Question: Find The Extent Of A Peak In A Manhattan Plot
gravatar for brentp
9.8 years ago by
Salt Lake City, UT
brentp23k wrote:

We have some data that's similar to typical GWAS data where we have a p-value for association with a given trait at millions of SNPs across a genome. When viewing these as a manhattan plot, we see peaks.

My task is to find the extent of the peaks, not just the highest point--since significant peaks tend to occur near each other. In the past, I've used some ad-hoc windowing method, but my question is: are there standard methods (and implementations) in GWAS for this type of thing?

Note: After reading these biostar threads, I'm aware of the ChIP-seq tools but I'm looking for something more targeted to regions rather than single nucleotides.

The difficulty in the problem is that significant SNP's are in the same region, but may be separated by non-signficiant SNP's.

gwas peak-calling snp • 5.2k views
ADD COMMENTlink modified 9.8 years ago by Genotepes950 • written 9.8 years ago by brentp23k

With most GWAS result sets, you can count the genome-wide significant SNPs on one hand. Presumably your results are more complex. Can you provide details about the rough shape of the plot: many peaks or few, focal or broad, close enough that you can't guess whether one peak is really two, etc? What are the factors that make this hard?

ADD REPLYlink written 9.8 years ago by David Quigley11k

I have added a bit of clarification. As of now, I have a script that seems to do this ok. It takes a seed parameter and extends out from peaks allowing to skip a certain number of lower values. I'm still interested to hear more traditional approaches.

ADD REPLYlink written 9.8 years ago by brentp23k
gravatar for Genotepes
9.8 years ago by
Nantes (France)
Genotepes950 wrote:

I am not sure how a peak can be defined easily just with SNP p-value result. If you are looking for a method that could give a correspndance pb-position/probability of location of the functional sNP, I'd advise running some hapltype clustering test that will translate the fuzzy peak shape into a linkage like plot (HapCluster from Balding - extensively coded by Mailund, or ancesHC by Coin). Gives you a credibility interval.

But you need first to ensure that your significant SNPs scattered throuhg the region represent one single association signal - maybe using nested likelihood tests like Thesias. Finally, some people use imputation. However, this is not very different in essence from hapl clustering and make the assumption that all the SNPs are present in the panel.

Finally, you can list all you "significant" SNPs - need to choose a p-value threshold- , list the blocks they are lying in (block limits are to be found on IMPUTE software page) and take the union of these blocks as the area of the peak (because the functional SNP is post likely in high LD with your observed significant SNPs and can only be found within the same block). Beware that all this is under the assumption that your association is driven by a common variant.

If your goal is not fine-mapping but providing an additional evidence for true signal based on the "length" of a peak (something similar to what was proposed for linkage in a 90s Terwilliger paper), then haplotype clustering is not bad. However, I am not sure here the length of a peak is really reflecting the true positiveness of an association as it is highly dependent on the recombination pattern of the region - a peak in a short recombination block can show very significant p-values (and be a true positive) but association (hence peak length) would fade very quickly once the recombinaiton hot-spots are met.

Christian Dina

ADD COMMENTlink modified 9.0 years ago • written 9.8 years ago by Genotepes950

@Christian, thanks, these are helpful. Both HapCluster and IMPUTE seem close to what I was looking for.

ADD REPLYlink written 9.8 years ago by brentp23k
gravatar for Hanif Khalak
9.8 years ago by
Hanif Khalak1.2k
Doha, QA
Hanif Khalak1.2k wrote:

There are standard peak ID and measurement techniques, e.g. using Matlab, which essentially boil down to:

  1. clip region interval from genome
  2. smooth the -log10(pval) signal
  3. find zero-crossing(s) in 1st derivative of (2) -> these are the tips of the peaks

Additionally, you can find the boundaries of the peak(s) by determining the zero-crossings in the 2nd derivative of (2) which is also the 1st derivative of (3). I imagine this should work in Octave as well without any toolboxes.

Note of caution: you may be trying to identify a "region of association" this way, but LD will confound any interpretation of this peak. You should still probably perform haplotype and LD analysis as already suggested.

ADD COMMENTlink written 9.8 years ago by Hanif Khalak1.2k

Thanks, I will have a closer look at the Matlab code. Using the derivatives seems like a reasonable way to go.

ADD REPLYlink written 9.8 years ago by brentp23k

These are good standard techniques but I am not sure whethere one would have an interpretation that could make sense in the context of association - although I might be wrong. Probably, in association context , the haplotypes will be the thing to do. Again, this also depends on what you exactly want out of your smoothing.


ADD REPLYlink written 9.8 years ago by Genotepes950

@genotepes: our comments are in LD :-)

ADD REPLYlink written 9.8 years ago by Hanif Khalak1.2k

@hanif definitely high r2 cheers

ADD REPLYlink written 9.8 years ago by Genotepes950
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1710 users visited in the last hour