Question: Pruning GWAS summary statistics for LD
ika wrote (7 weeks ago):

I have GWAS summary statistics in a tab-separated file with the following columns:

SNP CHR BP  GENPOS  ALLELE1 ALLELE0 A1FREQ  F_MISS  BETA    SE  P

I want to adjust these summary statistics for LD, preferably by pruning, so that only one SNP from each group of SNPs in LD is kept. I don't want to threshold on p-value. The pruned SNPs should remain representative in terms of p-values, because I'm trying to do a kind of enrichment analysis and need non-significant and significant results alike. I also don't have access to the genotype data, only to these summary statistics. The file also contains some X-chromosome SNPs.

I'm not sure which tool is suitable for this. I've considered the following tools:

  • Plink

    As far as I know, one can perform LD pruning in Plink. However, I can't find a way to run this pruning in Plink starting from a file in this format.

  • GCTA-Cojo

    From what I've read, this seems to do exactly what I want. However, the documentation states that "it will be extremely time-consuming if you set a very low significance level, e.g. 5e-3". I'm guessing it might not run to completion if I tried it with no p-value threshold at all.

  • LDpred

    The documentation states "LDpred is a Python based software package that adjusts GWAS summary statistics for the effects of linkage disequilibrium". This sounds like what I need, though I'm not sure whether it can perform LD pruning as a standalone step. I wanted to try it, but I haven't managed to get it working on my system.

Any help is greatly appreciated. Is one of these tools suitable, or is there another that can be used for this task? Is it possible to wrangle these summary statistics into a format suitable for Plink? Is GCTA-Cojo feasible without a p-value threshold? Is LDpred capable of this, and would it be worth the time to set up?

Tags: cojo, ld, pruning, ldpred, plink
Sam (New York) wrote (7 weeks ago):

It seems like you want to do clumping (pruning the summary statistics, but keeping the most significant SNP in each LD region).

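You will always need an LD reference panel: genotype data (here in PLINK binary format) from a population matching your GWAS sample, because LD cannot be estimated from summary statistics alone. PLINK's --clump reads the SNP and P columns of your file by default, so your summary statistics can be used as they are: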

plink  --clump <sumstat> --clump-p1 <max p-value to retain> --clump-p2 1 --clump-r2 <r2 threshold> --clump-kb <window size> --bfile <reference> --out <output prefix>
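To show why clumping keeps the most significant SNP, here is a minimal Python sketch of greedy clumping. It is a simplification, not PLINK's implementation. It assumes all SNPs lie on one chromosome and uses a hypothetical `r2(a, b)` lookup standing in for LD estimated from the reference panel. The parameters mirror `--clump-p1`, `--clump-p2`, `--clump-r2` and `--clump-kb`, and the toy r2 values are made up.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# (SNP id, base-pair position, p-value); all SNPs assumed to be on one chromosome
Snp = Tuple[str, int, float]


def clump(snps: List[Snp], r2: Callable[[str, str], float], p1: float,
          p2: float, r2_thresh: float, kb: int) -> Dict[str, List[str]]:
    """Greedy clumping sketch: returns {index SNP: [SNPs clumped with it]}."""
    ordered = sorted(snps, key=lambda s: s[2])  # most significant first
    assigned: Set[str] = set()  # SNPs already used as an index or absorbed
    clumps: Dict[str, List[str]] = {}
    for sid, bp, p in ordered:
        if sid in assigned or p > p1:
            continue  # already clumped, or too weak to start a clump
        assigned.add(sid)
        members = []
        for oid, obp, op in ordered:
            if oid in assigned or op > p2:
                continue
            # Absorb SNPs inside the window that are in LD with the index SNP
            if abs(obp - bp) <= kb * 1000 and r2(sid, oid) > r2_thresh:
                members.append(oid)
                assigned.add(oid)
        clumps[sid] = members
    return clumps


# Hypothetical pairwise r2 values; in practice these come from a reference panel
PAIRS: Dict[FrozenSet[str], float] = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs1", "rs4")): 0.05,
    frozenset(("rs2", "rs4")): 0.6,
}


def r2(x: str, y: str) -> float:
    return PAIRS.get(frozenset((x, y)), 0.0)


snps = [("rs1", 1_000, 1e-8), ("rs2", 2_000, 1e-3),
        ("rs3", 500_000, 0.2), ("rs4", 3_000, 0.5)]
print(clump(snps, r2, p1=1.0, p2=1.0, r2_thresh=0.1, kb=250))
# {'rs1': ['rs2'], 'rs3': [], 'rs4': []}
```

The keys of the result are the retained SNPs. Even with both p-value thresholds set to 1, rs1 represents rs2 only because it is more significant. The retained set is therefore biased toward small p-values.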

Thank you for your answer. I actually don't want to keep the most significant SNP. Ideally, I want to adjust only for LD, without selecting or filtering based on p-value. Essentially, I need the p-value distribution to stay representative.

Or is this not possible?

(reply written 7 weeks ago by ika)

Pruning is possible, although which SNP of each correlated pair gets kept isn't exactly random either. First make a list of the SNPs in your summary statistics file, then run:

plink --bfile <ld-reference> --extract <snp list> --indep-pairwise 200 50 0.25 --out <prefix>

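This asks PLINK to prune using a window of 200 variants (write 200kb instead for a window measured in kilobases), shifting the window 50 variants at a time, and removing one SNP from each pair with LD r2 above 0.25. The SNPs that survive are listed in <prefix>.prune.in and the removed ones in <prefix>.prune.out. You can then keep only the SNPs in <prefix>.prune.in in your summary statistics.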
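To show what the window, step and r2 parameters do, here is a simplified Python sketch of sliding-window pairwise pruning. It is not PLINK's implementation. It assumes the SNPs of one chromosome are already sorted by position and that a hypothetical `r2(a, b)` lookup from a reference panel is available. It always drops the later SNP of an offending pair, whereas PLINK's rule for choosing which SNP to drop may differ. P-values are never consulted.

```python
from typing import Callable, Dict, FrozenSet, List, Set


def prune_pairwise(snps: List[str], r2: Callable[[str, str], float],
                   window: int = 200, step: int = 50,
                   r2_thresh: float = 0.25) -> List[str]:
    """Simplified sliding-window pairwise LD pruning (not PLINK's exact algorithm).

    `snps` must be sorted by position along one chromosome; `window` and `step`
    are counted in SNPs, and `step` must be > 0.
    """
    removed: Set[str] = set()
    start = 0
    while start < len(snps):
        block = snps[start:start + window]
        for i, a in enumerate(block):
            if a in removed:
                continue
            for b in block[i + 1:]:
                # Drop the later SNP of any retained pair whose r2 exceeds the threshold
                if b not in removed and r2(a, b) > r2_thresh:
                    removed.add(b)
        if start + window >= len(snps):
            break  # the last window has reached the end of the chromosome
        start += step
    # P-values are never used: the kept SNPs depend only on LD and SNP order
    return [s for s in snps if s not in removed]


# Hypothetical pairwise r2 values; in practice these come from a reference panel
PAIRS: Dict[FrozenSet[str], float] = {
    frozenset(("a", "b")): 0.9,
    frozenset(("b", "c")): 0.9,
    frozenset(("a", "c")): 0.1,
    frozenset(("d", "e")): 0.3,
}


def r2(x: str, y: str) -> float:
    return PAIRS.get(frozenset((x, y)), 0.0)


print(prune_pairwise(list("abcde"), r2, window=3, step=1))  # ['a', 'c', 'd']
```

In the toy example, c survives although it is in strong LD with b. This happens because b was already removed through its LD with a, so the greedy order affects which SNPs remain.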

(reply written 7 weeks ago by Sam)
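The PLINK pruning workflow in this reply needs two small file steps around the plink call: listing the SNPs, then keeping only the pruned-in SNPs. Below is a minimal standard-library Python sketch. It assumes the tab-separated layout from the question, with a header row and a column named SNP whose IDs match those in the reference panel's .bim file. The function names are hypothetical.

```python
import csv
from pathlib import Path
from typing import Set


def write_snp_list(sumstats: Path, out: Path) -> int:
    """Write the SNP column of a tab-separated sumstats file, one ID per line
    (suitable for plink --extract). Returns the number of SNPs written."""
    n = 0
    with sumstats.open(newline="") as fin, out.open("w") as fout:
        for row in csv.DictReader(fin, delimiter="\t"):
            fout.write(row["SNP"] + "\n")
            n += 1
    return n


def filter_by_prune_in(sumstats: Path, prune_in: Path, out: Path) -> int:
    """Keep only sumstats rows whose SNP is listed in <prefix>.prune.in.
    Rows are copied unchanged, so BETA, SE and P are untouched."""
    keep: Set[str] = set(prune_in.read_text().split())
    n = 0
    with sumstats.open(newline="") as fin, out.open("w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in reader:
            if row["SNP"] in keep:
                writer.writerow(row)
                n += 1
    return n
```

Call write_snp_list before running plink with --extract, then call filter_by_prune_in on <prefix>.prune.in. Because rows are selected only by SNP ID, no filtering on significance takes place.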