I have summary statistics of a GWAS in a tab-separated format, as follows:
SNP CHR BP GENPOS ALLELE1 ALLELE0 A1FREQ F_MISS BETA SE P
I want to adjust these summary statistics for LD (preferably by pruning), keeping only one SNP from each set of SNPs that are in LD with each other. I don't want to apply a p-value threshold (in fact, the pruned SNPs should stay representative in terms of p-values, as I'm trying to do a sort of enrichment analysis and need non-significant and significant results alike). I also don't have access to the genotype data - only to these summary statistics. They also contain some X-chromosomal SNPs.
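To make the goal concrete, here is a minimal sketch of the kind of p-value-agnostic pruning I have in mind. It is not taken from any of the tools below. It assumes that pairwise r² values are available from some external reference panel (the `r2_pairs` dictionary is a hypothetical input, since LD cannot be derived from the summary statistics alone), and the function name, threshold and window size are placeholders I made up for illustration.

```python
import random


def prune_by_ld(snps, r2_pairs, r2_threshold=0.2, window_bp=250_000, seed=0):
    """Greedy LD pruning that never looks at p-values.

    snps:     list of dicts with at least the keys "SNP", "CHR" and "BP"
              (same names as the summary-statistics columns).
    r2_pairs: dict mapping frozenset({snp_a, snp_b}) -> r^2.
              ASSUMPTION: these values come from an external reference
              panel; missing pairs are treated as r^2 = 0.
    """
    rng = random.Random(seed)
    order = list(snps)
    rng.shuffle(order)  # random visiting order, so retention is independent of P

    kept = []
    for cand in order:
        in_ld_with_kept = False
        for k in kept:
            if k["CHR"] != cand["CHR"]:
                continue  # only compare SNPs on the same chromosome (incl. X)
            if abs(k["BP"] - cand["BP"]) > window_bp:
                continue  # outside the pruning window
            pair = frozenset((k["SNP"], cand["SNP"]))
            if r2_pairs.get(pair, 0.0) >= r2_threshold:
                in_ld_with_kept = True
                break
        if not in_ld_with_kept:
            kept.append(cand)

    # Return in a stable chromosome/position order
    return sorted(kept, key=lambda s: (str(s["CHR"]), s["BP"]))


# Toy example with made-up SNPs and a made-up r^2 value
snps = [
    {"SNP": "rs1", "CHR": "1", "BP": 1000},
    {"SNP": "rs2", "CHR": "1", "BP": 2000},
    {"SNP": "rs3", "CHR": "1", "BP": 900_000},
    {"SNP": "rsX", "CHR": "X", "BP": 1500},
]
r2_pairs = {frozenset(("rs1", "rs2")): 0.9}

pruned = prune_by_ld(snps, r2_pairs)
print([s["SNP"] for s in pruned])
```

In this toy example, exactly one of rs1 and rs2 should survive, while rs3 and rsX are kept. Because the visiting order is random rather than ranked by P, which SNP of an LD pair is kept does not depend on its significance.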
I'm not sure which tool is suitable for this. I've considered the following tools:
Plink: As far as I know, one can perform LD pruning in Plink. However, I can't seem to find a way to perform this pruning in Plink with this file format.
GCTA-COJO: From reading, this seems to do exactly what I want. However, the documentation states that "it will be extremely time-consuming if you set a very low significance level, e.g. 5e-3". I'm guessing this might not run to completion if I were to try it with no p-value threshold at all.
LDpred: The documentation states "LDpred is a Python based software package that adjusts GWAS summary statistics for the effects of linkage disequilibrium". This does sound like what I need, though I'm not sure whether it can really perform the LD pruning step in isolation. I wanted to try it, but I haven't gotten it to work on my system.
Any help is greatly appreciated. Is one of these tools suitable, or is there another that can be used for this task? Is it possible to wrangle these summary statistics into a format suitable for Plink? Is GCTA-COJO feasible without a p-value threshold? Is LDpred capable of this, and would it be worth spending time to set up?