I performed a genome-wide meta-analysis of summary statistics from four cohorts to identify significant loci. Next, I would like to run a conditional analysis with GCTA-COJO to look for SNPs that are associated independently of the significant lead SNPs. I know that GCTA requires a reference panel for LD estimation in PLINK binary format (bed, bim, and fam files).

For one of the four cohorts used in the meta-analysis, I have individual-level directly genotyped data (bed, bim, fam) and imputed dosage data (vcf) imputed against the 1000 Genomes reference.

Which of the following should be used as the reference panel for LD estimation in GCTA-COJO?
(1) The directly genotyped data from the SNP array of the above cohort.
(2) The imputed genotypes of the above cohort, converted to bed, bim, and fam files.
(3) 1000 Genomes data for the corresponding ancestry group.

Since (1) has low SNP density and (3) has a small sample size, I think (2) would be the most appropriate. Any advice would be appreciated.
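
For concreteness, this is roughly the workflow I have in mind for option (2). It is only a sketch that builds the two command lines without running them. All file names, the chromosome number, and the helper function names are hypothetical placeholders. It also assumes the imputed VCF contains hard-call genotypes (GT); a dosage-only VCF would need additional import options that I have left out.

```python
import shlex


def plink_convert_cmd(vcf, out_prefix, plink="plink2"):
    """Build a PLINK command that writes bed/bim/fam files from a VCF."""
    return [plink, "--vcf", vcf, "--make-bed", "--out", out_prefix]


def cojo_cond_cmd(bfile, ma_file, cond_snplist, chrom, out_prefix, gcta="gcta64"):
    """Build a GCTA-COJO command conditioning on the SNPs listed in cond_snplist."""
    return [
        gcta,
        "--bfile", bfile,            # LD reference panel (bed/bim/fam prefix)
        "--chr", str(chrom),         # restrict to one chromosome
        "--cojo-file", ma_file,      # meta-analysis summary statistics
        "--cojo-cond", cond_snplist, # lead SNPs to condition on
        "--out", out_prefix,
    ]


# Hypothetical file names, for illustration only.
convert = plink_convert_cmd("cohort1_imputed_chr1.vcf.gz", "cohort1_imputed_chr1")
cojo = cojo_cond_cmd(
    bfile="cohort1_imputed_chr1",
    ma_file="meta_analysis.ma",
    cond_snplist="lead_snps_chr1.txt",
    chrom=1,
    out_prefix="cojo_cond_chr1",
)

# Print the commands so they can be inspected before being run by hand.
for cmd in (convert, cojo):
    print(shlex.join(cmd))
```

Options (1) and (3) would use the same GCTA call, with only the `--bfile` prefix pointing at a different panel.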

