Hi,
I am running a GWAS of a binary trait on 482 Czech cases and 1928 British controls (a 1:4 case-control ratio). Their PCA plot is attached: PC1 against PC2 shows clear separation between the two groups.
I ran "plink2 --glm" with various sets of covariates to see how the lambda value and the number of significant SNPs change (attached as "gwas_summary.txt"). As one example, I am also attaching the log file from the run with PC1 + PC2 as covariates, and the resulting QQ plot.
If PC2 is included, then I get the "UNFINISHED" error code for essentially all of the 5.8 million SNPs tested.
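To put a number on "essentially all", the error codes can be tallied straight from the results file. This is a minimal Python sketch, assuming the --glm output is tab-delimited with a header row and an ERRCODE column (which is what I see in my output); the function name tally_errcodes is just mine for illustration.

```python
import csv
from collections import Counter


def tally_errcodes(path, column="ERRCODE"):
    """Count how often each value appears in the error-code column.

    Assumes a tab-delimited file with a header line that contains
    `column` (the column name is an assumption about the output layout).
    """
    counts = Counter()
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        for row in reader:
            counts[row[column]] += 1
    return counts
```

It would be called on the results file, e.g. tally_errcodes("gwas_results.PHENO.glm.logistic.hybrid"), and returns a Counter keyed by error code.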
The first few PCs seem important given the difference in EUR subpopulations between the cases and controls, but I do not get usable results when they are included in the analysis.
I can see that the Czech cohort is more homogeneous than the diverse British one, so SNP frequency differences could arise from population structure as well as from the disease.
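My guess (an assumption, not something the log confirms) is that PC2 almost perfectly predicts case/control status, which would be the classic separation problem for logistic regression. Below is a toy Python sketch with made-up data, not my real cohort: a plain Newton-Raphson logistic fit converges when the covariate overlaps between the groups and fails to converge when it separates them. The name fit_logistic, the iteration limit of 25 and the tolerances are illustrative choices of mine and are not plink2's actual implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, max_iter=25, tol=1e-8):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x.

    Returns (b0, b1, converged). converged is False if the iteration
    limit is hit or the Hessian becomes numerically singular.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. b0
            g1 += (yi - p) * xi     # gradient w.r.t. b1
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            return b0, b1, False
        # Solve the 2x2 system (information matrix) * step = gradient.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            return b0, b1, True
    return b0, b1, False


# Made-up covariate values standing in for a PC.
X = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
Y_OVERLAP = [0, 0, 1, 0, 1, 0, 1, 1]    # groups overlap along X
Y_SEPARATED = [0, 0, 0, 0, 1, 1, 1, 1]  # X splits the groups perfectly

print("overlap:  ", fit_logistic(X, Y_OVERLAP))
print("separated:", fit_logistic(X, Y_SEPARATED))
```

In the separated case the slope keeps growing from one iteration to the next instead of settling, so the fit never meets the convergence criterion. Whether this is what actually triggers "UNFINISHED" in my run is exactly what I am unsure about.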
What would be a computationally efficient solution? (Using the "logistf" R package with an increased number of iterations seems to be a good method for a subset of SNPs, but maybe not genome-wide?)
Would you recommend using another GWAS tool?
Best wishes,
Anita

Log file from the PC1 + PC2 run:
PLINK v2.0.0-a.6.12LM AVX2 AMD (20 Apr 2025)
Options in effect:
--ci 0.95
--covar /media/pontikos_nas2/AnitaSzabo/phd_projects/gwas_2024/analysis/5.gwas/pipeline_output/without_high_LD_regionin_PCA/QC/Czech/czech_age_sex_10_PC.txt
--covar-name PC1 PC2
--extract-if-info R2 >= 0.8
--glm hide-covar firth-fallback
--keep /media/pontikos_nas2/AnitaSzabo/phd_projects/gwas_2024/analysis/5.gwas/pipeline_output/without_high_LD_regionin_PCA/QC/Czech/matched_controls_czech_cases.txt
--out /media/pontikos_nas2/AnitaSzabo/phd_projects/gwas_2024/analysis/5.gwas/pipeline_output/without_high_LD_regionin_PCA/output/czech_pca_czech_cohort_matched_controls/pc1.2/output/gwas_results
--pfile /media/pontikos_nas2/AnitaSzabo/phd_projects/gwas_2024/analysis/5.gwas/pipeline_output/without_high_LD_regionin_PCA/QC/Czech/main_cohort
--pheno /media/pontikos_nas2/AnitaSzabo/phd_projects/gwas_2024/analysis/5.gwas/pipeline_output/without_high_LD_regionin_PCA/QC/phenotypes.txt
--pheno-name PHENO
--remove /media/pontikos_nas2/AnitaSzabo/phd_projects/gwas_2024/analysis/0.samples_info/remove_samples.txt
--seed 1
--threads 1
Hostname: overdrive
Working directory: /media/pontikos_nas2/AnitaSzabo/phd_projects/overdrive_scripts/temp_phd/gwas_2024/genotype_to_gwas_scripts
Start time: Sun May 25 20:51:36 2025
257622 MiB RAM detected, ~218229 available; reserving 128811 MiB for main
workspace.
Using 1 compute thread.
16416 samples (0 females, 0 males, 16416 ambiguous; 16416 founders) loaded from
/media/pontikos_nas2/AnitaSzabo/phd_projects/gwas_2024/analysis/5.gwas/pipeline_output/without_high_LD_regionin_PCA/QC/Czech/main_cohort.psam.
5800849 out of 5818824 variants loaded from
/media/pontikos_nas2/AnitaSzabo/phd_projects/gwas_2024/analysis/5.gwas/pipeline_output/without_high_LD_regionin_PCA/QC/Czech/main_cohort.pvar.
1 binary phenotype loaded (482 cases, 15934 controls).
--keep: 2410 samples remaining.
--remove: 2410 samples remaining.
2 covariates loaded from /media/pontikos_nas2/AnitaSzabo/phd_projects/gwas_2024/analysis/5.gwas/pipeline_output/without_high_LD_regionin_PCA/QC/Czech/czech_age_sex_10_PC.txt.
2410 samples (0 females, 0 males, 2410 ambiguous; 2410 founders) remaining
after main filters.
482 cases and 1928 controls remaining after main filters.
Calculating allele frequencies... done.
5800849 variants remaining after main filters.
--glm logistic-Firth hybrid regression on phenotype 'PHENO': done.
Results written to /media/pontikos_nas2/AnitaSzabo/phd_projects/gwas_2024/analysis/5.gwas/pipeline_output/without_high_LD_regionin_PCA/output/czech_pca_czech_cohort_matched_controls/pc1.2/output/gwas_results.PHENO.glm.logistic.hybrid .
End time: Sun May 25 21:52:23 2025