PCA results from PLINK and Hail vastly different
8 months ago
Ahmed • 0

I am getting completely different results when I run PCA in PLINK and in Hail. Does anyone know why? By "different" I mean:

  1. The Pearson correlation between the corresponding top 10 PCs from the two tools is essentially 0.
  2. The PCA scatter plots show completely different clusters, suggesting different population stratification.
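One thing worth ruling out before concluding that two PCAs disagree: PCs are only defined up to sign, and when two eigenvalues are close their order can swap. A sign flip turns r = 1 into r = -1, and a swap can make the correlation between same-numbered PCs close to 0 even when the decompositions agree. Below is a minimal numpy sketch of a sign- and order-aware comparison. It assumes both score matrices have already been aligned so that row i is the same sample in both; the function name `compare_pcs` is hypothetical.

```python
import numpy as np


def compare_pcs(scores_a: np.ndarray, scores_b: np.ndarray) -> np.ndarray:
    """Return the k x k matrix of |Pearson r| between PCs from two tools.

    scores_a, scores_b: (n_samples, k) arrays whose rows refer to the same
    samples in the same order. Entry [i, j] is |corr(PC_i of A, PC_j of B)|;
    taking the absolute value removes arbitrary sign flips.
    """
    if scores_a.shape != scores_b.shape:
        raise ValueError("score matrices must have the same shape")
    k = scores_a.shape[1]
    # np.corrcoef treats rows as variables, so pass the transposed matrices;
    # the top-right k x k block holds the cross-correlations A vs B.
    full = np.corrcoef(scores_a.T, scores_b.T)
    return np.abs(full[:k, k:])


# Toy check: B's PC1 is A's PC2 with its sign flipped, B's PC2 is A's PC1.
rng = np.random.default_rng(0)
a = rng.standard_normal((200, 3))
b = a[:, [1, 0, 2]] * np.array([-1.0, 1.0, 1.0])
cross = compare_pcs(a, b)
print(np.round(np.diag(cross), 2))  # same-numbered PCs: can look uncorrelated
print(cross.argmax(axis=1))         # best-matching PC in B for each PC in A
```

If the best matches in each row are close to 1, the tools agree up to sign and ordering; if no column matches well, the difference is real.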

Points to note:

  1. It's the same set of samples and SNPs (I am using the same .bed/.bim/.fam files).
  2. I ran QC on the dataset beforehand (including LD pruning, MAF > 0.05 and genotype call rate > 0.95). According to Hail's log output, none of the SNPs are removed (the number of SNPs remaining after filtering matches the number in my .bim file).
  3. When I use another tool (bigsnpr), I get clusters similar to those from Hail.

My commands are as follows:

Hail v0.2:

import hail as hl
hl.import_plink(bed='file.bed', bim='file.bim', fam='file.fam', reference_genome='GRCh38').write('file.mt', overwrite=True)
samples = hl.read_matrix_table('file.mt')
pca_evals_s, pca_scores_s, pca_loadings_s = hl.hwe_normalized_pca(samples.GT, k=10, compute_loadings=True)
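For context, as I understand the Hail documentation, `hwe_normalized_pca` drops monomorphic variants and runs PCA on a genotype matrix in which each entry is normalized as (x_ij - 2p_j) / sqrt(2 p_j (1 - p_j) m), where p_j is the alternate-allele frequency estimated from the data and m is the number of variants. PLINK's `--pca` works from a variance-standardized relationship matrix, so up to sign the two should broadly agree on the same input. The numpy sketch below only illustrates that normalization, assuming a dense 0/1/2 alt-allele-count matrix with no missing calls; it is not Hail's implementation, which is distributed and handles missing genotypes. The function name `hwe_normalized_pca_sketch` is hypothetical.

```python
import numpy as np


def hwe_normalized_pca_sketch(gt: np.ndarray, k: int):
    """Illustrative HWE-normalized PCA on a dense genotype matrix.

    gt: (n_samples, m_variants) array of alt-allele counts in {0, 1, 2},
    assumed to contain no missing calls. Returns (eigenvalues, scores,
    loadings) with scores of shape (n_samples, k).
    """
    gt = np.asarray(gt, dtype=float)
    p = gt.mean(axis=0) / 2.0                # alt-allele frequency per variant
    keep = (p > 0) & (p < 1)                 # drop monomorphic variants
    gt, p = gt[:, keep], p[keep]
    m = gt.shape[1]
    # Centre by 2p and scale by sqrt(2p(1-p)m), as described above.
    x = (gt - 2 * p) / np.sqrt(2 * p * (1 - p) * m)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    eigenvalues = s[:k] ** 2                 # eigenvalues of x @ x.T
    scores = u[:, :k] * s[:k]                # per-sample PC scores
    loadings = vt[:k].T                      # per-variant loadings
    return eigenvalues, scores, loadings


rng = np.random.default_rng(1)
genotypes = rng.binomial(2, 0.3, size=(50, 400))
evals, scores, loadings = hwe_normalized_pca_sketch(genotypes, k=10)
print(evals.shape, scores.shape, loadings.shape)
```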

PLINK 2.0:

plink2.0 --bfile file --pca 10 --out plink_pca --threads 14
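To compare the PLINK output with Hail's scores sample by sample, the .eigenvec file first has to be read and keyed by sample ID. A minimal parser sketch, assuming the PLINK 2 layout (a whitespace-separated header such as `#FID IID PC1 ...` or `#IID PC1 ...`) or the header-less PLINK 1.9 layout (`FID IID PC1 ...`); check your own file, since column sets can differ between builds. The function name `read_eigenvec` is hypothetical.

```python
import io
from typing import Dict, List, TextIO


def read_eigenvec(handle: TextIO) -> Dict[str, List[float]]:
    """Map IID -> PC scores from a PLINK .eigenvec file.

    Handles a PLINK 2-style header ('#FID IID PC1 ...' or '#IID PC1 ...')
    and the header-less PLINK 1.9 layout ('FID IID PC1 ...').
    """
    iid_col, first_pc = 1, 2                 # header-less default: FID IID PCs...
    scores: Dict[str, List[float]] = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("#"):        # PLINK 2 header line
            header = [fields[0].lstrip("#")] + fields[1:]
            iid_col = header.index("IID")
            first_pc = header.index("PC1")
            continue
        scores[fields[iid_col]] = [float(v) for v in fields[first_pc:]]
    return scores


plink2_text = "#FID\tIID\tPC1\tPC2\nf1\ts1\t0.10\t-0.20\nf2\ts2\t-0.30\t0.05\n"
plink19_text = "f1 s1 0.10 -0.20\nf2 s2 -0.30 0.05\n"
print(read_eigenvec(io.StringIO(plink2_text)))
print(read_eigenvec(io.StringIO(plink19_text)))
```

Keying by IID makes it straightforward to reorder the PLINK scores to match the sample order of the Hail scores before correlating them.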

EDIT: The issue only occurs with PLINK 2.0, not with PLINK 1.9.

Thank you!

Tags: gwas, plink, hail, pca

If you run plink2 --version, what is the result?

8 months ago

As noted in the plink2-users Google group (https://groups.google.com/g/plink2-users/c/DeTVfXAjzTY), this was caused by using a 6-year-old plink2 build.
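Since the root cause was an outdated build, it can help to check the build date that the PLINK binary reports before comparing results. A minimal sketch, assuming the version line ends with a date in parentheses such as `(19 Oct 2020)`, which is the format I have seen from PLINK 1.9 and 2.0 builds; adjust the regex if yours differs. The function names `build_date` and `installed_plink2_date` are hypothetical, and the sample version strings are illustrative rather than copied from a specific release. Pass `exe="plink2.0"` if the binary is named as in the question.

```python
import re
import shutil
import subprocess
from datetime import date, datetime
from typing import Optional

# Assumed format: a '(DD Mon YYYY)' build date somewhere in the version line.
_DATE_RE = re.compile(r"\((\d{1,2} [A-Z][a-z]{2} \d{4})\)")


def build_date(version_line: str) -> Optional[date]:
    """Extract the '(DD Mon YYYY)' build date from a PLINK version line."""
    match = _DATE_RE.search(version_line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d %b %Y").date()


def installed_plink2_date(exe: str = "plink2") -> Optional[date]:
    """Run '<exe> --version' if the executable is on PATH and parse its date."""
    if shutil.which(exe) is None:
        return None
    out = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return build_date(out.stdout)


print(build_date("PLINK v2.00a3 64-bit Intel (24 Oct 2022)"))
print(installed_plink2_date())  # None if no plink2 binary is on PATH
```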
