Question: What is the relationship between PLINK --pca and --mds-plot in terms of % variance explained
0
gravatar for gary.napier
10 weeks ago by
gary.napier20
gary.napier20 wrote:

Hi,

I am trying to get the correct percent of variance explained by a PLINK PCA analysis (--mds-plot).

When I run the code with a parameter of 10 PCAs...

plink --vcf <vcf_file.vcf.gz> --mds-plot 10 eigendecomp --cluster --double-id --allow-extra-chr --out <out_file.pca>

...as expected I get ten columns (C1 - C10).

Then when I run the separate code to get the eigenvalues...

 plink --vcf <vcf_file.vcf.gz> --pca --double-id --allow-extra-chr --out <out_file.pca>

...the default returns a list of 20 eigenvalues.

When I re-run the code this time specifying only 10 eigen values...

 plink --vcf <vcf_file.vcf.gz> --pca 10 --double-id --allow-extra-chr --out <out_file.pca>

I get a list of 10 eigen values.

Checking the differences in the two files however, I notce that the second is just a truncated list of the first:

15.7709
15.0639
14.1574
13.3945
11.4474
11.06
11.0436
11.0377
10.9884
10.7297
10.571
10.2425
10.0003
9.82171
9.73534
9.1836
8.94517
8.93031
8.76282
8.69693

and

15.7709
15.0639
14.1574
13.3945
11.4474
11.06
11.0436
11.0377
10.9884
10.7297

Therefore, am I correct in saying that I need all of the eigenvalues (all 20) from the first eigenvalues file to make sure I have the correct % of variance explained, even though --mds-plot only outputs 10?

In other words, is the formula for the correct variance explained (in R code):

eigs <- fread(<eigenvals_file>)
var_explained <- (eigs/sum(eigs))*100

or

eigs <- fread(<eigenvals_file>)
var_explained <- (eigs[1:10]/sum(eigs[1:10]))*100

Thanks

plink mds pca eigenvalues • 128 views
ADD COMMENTlink modified 10 weeks ago by h.mon29k • written 10 weeks ago by gary.napier20
2

You need ALL eigenvalues, not just the first 20. The default is to output just 20, but there are actually as many eigenvalues as you have samples. Count your number of samples, and pass that as argument. In practice, at some point the eigenvalues get really small, but depending on your sample size these can add up quickly.

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by Lemire570

I see, thank you. So when I call --mds-plot 10, does this return a compression of the data into just 10 components, or is it outputting a truncated list of as many components as there are variables (as you would expect from a conventional PCA)?

In other words is --mds-plot 10 doing the analagous analysis to --pca 10, and just truncating the output, or is the analysis actually just using 10 components? Hope that is clear.

This is important because if --mds-plot 10 is not just truncating then surely the outputs won't match up. (The --mds-plot analysis has only used 10 components, so the eigenvals will be different to those used in --pca 10). Is that right?

ADD REPLYlink written 10 weeks ago by gary.napier20
1

MDS, just like PCA, has as many dimensions as there are variables. I don't know the specifics of the implementation, but my guess is if you specify 10, it will likely just give you the first 10 dimensions. This means that if you ask for 10, and then 20 in a second run, the first 10 should be the same in both runs.

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by Lemire570
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1787 users visited in the last hour