PCA in EIGENSTRAT
4.7 years ago
genogeno

Hi!

I did LD pruning in PLINK. As a result, I have .bed, .bim, .fam, .log and .nosex files. Now I am trying to do PCA in EIGENSTRAT. As I understand it, I need a parameter file to run the program. I guess I can just write a name for each output file, but how can I obtain the input files?

genotypename: ???

snpname: ???

indivname: ???

outputformat: ???

genotypeoutname:

snpoutname:

indivoutname:
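Below is a minimal Python sketch of how these fields could be filled in. It assumes that your EIGENSOFT version's convertf can read PLINK binary files directly (the PACKEDPED input format described in the CONVERTF README; check this for your version). If so, the .bed/.bim/.fam files from the LD pruning step can serve as inputs without any recoding. The helper name convertf_par and the file prefixes are hypothetical.

```python
from pathlib import Path
from typing import Optional


def convertf_par(plink_prefix: str, out_prefix: str, par_path: Optional[str] = None) -> str:
    """Build (and optionally save) a convertf parameter file that converts a
    PLINK binary fileset (.bed/.bim/.fam) into EIGENSTRAT format (.geno/.snp/.ind).

    Assumes convertf accepts PLINK binary input directly (PACKEDPED);
    check the CONVERTF README of your EIGENSOFT version.
    """
    params = [
        # Inputs: the three files PLINK wrote after LD pruning
        ("genotypename", f"{plink_prefix}.bed"),
        ("snpname", f"{plink_prefix}.bim"),
        ("indivname", f"{plink_prefix}.fam"),
        # Output format and the names of the files convertf should create
        ("outputformat", "EIGENSTRAT"),
        ("genotypeoutname", f"{out_prefix}.geno"),
        ("snpoutname", f"{out_prefix}.snp"),
        ("indivoutname", f"{out_prefix}.ind"),
    ]
    text = "".join(f"{key}: {value}\n" for key, value in params)
    if par_path is not None:
        Path(par_path).write_text(text)
    return text
```

For example, convertf_par("selectedsnplist", "selectedsnplist", "par.convertf") writes the file, which you would then pass to convertf with -p.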


I converted the bim/fam/bed files to ped/map files using --recode.

I created a text file (parfile.txt) with the following contents:

$ less parfile.txt

genotypename: selectedsnplist.ped
snpname: selectedsnplist.map
indivname: selectedsnplist.ped
outputformat: EIGENSTRAT
genotypeoutname: selectedsnplist.geno
snpoutname: selectedsnplist.snp
indivoutname: selectedsnplist.ind
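Before running convertf, it can help to confirm that the parameter file parses cleanly and that every input it names actually exists. The Python sketch below reads "key: value" lines, tolerating stray spaces around the colon like the ones in parfile.txt, and reports missing keys or input files. The required-key list simply mirrors the seven fields used here, and the function names are hypothetical.

```python
import os

INPUT_KEYS = ("genotypename", "snpname", "indivname")
OUTPUT_KEYS = ("outputformat", "genotypeoutname", "snpoutname", "indivoutname")


def parse_parfile(text: str) -> dict:
    """Parse 'key: value' lines, ignoring blank lines and extra whitespace."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        params[key.strip()] = value.strip()
    return params


def check_parfile(text: str, base_dir: str = ".") -> list:
    """Return a list of problems: missing keys or input files that do not exist."""
    params = parse_parfile(text)
    problems = [f"missing key: {k}" for k in INPUT_KEYS + OUTPUT_KEYS if k not in params]
    for key in INPUT_KEYS:
        path = params.get(key)
        # os.path.join keeps absolute paths unchanged
        if path and not os.path.exists(os.path.join(base_dir, path)):
            problems.append(f"{key}: file not found: {path}")
    return problems
```

For example, check_parfile(open("parfile.txt").read()) should return an empty list when the file is complete and all inputs are present in the current directory.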


I am trying to run convertf -p /data/myname/parfile.txt, but it gives the following error:

No command 'convertf' found, did you mean:
 Command 'convert' from package 'imagemagick' (main)
 Command 'convert' from package 'graphicsmagick-imagemagick-compat' (universe)
convertf: command not found
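"command not found" means the shell searched every directory on PATH and found no executable named convertf. The suggestions about ImageMagick's convert are just the shell guessing at a typo. The usual causes are that EIGENSOFT is not installed, or that its bin directory is not on PATH. The small Python sketch below checks both cases. The helper name locate_tool and the directory in the usage line are hypothetical.

```python
import os
import shutil
from typing import Optional, Sequence


def locate_tool(name: str, extra_dirs: Sequence[str] = ()) -> Optional[str]:
    """Return the full path of an executable, or None if it cannot be found.

    Searches PATH first, then any extra directories (for example the bin/
    folder of an unpacked EIGENSOFT download).
    """
    found = shutil.which(name)
    if found:
        return found
    if extra_dirs:
        # shutil.which accepts an os.pathsep-separated search path
        return shutil.which(name, path=os.pathsep.join(extra_dirs))
    return None
```

For example, locate_tool("convertf", ["/data/myname/EIG/bin"]) (a hypothetical install location) returns a path if the binary is there. You can then call convertf by that full path or add the directory to PATH.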

I hope that someone can help me.

Thanks

4.7 years ago

No need to use EIGENSTRAT! PLINK v1.9 has implemented PCA: https://www.cog-genomics.org/plink2/strat

From the website, on how to use --pca:

By default, --pca extracts the top 20 principal components of the variance-standardized relationship matrix; you can change the number by passing a numeric parameter. Eigenvectors are written to plink.eigenvec, and top eigenvalues are written to plink.eigenval. The 'header' modifier adds a header line to the .eigenvec file(s), and the 'tabs' modifier makes the .eigenvec file(s) tab- instead of space-delimited. You can request variant weights with the 'var-wts' modifier, and dump the matrix by using --pca in combination with --make-rel/--make-grm-gz/--make-grm-bin.
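Here is what "variance-standardized relationship matrix" means in practice. Each SNP's 0/1/2 genotype column is centred by 2p and scaled by sqrt(2p(1-p)), where p is that SNP's allele frequency. The matrix is Z times its transpose divided by the number of SNPs, and the PCs are its top eigenvectors. The simplified NumPy sketch below assumes no missing genotypes. PLINK and GCTA additionally handle missing calls and may compute the diagonal differently, so the numbers will not match their output exactly. The function name is hypothetical.

```python
import numpy as np


def pca_from_genotypes(genotypes, n_pcs: int = 20):
    """Top eigenvalues/eigenvectors of a variance-standardized relationship matrix.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts, no missing data.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                     # allele frequency per SNP
    keep = (p > 0) & (p < 1)                     # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))   # standardize each SNP column
    grm = z @ z.T / z.shape[1]                   # n_samples x n_samples matrix
    eigvals, eigvecs = np.linalg.eigh(grm)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_pcs]    # keep the largest n_pcs
    return eigvals[order], eigvecs[:, order]
```

Eigenvector signs are arbitrary, so a PC can come out mirrored relative to another program's output without anything being wrong.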


But is it the same algorithm? Did they just copy the EIGENSTRAT algorithm into PLINK?


You were only two clicks away :) If you follow the link I provided, you will find:

This is a simple port of GCTA's --pca flag, which generates the same files from a previously computed relationship matrix.

Then, if you follow the link to the GCTA software:

Input the GRM and output the first n (n = 20, by default) eigenvalues (saved as *.eigenval, plain text file) and eigenvectors (saved as *.eigenvec, plain text file), which are equivalent to those calculated by the program EIGENSTRAT.

So yes, it is the same as in EIGENSTRAT.


Actually, I have to use EIGENSTRAT, because the point is to learn it. It is my homework :) If you know how to do this in EIGENSTRAT, please let me know.
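In the EIGENSOFT package, the PCA step itself is done by the smartpca program, which reads the .geno/.snp/.ind files that convertf produces. The Python sketch below builds a minimal smartpca parameter file. The parameter names (evecoutname, evaloutname, numoutevec alongside the three input names) are assumed from the EIGENSOFT documentation, so verify them against the README that ships with your version. The helper name is hypothetical.

```python
def smartpca_par(prefix: str, n_pcs: int = 10) -> str:
    """Build a minimal smartpca parameter file for EIGENSTRAT-format input.

    Parameter names are assumed from the EIGENSOFT documentation; check them
    against the README shipped with your version.
    """
    params = [
        ("genotypename", f"{prefix}.geno"),
        ("snpname", f"{prefix}.snp"),
        ("indivname", f"{prefix}.ind"),
        ("evecoutname", f"{prefix}.evec"),  # per-sample PC coordinates
        ("evaloutname", f"{prefix}.eval"),  # eigenvalues
        ("numoutevec", str(n_pcs)),         # number of PCs to report
    ]
    return "".join(f"{key}: {value}\n" for key, value in params)
```

You would save the returned text, for example as par.smartpca, and run smartpca -p par.smartpca.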

By the way, I tried doing the PCA in PLINK and got .eigenvec and .eigenval files. Now I want to draw a plot. The .eigenvec file has a family ID column, an individual ID column and 20 PC columns, in that order. Should I use the first two PC columns? How can I draw the plot? I tried plotting the first two PCs in R and got a plot, but I need to colour the points and show the population names. How can I do that?
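PC1 and PC2 (the first two columns after the IDs) capture the most variance, so they are the usual first plot. Population names are not stored in the .eigenvec file, so they have to come from a separate file that you prepare. The Python sketch below handles the data side. It assumes a hypothetical whitespace-separated population file with FID, IID and population columns, and it joins that file to the PCs on (FID, IID).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[str, str]  # (FID, IID)


def read_eigenvec(lines: Iterable[str]) -> Dict[Sample, List[float]]:
    """Parse PLINK .eigenvec lines: FID IID PC1 PC2 ... (space- or tab-delimited)."""
    pcs = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] in ("FID", "#FID"):
            continue  # skip blank lines and an optional header line
        pcs[(fields[0], fields[1])] = [float(v) for v in fields[2:]]
    return pcs


def read_populations(lines: Iterable[str]) -> Dict[Sample, str]:
    """Parse a hypothetical 'FID IID POPULATION' file that you write yourself."""
    pops = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            pops[(fields[0], fields[1])] = fields[2]
    return pops


def group_by_population(pcs, pops, x: int = 0, y: int = 1):
    """Collect (PCx, PCy) points per population (x=0, y=1 means PC1 vs PC2).

    Samples missing from the population file are grouped as 'unknown'.
    """
    groups = defaultdict(list)
    for sample, values in pcs.items():
        groups[pops.get(sample, "unknown")].append((values[x], values[y]))
    return dict(groups)
```

Typical use: open plink.eigenvec and pass the file handle to read_eigenvec, do the same for your population file, then group the two results.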


The EIGENSTRAT manual is very detailed. Try to ask more specific questions, or say what you have tried and where you are struggling. If you don't understand what you want to plot, I would suggest first learning why PCA is used in genetic analysis: search for it and read the relevant papers and reviews. Your last question is a different topic, namely how to plot the result in R. Read an R manual, look at the plot() function and check all the parameters you can use. For colours, the parameter is col.
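In base R, col accepts one colour per point, so a common approach is to turn the population labels into a factor and use it to pick colours from a palette. The Python sketch below shows the same idea with matplotlib. It makes one scatter call per population, so each population gets its own colour and legend entry. The input groups is assumed to map population names to (PC1, PC2) pairs, and the function name is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt


def plot_pcs(groups: dict, out_path=None):
    """Scatter PC1 vs PC2 with one colour per population and a legend.

    groups: {population_name: [(pc1, pc2), ...]}
    """
    fig, ax = plt.subplots(figsize=(6, 5))
    for name, points in sorted(groups.items()):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        ax.scatter(xs, ys, s=12, label=name)  # each call gets the next colour in the cycle
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend(title="Population")
    if out_path:
        fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return fig
```

For example, plot_pcs(groups, "pca.png") writes the coloured plot to pca.png.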


Does the plot() function have an argument for adding sample labels? How can I plot the labels? Thanks!
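In base R, the usual way to add per-point labels is to call text(x, y, labels = ...) after plot(). The matplotlib counterpart is Axes.annotate, shown in the Python sketch below with sample IDs as labels. Each label is offset by a few points so that it does not cover its marker. The function name is hypothetical, and the inputs are assumed to be parallel lists of PC1 values, PC2 values and IDs.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt


def plot_with_labels(xs, ys, labels, out_path=None):
    """Scatter plot with a text label next to each point."""
    if not (len(xs) == len(ys) == len(labels)):
        raise ValueError("xs, ys and labels must have the same length")
    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(xs, ys, s=12)
    for x, y, label in zip(xs, ys, labels):
        # shift the text 3 points right and up from the marker
        ax.annotate(label, (x, y), xytext=(3, 3), textcoords="offset points", fontsize=7)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    if out_path:
        fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return fig
```

With many samples the labels overlap quickly, so labelling only outliers or a subset is often more readable.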