I have virus sequences from different geographic regions, and I ran a population structure analysis on them using STRUCTURE. It gave an optimal K of 3. Now I want to run a PCA on these samples using EIGENSOFT, but I only have VCF files and no case/control data. How should I use a VCF as input?
I would recommend first converting to PLINK format (I have found a couple of odd things happening when you use a VCF directly).
plink2 --vcf data.vcf --make-bed --out data
If you haven't already, it's a good idea to LD-prune and remove rare variants:
plink2 --bfile data --maf 0.01 --indep-pairwise 50 5 0.2 --out data_clean
plink2 --bfile data --extract data_clean.prune.in --make-bed --out data_clean_prune
Then do the PCA:
plink2 --bfile data_clean_prune --pca --out data_clean_prune
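To turn the PCA output into a plot, something like the following might work. It is only a sketch: it assumes plink2 wrote data_clean_prune.eigenvec with a header line starting with "#" that names an IID column and columns PC1, PC2, and so on, and that matplotlib is installed. The function names parse_eigenvec and plot_pcs are made up for this example.

```python
def parse_eigenvec(lines):
    """Parse plink2 .eigenvec content (an open file or a list of lines).

    Assumes a header such as '#FID IID PC1 PC2 ...' or '#IID PC1 PC2 ...'.
    Returns (sample_ids, pcs), where pcs[i] holds the PC values of sample i.
    """
    lines = iter(lines)
    header = next(lines).lstrip("#").split()
    iid_col = header.index("IID")
    pc_cols = [i for i, name in enumerate(header) if name.startswith("PC")]
    ids, pcs = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ids.append(fields[iid_col])
        pcs.append([float(fields[i]) for i in pc_cols])
    return ids, pcs


def plot_pcs(pcs, out_png="pca.png"):
    """Scatter PC1 against PC2 and save the figure (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([row[0] for row in pcs], [row[1] for row in pcs], s=10)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    fig.savefig(out_png, dpi=150)
```

You would call it as ids, pcs = parse_eigenvec(open("data_clean_prune.eigenvec")) followed by plot_pcs(pcs). To colour the points by your STRUCTURE cluster or by region, you would need a separate sample-to-group table, which is not shown here.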
You can also make a PCA plot directly from a VCF file using the SNPRelate R package. There is already a relevant post, "VCF to PCA", that you can check.
I ran this command, but it only produced the .fam file:
Start time: Thu Jan 21 02:35:20 2021
3877 MiB RAM detected; reserving 1938 MiB for main workspace.
Using up to 4 compute threads.
--vcf: 4690 variants scanned.
--vcf: data-temporary.pgen + data-temporary.pvar + data-temporary.psam written.
822 samples (0 females, 0 males, 822 ambiguous; 822 founders) loaded from
data-temporary.psam.
4690 variants loaded from data-temporary.pvar.
Note: No phenotype data present.
Writing data.fam ... done.
Writing data.bim ...
Error: data.bim cannot contain multiallelic variants.
End time: Thu Jan 21 02:35:20 2021
Use --make-pgen/--pfile instead of --make-bed/--bfile when working with multiallelic variants.
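If you want to see how many sites are affected first, a multiallelic record in a VCF is one whose ALT column (the fifth column) lists more than one allele separated by commas. Here is a rough sketch that counts them; count_multiallelic is a made-up helper name, and it assumes a plain tab-delimited VCF.

```python
def count_multiallelic(lines):
    """Count VCF records and how many of them have more than one ALT allele.

    `lines` can be an open text file or a list of lines.
    Returns (total_records, multiallelic_records).
    """
    total = 0
    multi = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta/header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        total += 1
        # Column 5 (index 4) is ALT; a comma means several ALT alleles.
        if "," in fields[4]:
            multi += 1
    return total, multi
```

For example, count_multiallelic(open("data.vcf")) gives both counts for an uncompressed file; for a .vcf.gz you could pass gzip.open("data.vcf.gz", "rt") instead.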
I used the --make-pgen command and it gave me three files: .pgen, .pvar, and .psam. How can I use these files to plot a PCA? Please guide me further.
--vcf: 4690 variants scanned.
--vcf: NV-temporary.pgen + NV-temporary.pvar.zst + NV-temporary.psam written.
822 samples (0 females, 0 males, 822 ambiguous; 822 founders) loaded from
NV-temporary.psam.
4690 variants loaded from NV-temporary.pvar.zst.
Note: No phenotype data present.
Writing NV.psam ... done.
Writing NV.pvar ... done.
Writing NV.pgen ... done.
End time: Mon Jan 25 10:24:45 2021
I used this command for the PCA (plink2 --pfile file --PCA), but it gives the error "failed to open .psam file".
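One likely cause, going by the log above: --pfile takes the shared prefix of the .pgen/.pvar/.psam files, and the files written there are NV.pgen, NV.pvar and NV.psam, so the prefix would be NV rather than file. Also note that the documented flag is lowercase --pca. A quick way to check a prefix before running plink2 could look like this; missing_pfile_parts is a made-up helper, not part of plink2.

```python
import os


def missing_pfile_parts(prefix, existing=None):
    """List the files expected for `plink2 --pfile <prefix>` that are absent.

    `existing` is an optional collection of file names to check against;
    by default the directory that holds the prefix is listed.
    """
    if existing is None:
        existing = os.listdir(os.path.dirname(prefix) or ".")
    existing = set(existing)
    base = os.path.basename(prefix)
    missing = []
    for ext in (".pgen", ".pvar", ".psam"):
        name = base + ext
        # A compressed .pvar.zst is treated as present for the .pvar slot.
        if name in existing or (ext == ".pvar" and name + ".zst" in existing):
            continue
        missing.append(name)
    return missing
```

If missing_pfile_parts("NV") returns an empty list in the directory holding the output, then plink2 --pfile NV --pca --out NV should at least be able to open the files.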