Question: Principal component analysis using a VCF file as input
0
6 weeks ago by
sayyarsehrish0 wrote:

I have virus sequences from different geographic regions, on which I performed a population structure analysis using the STRUCTURE software. It gave an optimal K of 3. Now I want to perform PCA on these samples using EIGENSOFT, but I only have VCF files and no case/control data. How should I use a VCF file as input?

pca vcf • 183 views
ADD COMMENTlink modified 6 weeks ago by devarora320 • written 6 weeks ago by sayyarsehrish0
1
6 weeks ago by
4galaxy77320
United Kingdom
4galaxy77320 wrote:

I would recommend first converting to PLINK format (I have seen a couple of odd things happen when using a VCF directly):

plink2 --vcf data.vcf --make-bed --out data

If you haven't already, it's a good idea to LD-prune and remove rare variants:

plink2 --bfile data --maf 0.01 --indep-pairwise 50 5 0.2 --out data_clean
plink2 --bfile data --extract data_clean.prune.in --make-bed --out data_clean_prune

Then do the PCA:

plink2 --bfile data_clean_prune --pca --out data_clean_prune
ADD COMMENTlink written 6 weeks ago by 4galaxy77320
1
6 weeks ago by
devarora320
SouthKorea
devarora320 wrote:

You can make a PCA plot from a VCF file using the SNPRelate R package. There is already a relevant post, "VCF to PCA", that you can check.

ADD COMMENTlink written 6 weeks ago by devarora320
0
6 weeks ago by
sayyarsehrish0 wrote:

I ran this command, but it only produced the .fam file:

Start time: Thu Jan 21 02:35:20 2021
3877 MiB RAM detected; reserving 1938 MiB for main workspace.
Using up to 4 compute threads.
--vcf: 4690 variants scanned.
--vcf: data-temporary.pgen + data-temporary.pvar + data-temporary.psam written.
822 samples (0 females, 0 males, 822 ambiguous; 822 founders) loaded from data-temporary.psam.
4690 variants loaded from data-temporary.pvar.
Note: No phenotype data present.
Writing data.fam ... done.
Writing data.bim ...
Error: data.bim cannot contain multiallelic variants.
End time: Thu Jan 21 02:35:20 2021

ADD COMMENTlink modified 6 weeks ago • written 6 weeks ago by sayyarsehrish0

The important error here is "Error: data.bim cannot contain multiallelic variants".

ADD REPLYlink written 6 weeks ago by 4galaxy77320

Use --make-pgen/--pfile instead of --make-bed/--bfile when working with multiallelic variants.

ADD REPLYlink written 6 weeks ago by chrchang5237.7k
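As a rough sketch of that suggestion, the earlier bed/bim/fam steps can be rewritten to stay in pgen/pvar/psam format throughout. The function name vcf_to_pca and the file prefixes are made up for illustration, and the thresholds are simply copied from the answer above. Whether --indep-pairwise accepts your multiallelic or ID-less variants may depend on your plink2 build, so treat this as a starting point rather than a tested recipe. Note that --pfile expects the same prefix that was given to --out (NV in the log below, for example).

```shell
#!/bin/sh
# Sketch: VCF -> pgen -> LD pruning -> PCA, staying in plink2's native format.
# vcf_to_pca is a hypothetical helper name; usage: vcf_to_pca data.vcf data
vcf_to_pca() {
    vcf=$1
    out=$2

    # Import the VCF. Unlike bed/bim/fam, pgen/pvar/psam can hold multiallelic variants.
    plink2 --vcf "$vcf" --make-pgen --out "$out" || return 1

    # Filter rare variants and compute an LD-pruned variant list.
    # This writes ${out}_clean.prune.in and ${out}_clean.prune.out.
    plink2 --pfile "$out" --maf 0.01 --indep-pairwise 50 5 0.2 --out "${out}_clean" || return 1

    # Keep only the variants that survived pruning.
    plink2 --pfile "$out" --extract "${out}_clean.prune.in" --make-pgen --out "${out}_clean_prune" || return 1

    # Run the PCA; this writes ${out}_clean_prune.eigenvec and ${out}_clean_prune.eigenval.
    plink2 --pfile "${out}_clean_prune" --pca --out "${out}_clean_prune"
}
```

Calling vcf_to_pca data.vcf data would then leave the PCA results in data_clean_prune.eigenvec and data_clean_prune.eigenval.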

I used the --make-pgen command and it gave me three files: .pgen, .pvar, and .psam. How can I use these files to plot a PCA? Please guide me further.

--vcf: 4690 variants scanned.
--vcf: NV-temporary.pgen + NV-temporary.pvar.zst + NV-temporary.psam written.
822 samples (0 females, 0 males, 822 ambiguous; 822 founders) loaded from NV-temporary.psam.
4690 variants loaded from NV-temporary.pvar.zst.
Note: No phenotype data present.
Writing NV.psam ... done.
Writing NV.pvar ... done.
Writing NV.pgen ... done.
End time: Mon Jan 25 10:24:45 2021

ADD REPLYlink written 6 weeks ago by sayyarsehrish0

I used this command for the PCA (plink2 --pfile file --PCA), and it gave me an error: failed to open .psam file.

ADD REPLYlink written 6 weeks ago by sayyarsehrish0

Did you try googling the error message?

ADD REPLYlink written 6 weeks ago by 4galaxy77320

Thanks, that solved it. I now have two files, .eigenvec and .eigenval. One thing confuses me: my .psam file has no sex (male/female) information. Will this bias the results?

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by sayyarsehrish0

Also, please guide me on how I can now use the .eigenvec and .eigenval files in R to plot the PCA.

ADD REPLYlink written 5 weeks ago by sayyarsehrish0
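Before plotting in R or any other tool, it can help to boil the two output files down on the command line. The sketch below assumes the usual plink2 layout: .eigenval has one eigenvalue per line with no header, and .eigenvec starts with a header line such as "#IID PC1 PC2 ..." or "#FID IID PC1 PC2 ...". The function names eigenval_share and eigenvec_pc12 are made up for illustration. The percentages are shares of the reported eigenvalues only (10 by default), not of the total variance in the data.

```shell
#!/bin/sh
# Print each reported eigenvalue as a percentage of the sum of the reported
# eigenvalues. Assumes one eigenvalue per line and no header line.
eigenval_share() {
    awk 'NF { v[++n] = $1; total += $1 }
         END {
             if (total > 0)
                 for (i = 1; i <= n; i++)
                     printf "PC%d\t%.2f\n", i, 100 * v[i] / total
         }' "$1"
}

# Write sample ID, PC1 and PC2 from a .eigenvec file as a tab-separated table.
# Columns are looked up by header name, so an optional FID column is skipped.
eigenvec_pc12() {
    awk 'NR == 1 {
             sub(/^#/, "")                      # drop the leading "#" from the header
             for (i = 1; i <= NF; i++) col[$i] = i
             print "IID\tPC1\tPC2"
             next
         }
         { print $(col["IID"]) "\t" $(col["PC1"]) "\t" $(col["PC2"]) }' "$1"
}
```

For example, eigenvec_pc12 data_clean_prune.eigenvec > pc12.tsv gives a small table that R can read with read.delim and plot as PC1 against PC2, and eigenval_share data_clean_prune.eigenval gives numbers for the axis labels.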

Please guide me. I have little knowledge of PLINK and PCA.

ADD REPLYlink written 6 weeks ago by sayyarsehrish0