Question: How do you input a vcf snp file in R directly to run a PCA for the individuals?
Asked 3 months ago by aslanforever94

Hi, I have a raw VCF file with around 40,000 SNPs. I read it into RStudio and split it into two sets based on SNP position; these currently exist as two separate data frames in my R session and have not been saved to disk. I now want to run a PCA and an Fst analysis on these two SNP sets within R. Which packages should I use and, more importantly, what format do the data need to be converted to? Or can the analyses be run on this kind of data frame directly? The SNP positions are in the second column, and the per-individual SNP data (GT:PL:GQ) are in the columns that follow. This is what the file looks like (header and first row):

CHROM       POS  REF  ALT  QUAL  FORMAT    Indiv 1         Indiv 2          Indiv 3
EU153401.1  209  A    G    999   GT:PL:GQ  1/1:70,12,0:31  1/1:132,15,0:34  1/1:72,12,0:31
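For a data frame laid out like this, one possible first step is to reduce each GT:PL:GQ string to a count of ALT alleles (0, 1, 2, or NA for a missing call). The following is a minimal base-R sketch, not a definitive method: the data frame `snps`, the helper `gt_to_dosage`, the column names `Indiv1` to `Indiv3`, and the second SNP row are all made up for illustration, and the code assumes diploid, biallelic genotypes with GT as the first FORMAT field.

```r
# Hypothetical data frame mimicking the layout above.
# Row 1 is the example row from the post; row 2 is invented to show other genotypes.
snps <- data.frame(
  CHROM  = c("EU153401.1", "EU153401.1"),
  POS    = c(209, 350),
  REF    = c("A", "C"),
  ALT    = c("G", "T"),
  QUAL   = c(999, 999),
  FORMAT = c("GT:PL:GQ", "GT:PL:GQ"),
  Indiv1 = c("1/1:70,12,0:31",  "0/0:0,12,70:31"),
  Indiv2 = c("1/1:132,15,0:34", "0/1:15,0,132:34"),
  Indiv3 = c("1/1:72,12,0:31",  "./.:0,0,0:0"),
  stringsAsFactors = FALSE
)

# Convert a vector of "GT:PL:GQ" strings into ALT-allele counts (0, 1, 2 or NA).
gt_to_dosage <- function(x) {
  gt <- sub(":.*$", "", x)            # keep only the GT field (assumed to come first)
  alleles <- strsplit(gt, "[/|]")     # split on the unphased (/) or phased (|) separator
  vapply(alleles, function(a) {
    if (length(a) != 2 || any(a == ".")) return(NA_real_)  # missing or non-diploid call
    sum(as.numeric(a) > 0)            # count non-reference alleles
  }, numeric(1))
}

sample_cols <- c("Indiv1", "Indiv2", "Indiv3")
dosage <- sapply(snps[sample_cols], gt_to_dosage)  # SNPs in rows, individuals in columns
rownames(dosage) <- paste(snps$CHROM, snps$POS, sep = ":")
print(dosage)
```

The result is a numeric matrix with SNPs in rows and individuals in columns; most PCA functions expect individuals in rows, so it would typically be transposed with `t(dosage)` before further analysis.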

I also have a heterozygosity matrix for each of these two files (genotypes coded as 0, 1, 2). Can this be used for PCA or Fst?
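A numeric 0/1/2 matrix can in principle be passed to base R's `prcomp` for a PCA, as long as individuals are in rows and SNPs are in columns. The sketch below uses a randomly generated matrix `geno` (made up for illustration, not real data) and makes two simplifying assumptions that may not suit every dataset: monomorphic SNPs are dropped, and missing calls are replaced by the per-SNP mean. Dedicated packages such as vcfR, adegenet or SNPRelate also cover PCA on SNP data, and Fst needs population labels that this sketch does not handle.

```r
set.seed(1)

# Made-up 0/1/2 genotype matrix: 6 individuals (rows) x 20 SNPs (columns).
geno <- matrix(sample(0:2, 6 * 20, replace = TRUE), nrow = 6,
               dimnames = list(paste0("Indiv", 1:6), paste0("snp", 1:20)))
storage.mode(geno) <- "numeric"

# Drop SNPs that show no variation across individuals; they add nothing to a PCA.
keep <- apply(geno, 2, function(g) {
  v <- var(g, na.rm = TRUE)
  !is.na(v) && v > 0
})
geno <- geno[, keep, drop = FALSE]

# prcomp() does not accept NA, so replace missing calls with the per-SNP mean.
for (j in seq_len(ncol(geno))) {
  missing <- is.na(geno[, j])
  geno[missing, j] <- mean(geno[, j], na.rm = TRUE)
}

# Centered, unscaled PCA on the individuals.
pca <- prcomp(geno, center = TRUE, scale. = FALSE)

var_explained <- pca$sdev^2 / sum(pca$sdev^2)  # proportion of variance per component
scores <- pca$x[, 1:2]                         # coordinates of each individual on PC1 and PC2
print(round(var_explained, 3))
print(scores)
```

If the matrix is stored with SNPs in rows instead, transposing it with `t()` before calling `prcomp` gives the per-individual PCA; `plot(scores)` then draws the individuals on the first two components.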

I am new to R. Need some help urgently. Thanks a lot.

snp R • 192 views

This previous answer will probably help you: A: Pca From Vcf Files

Comment written 3 months ago by Philipp Bayer