How do you input a vcf snp file in R directly to run a PCA for the individuals?
6.0 years ago

Hi, I have a raw VCF file containing around 40,000 SNPs. I read it into RStudio and split it into two sets based on SNP position; these currently exist as two separate, unsaved data frames. I now want to run a PCA and an Fst analysis on each of these two SNP sets within R. How do I go about it: which packages should I use and, more importantly, what format do the data need to be converted to? Or can the analyses be run on this kind of data frame directly? The SNP positions are in the second column, and the per-individual SNP data (GT:PL:GQ) are in the columns after FORMAT. This is what the file looks like (header and first row shown):

CHROM POS REF ALT QUAL FORMAT Indiv 1 Indiv 2 Indiv 3

EU153401.1 209 A G 999 GT:PL:GQ 1/1:70,12,0:31 1/1:132,15,0:34 1/1:72,12,0:31

I also have a heterozygosity matrix for these two files (genotypes coded as 0, 1, 2). Can this be used for PCA or Fst?

I am new to R and need some help urgently. Thanks a lot.

R SNP • 3.1k views

This previous answer will probably help you: A: Pca From Vcf Files
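
In the meantime, here is a minimal base-R sketch of one way to do this, assuming the data frame looks like the one in the question: take the GT part of each sample field, recode it as the number of ALT alleles (0, 1, 2), transpose to an individuals-by-SNPs matrix, and pass that to `prcomp()`. A 0/1/2 matrix like the one mentioned in the question is already in this form, provided it really holds ALT-allele counts. The toy genotypes, the fourth individual, the `pop` labels, and the helper names `gt_to_dosage` and `fst_per_snp` are all made up for illustration. The Fst here is a simple (Ht - Hs)/Ht estimate with equally weighted populations, not the Weir and Cockerham estimator that dedicated packages may report.

```r
# Toy data frame mimicking the layout in the question. Only the first three
# genotypes of row 1 come from the post; everything else is made up.
vcf_df <- data.frame(
  CHROM  = rep("EU153401.1", 4),
  POS    = c(209, 310, 455, 620),
  REF    = c("A", "C", "G", "T"),
  ALT    = c("G", "T", "A", "C"),
  QUAL   = rep(999, 4),
  FORMAT = rep("GT:PL:GQ", 4),
  Indiv1 = c("1/1:70,12,0:31",  "0/0:0,12,70:31", "0/1:30,0,40:31", "./.:0,0,0:0"),
  Indiv2 = c("1/1:132,15,0:34", "0/0:0,15,90:34", "0|0:0,12,70:31", "0/1:30,0,40:31"),
  Indiv3 = c("1/1:72,12,0:31",  "1/1:70,12,0:31", "0/1:30,0,40:31", "1/1:70,12,0:31"),
  Indiv4 = c("1/1:70,12,0:31",  "1/1:70,12,0:31", "0/0:0,12,70:31", "0/0:0,12,70:31"),
  stringsAsFactors = FALSE
)

# Convert "GT:PL:GQ" strings to ALT-allele counts (0, 1, 2); missing -> NA.
# Assumes GT is the first subfield and the SNPs are biallelic.
gt_to_dosage <- function(x) {
  gt <- sub(":.*$", "", x)             # keep only the GT subfield
  alleles <- strsplit(gt, "[/|]")      # unphased "/" or phased "|"
  vapply(alleles, function(a) {
    if (anyNA(a) || any(a == ".")) NA_real_ else sum(as.numeric(a) > 0)
  }, numeric(1))
}

# Build an individuals x SNPs matrix from the sample columns.
fixed_cols  <- c("CHROM", "POS", "REF", "ALT", "QUAL", "FORMAT")
sample_cols <- setdiff(names(vcf_df), fixed_cols)
dosage <- vapply(vcf_df[sample_cols], gt_to_dosage, numeric(nrow(vcf_df)))
geno <- t(dosage)
colnames(geno) <- paste(vcf_df$CHROM, vcf_df$POS, sep = ":")

# PCA: drop SNPs with missing calls or no variation, then centre (no scaling).
keep <- colSums(is.na(geno)) == 0 &
  apply(geno, 2, function(g) length(unique(g)) > 1)
pca <- prcomp(geno[, keep, drop = FALSE], center = TRUE, scale. = FALSE)

# Per-SNP Fst = (Ht - Hs) / Ht from ALT-allele frequencies per population.
pop <- c("A", "A", "B", "B")           # hypothetical population labels
fst_per_snp <- function(g, pop) {
  p_sub <- tapply(g, pop, function(v) mean(v, na.rm = TRUE) / 2)
  p_bar <- mean(p_sub)
  hs <- mean(2 * p_sub * (1 - p_sub))  # mean within-population heterozygosity
  ht <- 2 * p_bar * (1 - p_bar)        # total expected heterozygosity
  if (is.na(ht) || ht == 0) NA_real_ else (ht - hs) / ht
}
fst <- apply(geno, 2, fst_per_snp, pop = pop)

print(pca$x)   # individual coordinates, e.g. plot(pca$x[, 1:2])
print(fst)
```

Note that Fst needs a population label for every individual, so it cannot be computed from the genotype table alone. For 40,000 SNPs, packages such as vcfR, adegenet, SNPRelate and hierfstat are commonly used for this kind of analysis and are probably a better fit than hand-rolled code.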
