Hello, I am trying to get genotype data into the proper format for use with Coancestry in R to estimate a relatedness matrix, but I cannot get the format correct. I have a VCF file, have extracted the genotypes, and have tried to reformat them to meet the program's requirements. Any help would be greatly appreciated!
Thanks in advance. Karen
Here is my code thus far:
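library(vcfR)  # provides read.vcfR() and extract.gt()
captiveRSvcf_file <- "vcf.vcf"
vcf_data <- read.vcfR(captiveRSvcf_file)
# extract.gt() returns a character matrix: rows = loci, columns = samples
genotype_matrix <- extract.gt(vcf_data)
# transpose so that rows = individuals and columns = loci
genotype_matrix.t <- t(genotype_matrix)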
I am not sure how to split the genotype at each locus into two separate allele columns to get the appropriate format. Here are the requirements for Coancestry...
The file containing the genotype data to be analyzed. The file will need to be in R's working directory, and have the following characteristics: (1) It should be a text file (not an Excel file); (2) It should be space- or tab-delimited; (3) Missing data must be represented as zeros (0); and (4) There should not be a header row containing column names. Column 1 should contain individual identifiers, columns 2 and 3 should contain alleles 1 and 2 for locus 1, columns 4 & 5 should contain alleles 1 and 2 for locus 2, and so on. Thus, the total number of columns should be 2 x the number of loci + 1.
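One possible way to do the split, as a rough sketch only: it assumes diploid genotypes stored as strings such as "0/1" or "1|1" (what extract.gt() typically returns), with missing calls as NA or "./.". Because the requirements reserve 0 for missing data, the sketch shifts the VCF allele indices up by one (REF becomes 1, the first ALT becomes 2). The function name gt_to_coancestry, the toy matrix, and the output file name are made up for illustration; the toy matrix stands in for genotype_matrix.t.

```r
# Toy stand-in for genotype_matrix.t: rows = individuals, columns = loci.
gt <- matrix(
  c("0/1", "1|1", NA,
    "0/0", "./.", "1/2"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ind1", "ind2"), c("loc1", "loc2", "loc3"))
)

# Hypothetical helper: turn a genotype-string matrix into an ID column
# followed by two allele columns per locus.
gt_to_coancestry <- function(gt) {
  split_one <- function(x) {
    if (is.na(x)) return(c(0L, 0L))              # missing call -> 0 0
    a <- strsplit(x, "[/|]")[[1]]                # split on "/" or "|"
    if (length(a) != 2 || any(a == ".")) return(c(0L, 0L))
    as.integer(a) + 1L                           # REF = 1, first ALT = 2, ...
  }
  # For each individual (row), concatenate the allele pairs across loci.
  alleles <- t(apply(gt, 1, function(row) unlist(lapply(row, split_one))))
  data.frame(id = rownames(gt), alleles, stringsAsFactors = FALSE)
}

out <- gt_to_coancestry(gt)

# Tab-delimited text file, no header row, no row names, no quotes.
write.table(out, "coancestry_input.txt", sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

With the toy data, ind1 becomes 1 2 2 2 0 0 and ind2 becomes 1 1 0 0 2 3, giving 2 x 3 + 1 = 7 columns. On real data, genotype_matrix.t would be passed in place of gt; any haploid or partially missing call is treated as missing in this sketch, which may or may not be what you want.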