Hey, here is how I have done it in the past:
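library(data.table)
library(dplyr)  # provides the %>% pipe
# vcf_path is the path to your VCF file
temp_chrom <- vcfR::read.vcfR(vcf_path)
# Genotype matrix: FORMAT column plus one column per sample
temp_gt <- temp_chrom@gt %>% as.data.table()
# Fixed fields, with POS converted from character to integer
temp_out <- data.table(
  CHR = temp_chrom@fix[, "CHROM"],
  POS = temp_chrom@fix[, "POS"] %>% as.integer(),
  REF = temp_chrom@fix[, "REF"],
  ALT = temp_chrom@fix[, "ALT"]
)
# Rows of @fix and @gt are in the same order, so a column bind is safe
Simplified_VCF <- cbind(temp_out, temp_gt)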
Here is a snippet of the output:
> Simplified_VCF
CHR POS REF ALT FORMAT Collarless_A_val Collarless_B_val Collarless_C_val
1: CM029350.1 100 T A GT:AD:DP:GQ:PGT:PID:PL:PS 0/0:34,0:34:93:.:.:0,93,1395 0/0:40,0:40:99:.:.:0,114,1710 0/0:39,0:39:58:.:.:0,58,1368
2: CM029350.1 106 A G GT:AD:DP:GQ:PGT:PID:PL:PS 0/0:34,0:34:90:.:.:0,90,1350 0/0:44,0:44:99:.:.:0,120,1800 0/0:43,0:43:48:.:.:0,48,1424
3: CM029350.1 242 A C GT:AD:DP:GQ:PL 0/0:37,0:37:99:0,102,1530 0/0:47,0:47:99:0,101,1709 0/0:49,0:49:99:0,112,1800
4: CM029350.1 1509 G A GT:AD:DP:GQ:PL 1/1:0,57:57:99:2109,170,0 1/1:0,83:83:99:3042,247,0 1/1:0,70:70:99:2452,209,0
5: CM029350.1 1763 G A GT:AD:DP:GQ:PL 0/0:43,0:43:99:0,106,1422 0/0:54,0:54:99:0,106,1516 0/0:62,0:62:99:0,118,1800
---
248278: CM029350.1 27080198 G C GT:AD:DP:GQ:PL 1/1:0,47:47:99:1768,141,0 1/1:0,66:66:99:2439,198,0 1/1:0,64:64:99:2307,191,0
248279: CM029350.1 27080215 A T GT:AD:DP:GQ:PL 1/1:0,49:49:99:1816,146,0 1/1:0,59:59:99:2152,176,0 1/1:0,66:66:99:2498,198,0
248280: CM029350.1 27080272 A G GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,44:44:99:1|1:27080272_A_G:1968,132,0:27080272 1|1:0,48:48:99:1|1:27080272_A_G:2141,144,0:27080272 1|1:0,40:40:99:1|1:27080272_A_G:1796,120,0:27080272
248281: CM029350.1 27080280 T C GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,42:42:99:1|1:27080272_A_G:1890,126,0:27080272 1|1:0,45:45:99:1|1:27080272_A_G:2025,135,0:27080272 1|1:0,40:40:99:1|1:27080272_A_G:1796,120,0:27080272
248282: CM029350.1 27080293 G A GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,34:34:99:1|1:27080272_A_G:1530,102,0:27080272 1|1:0,44:44:99:1|1:27080272_A_G:1980,132,0:27080272 1|1:0,36:36:99:1|1:27080272_A_G:1620,108,0:27080272
The order of the rows accessed by @[fix|gt|ann] is preserved, so you can use cbind to bind the data frames. If you do not want to use data.table, replace as.data.table with as.data.frame.
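In case it helps, here is a minimal base-R sketch of the same column bind that runs without vcfR or data.table. The two toy matrices are hypothetical stand-ins for temp_chrom@fix and temp_chrom@gt, with values copied from the output above. The last step, which trims each sample field down to the genotype call, is an optional extra that is not part of the code above, and it assumes GT is the first FORMAT field, as it is in this output.

```r
# Toy stand-ins for temp_chrom@fix and temp_chrom@gt (hypothetical, two variants only)
fix <- matrix(
  c("CM029350.1", "100",  "T", "A",
    "CM029350.1", "1509", "G", "A"),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("CHROM", "POS", "REF", "ALT"))
)
gt <- matrix(
  c("GT:AD:DP:GQ:PGT:PID:PL:PS", "0/0:34,0:34:93:.:.:0,93,1395",
    "GT:AD:DP:GQ:PL",            "1/1:0,57:57:99:2109,170,0"),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("FORMAT", "Collarless_A_val"))
)

# Fixed fields as a plain data.frame, with POS converted to integer
temp_out <- data.frame(
  CHR = fix[, "CHROM"],
  POS = as.integer(fix[, "POS"]),
  REF = fix[, "REF"],
  ALT = fix[, "ALT"]
)

# Both matrices share the same row order, so cbind lines the variants up
simplified <- cbind(temp_out, as.data.frame(gt))

# Optional: keep only the genotype call (everything before the first ":")
simplified$Collarless_A_val <- sub(":.*", "", simplified$Collarless_A_val)
print(simplified)
```

With a real vcfR object you would swap the toy matrices for temp_chrom@fix and temp_chrom@gt and apply the sub() step to every sample column.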
Thank you for the code and idea dthorbur! I think the code to generate temp_output in the last row is missing, which is probably your way of loading the annotations in R. Do you proceed with a chrom object or just with read.csv()?
Oops, I wrote the wrong variable name. I have fixed the line in the post.