Using vcfR to group variants per annotations
1
0
Entering edit mode
13 months ago

I am using the vcfR package for R in order to group the SNPs variants of a VCF file per annotations. With vcfR,I have built a chrom object with a VCF file, a FASTA reference file, and a GFF annotation file describing the exons/introns in my sequence.

I have been following the documentation related to the package presenting the chrom objects to understand how to manipulate them. https://knausb.github.io/vcfR_documentation/chromR_object.html

But the documentation just explain how to make a plot and explore the object, not how to group information.

I know that the following code can give genotypes:

extract.gt(chrom)

And that this one can access to the annotations:

chrom@ann

But I have not found a way to relate them and export the information in a dataframe. Has anyone worked on this? Is there a tutorial you would recommand for this kind of data manipulation?

vcf vcfR R • 932 views
ADD COMMENT
2
Entering edit mode
13 months ago
dthorbur ★ 1.9k

Hey, here is how I have done it in the past:

library(data.table) 
library(dplyr)

temp_chrom <- vcfR::read.vcfR(vcf_path)
temp_gt <- temp_chrom@gt[,] %>% as.data.table
temp_out <- data.table(CHR = temp_chrom@fix[,"CHROM"], POS = temp_chrom@fix[,"POS"] %>% as.integer(), 
        REF = temp_chrom@fix[,"REF"], 
        ALT = temp_chrom@fix[,"ALT"])

Simplified_VCF <- cbind(temp_out, temp_gt)

Here is a snippet of the output:

> Simplified_VCF
               CHR      POS REF ALT                    FORMAT                                    Collarless_A_val                                    Collarless_B_val                                    Collarless_C_val
     1: CM029350.1      100   T   A GT:AD:DP:GQ:PGT:PID:PL:PS                        0/0:34,0:34:93:.:.:0,93,1395                       0/0:40,0:40:99:.:.:0,114,1710                        0/0:39,0:39:58:.:.:0,58,1368
     2: CM029350.1      106   A   G GT:AD:DP:GQ:PGT:PID:PL:PS                        0/0:34,0:34:90:.:.:0,90,1350                       0/0:44,0:44:99:.:.:0,120,1800                        0/0:43,0:43:48:.:.:0,48,1424
     3: CM029350.1      242   A   C            GT:AD:DP:GQ:PL                           0/0:37,0:37:99:0,102,1530                           0/0:47,0:47:99:0,101,1709                           0/0:49,0:49:99:0,112,1800
     4: CM029350.1     1509   G   A            GT:AD:DP:GQ:PL                           1/1:0,57:57:99:2109,170,0                           1/1:0,83:83:99:3042,247,0                           1/1:0,70:70:99:2452,209,0
     5: CM029350.1     1763   G   A            GT:AD:DP:GQ:PL                           0/0:43,0:43:99:0,106,1422                           0/0:54,0:54:99:0,106,1516                           0/0:62,0:62:99:0,118,1800
    ---                                                                                                                                                                                                                  
248278: CM029350.1 27080198   G   C            GT:AD:DP:GQ:PL                           1/1:0,47:47:99:1768,141,0                           1/1:0,66:66:99:2439,198,0                           1/1:0,64:64:99:2307,191,0
248279: CM029350.1 27080215   A   T            GT:AD:DP:GQ:PL                           1/1:0,49:49:99:1816,146,0                           1/1:0,59:59:99:2152,176,0                           1/1:0,66:66:99:2498,198,0
248280: CM029350.1 27080272   A   G GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,44:44:99:1|1:27080272_A_G:1968,132,0:27080272 1|1:0,48:48:99:1|1:27080272_A_G:2141,144,0:27080272 1|1:0,40:40:99:1|1:27080272_A_G:1796,120,0:27080272
248281: CM029350.1 27080280   T   C GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,42:42:99:1|1:27080272_A_G:1890,126,0:27080272 1|1:0,45:45:99:1|1:27080272_A_G:2025,135,0:27080272 1|1:0,40:40:99:1|1:27080272_A_G:1796,120,0:27080272
248282: CM029350.1 27080293   G   A GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,34:34:99:1|1:27080272_A_G:1530,102,0:27080272 1|1:0,44:44:99:1|1:27080272_A_G:1980,132,0:27080272 1|1:0,36:36:99:1|1:27080272_A_G:1620,108,0:27080272

The order of rows accessed by @[fix|gt|ann] are preserved, so you can use cbind to bind dataframes. If you do not want to use data.table then replace as.data.table with as.data.frame.

ADD COMMENT
0
Entering edit mode

Thank you for the code and idea dthorbur! I think the code to generate temp_output in the last row is missing, which is probably your way of loading the annotations in R. Do you proceed with a chrom object or just with read.csv()?

ADD REPLY
1
Entering edit mode

Oops, I wrote the wrong variable name. I have fixed the line in the post.

ADD REPLY

Login before adding your answer.

Traffic: 2372 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6