I am trying to obtain a data frame from a VCF read with the VariantAnnotation package. This VCF is the output of VEP (Variant Effect Predictor), so the columns corresponding to its annotations are not properly separated and I cannot parse them into separate columns of a data frame.
This is the INFO header line for the VEP annotation:
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON">
The headers for the rest of the INFO fields follow this pattern:
##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
As you can see, CSQ is the field that holds the VEP annotation, and it bundles many different parameters separated by "|" (Allele|Consequence|IMPACT, etc.). In other words, every piece of data produced by VEP is written into a single INFO field as one string with "|" as the separator, instead of being written as separate INFO fields.
Is there any way to turn these field names into column names, splitting each value inside CSQ on "|", while keeping the normal parsing of the rest of the INFO fields (CONTQ, DP, ECNT as columns with their corresponding values)?
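To make the requested transformation concrete, here is a minimal sketch in plain Python rather than VariantAnnotation. It reads the field names from the "Format:" part of the CSQ header, splits each CSQ entry on "|", and repeats the other INFO values on every resulting row. The function names (`csq_fields`, `parse_info`, `expand_csq`), the shortened five-field header, and the INFO string are all made up for illustration. It assumes that multiple annotations for one variant are comma-separated inside CSQ, so a variant can yield more than one row. All values stay as strings.

```python
import re


def csq_fields(header_line):
    """Return the field names listed after 'Format:' in the CSQ header line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' section found in the CSQ header")
    return match.group(1).strip().split("|")


def parse_info(info):
    """Turn 'KEY=VAL;FLAG;...' into a dict; flags without '=' map to True."""
    parsed = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def expand_csq(info, fields):
    """Return one row per CSQ entry, repeating the other INFO fields on each row."""
    parsed = parse_info(info)
    csq = parsed.pop("CSQ", "")
    rows = []
    # Assumption: several annotations for one variant are comma-separated.
    # A record without CSQ still yields one row, with empty annotation columns.
    for entry in csq.split(","):
        values = entry.split("|")
        # Pad short entries so every row has the same columns.
        values += [""] * (len(fields) - len(values))
        row = dict(parsed)
        row.update(zip(fields, values))
        rows.append(row)
    return rows


# Hypothetical, shortened header and INFO string for illustration only.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: '
          'Allele|Consequence|IMPACT|SYMBOL|Gene">')
info = ("CONTQ=93;DP=120;ECNT=1;"
        "CSQ=T|missense_variant|MODERATE|GENE1|ENSG01,"
        "T|intron_variant|MODIFIER|GENE2|ENSG02")

for row in expand_csq(info, csq_fields(header)):
    print(row)
```

Each printed dict corresponds to one row of the desired data frame: the CONTQ, DP and ECNT columns are kept, and the CSQ string is replaced by one column per name from the header.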
Shameless self-promotion: I recently wrote an R extension that parses VEP annotations. https://github.com/lindenb/rbcf
For which R versions is 'rbcf' available? I cannot install it in R 3.6.1. I have been reading the GitHub page, but I cannot install it the way it is described there because I am working on my local PC, which runs Windows. Is it possible to install it directly from R?
Hey, nice tool, this is exactly what I am looking for :)
Testing went fine, except that I always get only a single variant in the "predictions". Any idea why? And could you also add how best to combine this table with the genotypes?
Yes, I am getting the same issue: the tool only seems to pull the first variant. How do you get it to work for all variants?
I don't see it. I see the CSQ tag, which is a String. The three other tags define a number.
Okay, I've edited the question to explain it better.