I am trying to obtain a data frame from a VCF read with the VariantAnnotation package. This VCF is the output of VEP (Variant Effect Predictor), so the columns corresponding to its annotations are not properly separated and I cannot parse them into separate columns of a data frame.
This is the header line for the VEP annotation INFO field:
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON">
The headers for the rest of the INFO fields follow this pattern:
##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
As you can see, CSQ is the field holding the VEP annotation, and it packs many different parameters separated by "|" (Allele|Consequence|IMPACT, etc.). In other words, every piece of data produced by VEP is written into one INFO field as a single "|"-delimited string, instead of each being written as its own INFO field.
Is there any way to use these field names as column names, splitting the contents of CSQ on "|", while keeping the normal parsing of the rest of the INFO fields (CONTQ, DP and ECNT as columns with their corresponding values)?
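To make the goal concrete, here is a minimal base R sketch of the splitting I have in mind. It runs on toy data that I made up for illustration (shortened to four fields), not on my real VCF. I assume that in the real case the description string would come from something like `info(header(vcf))["CSQ", "Description"]` and the values from `info(vcf)$CSQ`, but I have not verified this end to end. The helper name `split_csq` is hypothetical, and the sketch ignores variants that carry several CSQ entries (one per transcript).

```r
# Toy inputs (made up for illustration; not taken from my real VCF)
csq_description <- "Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL"
csq_values <- c("A|missense_variant|MODERATE|BRCA2",
                "T|intron_variant|MODIFIER|")

# Field names are whatever follows "Format: " in the header Description
field_names <- strsplit(sub(".*Format: ", "", csq_description),
                        "|", fixed = TRUE)[[1]]

# Hypothetical helper: split one CSQ entry on "|" and pad it to n fields,
# because strsplit() drops a trailing empty field
split_csq <- function(x, n) {
  parts <- strsplit(x, "|", fixed = TRUE)[[1]]
  length(parts) <- n            # pads with NA when fields are missing
  parts[is.na(parts)] <- ""
  parts
}

# One row per CSQ entry, one column per VEP field
csq_df <- as.data.frame(
  do.call(rbind, lapply(csq_values, split_csq, n = length(field_names))),
  stringsAsFactors = FALSE
)
colnames(csq_df) <- field_names

print(csq_df)
```

What I am after is this kind of table, bound column-wise to the data frame that already holds CONTQ, DP and ECNT.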