Dear Biostars Community,
I can't find a way to read my annotated VCF file with VEP from command line, with R.
I tried with vcfR and ensemblVEP, but I can't find the right way to deal with the VEP.vcf file.
Thank you in advance,
This is not pure R solution but it may be ok. Assuming the VEP annotation is the INFO tag CSQ and it uses | as separator (the default):
bcftools query -f '%CHROM|%POS|%ID|%REF|%ALT|%QUAL|%FILTER|%INFO/CSQ\n' vep.vcf > vep.txt
Now vep.txt is a | separated table with columns CHROM, POS, etc. The meaning of the VEP columns is in the VCF header you should be able to extract it with e.g.
bcftools view -h vep.vcf | grep '##INFO=<ID=CSQ,'
To read it in R you can use read.table('vep.txt', sep= '|', ...)
read.table('vep.txt', sep= '|', ...)
Thank you dariober, however this didn't function. I can't understand why I found an higher number of "|" in my file than expected by "INFO-ID=CSQ"... Who is this possible?? Make me really confused
In addition to the above answer, I would like to add that bcftools has a VEP-specific plugin which can also be used. More information can be found in: https://samtools.github.io/bcftools/howtos/plugin.split-vep.html
And of course, the resulted output file can be then easily imported in R.
Login before adding your answer.
Use of this site constitutes acceptance of our User Agreement and Privacy