Question: Adding consequences field (INFO) when using --allow_non_variant in VEP
magnolia wrote, 5 weeks ago:

Hi,

I'm annotating my VCF using VEP. By default, VEP writes only variant positions to its output, so positions with no variation are not reported. I can keep non-variant positions by adding the --allow_non_variant flag, but I cannot get any INFO annotation for these positions. I want to have at least the HGNC gene symbol for those positions. Is it possible?

Thank you for your answers!

Tags: vep, annotation, vcf

Do you mean you already have HGNC IDs in the INFO field that are being omitted by VEP or that you'd like VEP to add HGNC symbols to all positions including non-variant positions?

written 5 weeks ago by RamRS

I would like to add HGNC symbols to all positions including non-variant positions. This is the most crucial one but if possible, whatever custom database I give to VEP, I want to see the matched INFO in every position.

written 5 weeks ago by magnolia
Emily_Ensembl (EMBL-EBI) wrote, 5 weeks ago:

The INFO column is filled in with the consequences of the variant on known genes. If there is no variant, there is no consequence, so there is nothing to fill in. All --allow_non_variant does is keep the non-variant line in the VCF output.


Thank you for the explanation. I know that the whole point of the CSQ field is to report the consequences of a variant. I just need to add information to non-variant positions as well. So I'm guessing that's impossible with VEP?

written 5 weeks ago by magnolia

Why do you need HGNC information at reference locations? The ideal workflow is to start by looking at just non-reference loci, so annotating reference loci will just bloat your VCF.

written 5 weeks ago by RamRS

Because I also want to see what positions are covered in the VCF and which genes are in those positions.

For example, let's say I have a VCF file that contains 10 positions for BRCA1 but only one of them is a variant. Since I cannot keep every single gene's location in the genome in my mind, when I filter for BRCA1, I will get that 1 variant but I also want to see which other positions are covered for BRCA1 in the VCF.

written 5 weeks ago by magnolia

Your VCF should not contain all-ref positions. Ideally, VCFs only contain positions that are altered in at least one sample in the VCF. How do you even have all-ref positions? Are you using a gVCF?

written 5 weeks ago by RamRS
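
To make the distinction between variant and all-ref records concrete, here is a minimal Python sketch that classifies a VCF data line by its ALT column. It assumes that a non-variant site has ALT set to `.`, and that a gVCF reference block carries only a symbolic placeholder allele (`<NON_REF>` or `<*>`, depending on the caller). The function name `is_non_variant` is hypothetical and the check ignores per-sample genotypes.

```python
# Assumption: these ALT values mean "no alternate allele observed here".
# "." is the plain-VCF missing value; "<NON_REF>" and "<*>" are symbolic
# placeholder alleles commonly seen in gVCF reference blocks.
NON_VARIANT_ALTS = {".", "<NON_REF>", "<*>"}


def is_non_variant(vcf_line):
    """Return True if a tab-separated VCF data line has no real ALT allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")  # column 5 of a VCF record is ALT
    return all(alt in NON_VARIANT_ALTS for alt in alts)


if __name__ == "__main__":
    records = [
        "chrToy\t100\t.\tA\t.\t.\t.\t.",
        "chrToy\t150\t.\tC\tT\t.\t.\tDP=10",
        "chrToy\t160\t.\tG\t<NON_REF>\t.\t.\tEND=180",
        "chrToy\t190\t.\tG\tA,<NON_REF>\t.\t.\t.",
    ]
    for rec in records:
        print(rec.split("\t")[1], is_non_variant(rec))
```

A record with a real allele next to the placeholder (the last toy line) counts as a variant, because not every ALT entry is a placeholder.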

I can just generate synthetic data in, let's say, TSV/TXT, convert it to VCF, and then load it into VEP. In my opinion, this part doesn't matter. Since VEP is a variant effect predictor, I guess there is no flexibility here: I cannot get any information for a position that is not a variant.

But since you mentioned gVCF: what if I'm using that? Is there a workaround?

written 5 weeks ago by magnolia

Not really. A gVCF has variant loci and non-variant "blocks". Your best bet is to use bcftools and a custom BED file with gene coordinates (or a regular GTF file) to do your own annotation.

written 4 weeks ago by RamRS
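
The do-it-yourself annotation suggested above can be sketched in plain Python. This is not bcftools and not part of VEP: it only illustrates the idea of looking up each VCF position, variant or not, in a gene-coordinate BED file and writing the gene symbol into INFO. The function names, the `GENE` INFO tag, and the toy coordinates are all hypothetical, and the BED input is assumed to be tab-separated with at least four columns (chrom, 0-based start, exclusive end, gene name).

```python
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end, name), ...]}."""
    genes = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        genes[chrom].append((int(start), int(end), name))
    return genes


def genes_at(genes, chrom, pos):
    """Return sorted gene names overlapping a 1-based VCF position."""
    zero_based = pos - 1  # BED is 0-based, half-open; VCF POS is 1-based
    return sorted({name for start, end, name in genes.get(chrom, [])
                   if start <= zero_based < end})


def annotate_vcf(vcf_lines, genes, tag="GENE"):
    """Append tag=<gene symbols> to INFO of every record inside a gene."""
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            out.append(line)
        elif line.startswith("#CHROM"):
            # Declare the new INFO tag just before the column header line.
            out.append(f'##INFO=<ID={tag},Number=.,Type=String,'
                       f'Description="Overlapping gene symbol(s) from BED">')
            out.append(line)
        else:
            fields = line.split("\t")
            hits = genes_at(genes, fields[0], int(fields[1]))
            if hits:
                entry = f"{tag}={','.join(hits)}"
                fields[7] = entry if fields[7] in (".", "") else f"{fields[7]};{entry}"
            out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    # Toy coordinates on a made-up contig, NOT the real BRCA1 locus.
    bed = ["chrToy\t99\t200\tBRCA1"]
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chrToy\t100\t.\tA\t.\t.\t.\t.",
        "chrToy\t150\t.\tC\tT\t.\t.\tDP=10",
        "chrToy\t201\t.\tG\t.\t.\t.\t.",
    ]
    for row in annotate_vcf(vcf, load_bed(bed)):
        print(row)
```

With the gene symbol in INFO, filtering for a gene such as BRCA1 returns every covered position in that gene rather than only the variant ones. The linear scan per position keeps the sketch short; for genome-scale input an interval index, or bcftools itself, would be the more practical choice.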