Retrieving allele-specific information for a variant using VEP annotation
11 months ago
bt_cepo

Hi all!

I annotated a VCF file using VEP and noticed that it reports several variant IDs for each input variant. For example, this is an excerpt of one of the variant lines (I removed the annotation info that is not relevant to the question):

12  25398284    .   C   A   .   .   rs121913529&COSV55497369&COSV55497419&COSV55497479

As you can see, three different COSMIC IDs are reported for this variant, although only one of them (COSV55497419) corresponds to the alternate allele that is actually present (C>A). The remaining IDs refer to other alternate alleles that can also occur at that position.

After reading VEP's documentation I know this is the expected behavior, but I am kind of confused by the following passage. I am not really sure I understand what it means by "variants with unknown alleles":

For some data sources (COSMIC, HGMD), Ensembl is not licensed to redistribute allele-specific data, so VEP will report the existence of co-located variants with unknown alleles without carrying out allele matching. To disable this behaviour and exclude these variants, use the --exclude_null_alleles flag.

Just in case, I repeated the annotation with the --exclude_null_alleles flag, but now no COSMIC IDs are reported at all; only the dbSNP ID remains.

So basically, I would like to get only the specific COSMIC ID for my variant. Does anyone know how I can run the annotation with VEP so that it only reports the COSMIC ID of the alternate allele that is present in my VCF?

Thanks a lot for reading!!!

variant-calling VEP COSMIC

11 months ago

My interpretation of the documentation is that no, you can't have the specific COSMIC id, because of that licensing whatchamacallit.

It is pretty trivial to do the COSMIC annotation without tools such as VEP: download the COSMIC coding and non-coding VCFs, normalize them (decompose multiallelic records and left-align indels), and merge them into a single annotation VCF. Then use that file with bcftools annotate to get the annotations you need.

Thank you both for your answers, Joel Wallenius and Ram!

For anyone facing the same problem: I followed Ram's suggestions and it worked beautifully. After merging the VCFs, annotate using:

bcftools annotate -a normalized.merged.cosmic.vcf.gz -c ID,INFO -Oz -o annotated.vcf.gz input.vcf.gz

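This command writes the single COSMIC ID that corresponds to your variant's specific allele into the ID field of the input, and also copies the INFO fields from the COSMIC file into each matching variant. (The -Oz -o form writes real bgzipped output; redirecting with > would produce plain-text VCF despite the .gz name.)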
