Question: Repeated information on specific parameters of the INFO field - VCF files - EBIvariation vcf-validator
daianagan0 wrote:

Hello everyone! I am new to working with VCF files, and the EBIvariation/vcf-validator was recommended to me for checking that a file is correctly formatted. The output of my variant calling (I don't run it myself; it comes from a service we pay for) is a VCF file with many repeated values in its INFO field, for example:

AA=p.K2811fs*46,p.K2811fs*46; CDS=c.8426delA,c.8426delA; CNT=1,1

Apparently, having "p.K2811fs*46" twice is not valid, so I should keep only one.

I have not yet found any tool that does this (I'm not sure there even is one), but any help is very welcome!

next-gen vcf-validator vcf • 137 views
ADD COMMENTlink modified 8 days ago by finswimmer2.1k • written 10 days ago by daianagan0

Hello daianagan,

Could you please post the complete header from the VCF file and the first 5-10 variants?

fin swimmer

ADD REPLYlink written 10 days ago by finswimmer2.1k

Sorry, I didn't realize I was answering as a new comment

ADD REPLYlink written 10 days ago by daianagan0

Hello Fin! Thanks for your reply, here is what you've asked for. I've attached it, since the format when copying here was a mess.

ADD REPLYlink written 10 days ago by daianagan0

Hello daianagan,

In your example VCF I could not find any repeated information. Am I overlooking something? If not every entry has repeated information, please add some examples that do.

fin swimmer

ADD REPLYlink written 9 days ago by finswimmer2.1k

So sorry about that. It's updated now, the last one has, among others, the AA info duplicated. Thank you!

ADD REPLYlink written 9 days ago by daianagan0

This is a VEP-annotated VCF, and this example VCF doesn't have the OP entries.

ADD REPLYlink modified 8 days ago • written 8 days ago by cpad01125.3k

Hi cpad! Thank you for your reply. If it is not too much to ask, can you briefly explain to me what a VEP-annotated VCF means? Why would this cause any trouble? Also, what are OP entries? Thank you!

ADD REPLYlink written 6 days ago by daianagan0

The original VCF was functionally annotated with VEP, as the tags in the OP (original post) are in line with VEP output. The duplicate entries you posted at the start are not present in the VCF file you shared. The duplicated p. syntax might be due to multiple transcripts being affected by that variant. One needs to be careful before annotating the output.

ADD REPLYlink written 6 days ago by cpad01125.3k
finswimmer2.1k wrote:

Hello daianagan,

The problem with your VCF is not just that some INFO fields contain duplicate values; the header also defines that these fields hold only one entry.

##INFO=<ID=CDS,Number=1,Type=String,Description="CDS annotation">
##INFO=<ID=AA,Number=1,Type=String,Description="Peptide annotation">
##INFO=<ID=GENE,Number=1,Type=String,Description="Gene name">
##INFO=<ID=CNT,Number=1,Type=Integer,Description="How many samples have this mutation">
##INFO=<ID=STRAND,Number=1,Type=String,Description="Gene strand">

The "Number" defines how many values are allowed. For more information see the manual.
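To see what the header declares for each field, the `Number` attribute can be read straight from the `##INFO` lines. The snippet below is only a minimal sketch: the helper name `declared_numbers` is made up for illustration, and it assumes `ID` is immediately followed by `Number` inside the angle brackets, as in the header lines above.

```python
import re

# Hypothetical helper for illustration; not part of vcf-validator or pysam.
# Assumes the attribute order ID=...,Number=... used in the header above.
INFO_LINE = re.compile(r'##INFO=<ID=([^,]+),Number=([^,]+),')

def declared_numbers(header_lines):
    """Map each INFO ID to its declared Number.

    The value is kept as a string because Number may also be
    'A', 'R', 'G' or '.' instead of an integer.
    """
    numbers = {}
    for line in header_lines:
        match = INFO_LINE.match(line)
        if match:
            numbers[match.group(1)] = match.group(2)
    return numbers

header = [
    '##INFO=<ID=CDS,Number=1,Type=String,Description="CDS annotation">',
    '##INFO=<ID=CNT,Number=1,Type=Integer,Description="How many samples have this mutation">',
]
print(declared_numbers(header))  # {'CDS': '1', 'CNT': '1'}
```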

In your example there are not just duplicates. Look at this:

GENE=ATM_ENST00000278616,ATM

This entry has two different values, but only one is allowed.
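A quick way to spot such entries is to split each INFO value on commas and compare the count with the declared number. This is a rough sketch, not the validator's own logic; `overfull_fields` and the `declared` dictionary are hypothetical names, and the limits are typed in by hand from the header shown above.

```python
def overfull_fields(info, declared):
    """Return {key: values} for INFO keys that hold more values than declared."""
    problems = {}
    for item in info.split(';'):
        item = item.strip()
        if '=' not in item:
            continue  # flag-type entries carry no value
        key, value = item.split('=', 1)
        values = value.split(',')
        limit = declared.get(key)
        if limit is not None and len(values) > limit:
            problems[key] = values
    return problems

# Limits copied by hand from the Number=1 header lines (an assumption of this sketch).
declared = {'GENE': 1, 'AA': 1, 'CDS': 1, 'CNT': 1}
info = 'GENE=ATM_ENST00000278616,ATM;CNT=1'
print(overfull_fields(info, declared))
# {'GENE': ['ATM_ENST00000278616', 'ATM']}
```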

Here's a little Python script that iterates over all records in your VCF and truncates every INFO field to the number of values given in the header.

Save the code as fixDuplicates.py and run it like this: $ python fixDuplicates.py prueba.vcf > prueba_corrected.vcf

The script makes use of pysam. You have to install this package first.
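The original script is not shown here. As a rough stand-in, the following is a minimal sketch of the same truncation idea, written against the plain text of the file with only the standard library instead of pysam. The function names `truncate_info` and `fix_lines` are made up for illustration, and the sketch assumes tab-separated records with INFO in the eighth column and header lines of the form `##INFO=<ID=...,Number=<integer>,...`. It keeps the first value(s) of each field and drops the rest, which may or may not be the value you want to keep.

```python
import re

# Only integer Number declarations are handled; 'A', 'R', 'G' and '.' are left alone.
INFO_HEADER = re.compile(r'##INFO=<ID=([^,]+),Number=(\d+),')

def truncate_info(info, limits):
    """Cut each key=value entry down to the declared number of values."""
    fixed = []
    for item in info.split(';'):
        if '=' in item:
            key, value = item.split('=', 1)
            if limits.get(key, 0) > 0:
                value = ','.join(value.split(',')[:limits[key]])
            item = key + '=' + value
        fixed.append(item)
    return ';'.join(fixed)

def fix_lines(lines):
    """Yield VCF lines with the INFO column truncated to the header's Number."""
    limits = {}
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('##'):
            match = INFO_HEADER.match(line)
            if match:
                limits[match.group(1)] = int(match.group(2))
        elif line and not line.startswith('#'):
            cols = line.split('\t')
            if len(cols) > 7:  # INFO is the eighth column
                cols[7] = truncate_info(cols[7], limits)
            line = '\t'.join(cols)
        yield line

demo = [
    '##INFO=<ID=AA,Number=1,Type=String,Description="Peptide annotation">\n',
    '##INFO=<ID=GENE,Number=1,Type=String,Description="Gene name">\n',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n',
    '11\t100\t.\tCA\tC\t.\t.\tGENE=ATM_ENST00000278616,ATM;AA=p.K2811fs*46,p.K2811fs*46\n',
]
for out in fix_lines(demo):
    print(out)
```

To apply it to a real file, pass an open file handle to `fix_lines` instead of the `demo` list and print or write each yielded line.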

fin swimmer

ADD COMMENTlink modified 8 days ago • written 8 days ago by finswimmer2.1k

Thank you very much Fin!!!

ADD REPLYlink written 6 days ago by daianagan0

Hello daianagan,

Glad I could help. Please upvote posts that you find helpful, or mark answers as accepted, so everyone can easily see that this solved your problem.

fin swimmer

ADD REPLYlink written 6 days ago by finswimmer2.1k