Question: Repeated information on specific parameters of the INFO field - VCF files - EBIvariation vcf-validator
daianagan10 wrote:

Hello everyone! I am new to manipulating VCF files, and I was advised to use EBIvariation/vcf-validator to check that a file is correctly formatted. My variant calling output (I don't run the calling myself; it is the output of a service we pay for) is a VCF file with many repeated values in its INFO field, for example:

AA=p.K2811fs*46,p.K2811fs*46; CDS=c.8426delA,c.8426delA; CNT=1,1

Apparently, having "p.K2811fs*46" twice is not valid, so I should keep only one.

I haven't found a tool that does this yet (I'm not sure one even exists), so any help is very welcome!!!

next-gen vcf-validator vcf • 309 views
modified 5 months ago by finswimmer6.2k • written 5 months ago by daianagan10

Hello daianagan,

could you please post the complete header of the VCF file and the first 5-10 variants?

fin swimmer

written 5 months ago by finswimmer6.2k

Sorry, I didn't realize I was answering as a new comment

written 5 months ago by daianagan10

Hello Fin! Thanks for your reply, here is what you've asked for. I've attached it, since the format when copying here was a mess.

written 5 months ago by daianagan10

Hello daianagan,

in your example VCF I could not find any repeated information. Am I overlooking something? If not every entry has repeated information, please add some examples that do.

fin swimmer

written 5 months ago by finswimmer6.2k

So sorry about that. It's updated now; the last record has the AA info duplicated, among others. Thank you!

written 5 months ago by daianagan10

This is a VEP-annotated VCF, and this example VCF doesn't contain the entries from the OP.

modified 5 months ago • written 5 months ago by cpad01129.3k

Hi cpad! Thank you for your reply. If it is not too much to ask, could you briefly explain what a VEP-annotated VCF is? Why would it cause any trouble? Also, what are OP entries? Thank you!!!

written 4 months ago by daianagan10

The original VCF was functionally annotated with VEP, as the tags in the OP (original post) are in line with VEP output. The duplicate entries you posted at the start are not present in the VCF file you shared. The duplicated protein-level (p.) annotation might be due to multiple transcripts being affected by that variant. One needs to be careful before annotating the output.

written 4 months ago by cpad01129.3k
finswimmer6.2k wrote:

Hello daianagan,

the problem with your VCF is not just that some INFO fields contain duplicate values; the header also declares that each of these fields holds only one value.

##INFO=<ID=CDS,Number=1,Type=String,Description="CDS annotation">
##INFO=<ID=AA,Number=1,Type=String,Description="Peptide annotation">
##INFO=<ID=GENE,Number=1,Type=String,Description="Gene name">
##INFO=<ID=CNT,Number=1,Type=Integer,Description="How many samples have this mutation">
##INFO=<ID=STRAND,Number=1,Type=String,Description="Gene strand">

The "Number" attribute defines how many values a field may hold. For more information see the manual.

In your example there are not just duplicates. Look at this:

GENE=ATM_ENST00000278616,ATM

This entry has two different values, but only one is allowed.
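To see which fields hold more values than the header allows, a check along these lines may help. It is only a sketch using the Python standard library; `declared_numbers` and `count_violations` are names made up for illustration, and only fixed integer "Number" values are checked (variable ones such as A, R, G and . are skipped).

```python
import re

# Captures the ID and Number attributes of a ##INFO header line.
INFO_HEADER = re.compile(r'##INFO=<ID=([^,]+),Number=([^,]+),')

def declared_numbers(header_lines):
    """Map each INFO ID to its declared Number (kept as a string, e.g. '1', 'A', '.')."""
    numbers = {}
    for line in header_lines:
        match = INFO_HEADER.match(line)
        if match:
            numbers[match.group(1)] = match.group(2)
    return numbers

def count_violations(info, numbers):
    """Return {ID: (declared, found)} for fields holding more values than a fixed Number allows."""
    violations = {}
    for entry in info.split(';'):
        key, sep, value = entry.partition('=')
        declared = numbers.get(key, '.')
        if not sep or not declared.isdigit():
            continue  # flags and variable-length fields are not checked
        found = len(value.split(','))
        if found > int(declared):
            violations[key] = (int(declared), found)
    return violations

header = [
    '##INFO=<ID=GENE,Number=1,Type=String,Description="Gene name">',
    '##INFO=<ID=CNT,Number=1,Type=Integer,Description="How many samples have this mutation">',
]
print(count_violations('GENE=ATM_ENST00000278616,ATM;CNT=1,1', declared_numbers(header)))
# prints {'GENE': (1, 2), 'CNT': (1, 2)}
```

Both fields are reported: CNT because of a plain duplicate, GENE because it carries two different values.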

Here's a little Python script that iterates over all records in your VCF and truncates every INFO field to the number of values given in the header.

Save the code as fixDuplicates.py and run it like this: $ python fixDuplicates.py prueba.vcf > prueba_corrected.vcf

The script makes use of pysam. You have to install this package first.
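The script itself is not included in this copy of the thread. As a stand-in, here is a minimal sketch of the same idea that uses only the Python standard library instead of pysam, so it is not the original fixDuplicates.py. The function names `truncate_info` and `fix_vcf` are illustrative, and it assumes a plain-text, tab-separated VCF and only handles fixed integer "Number" values.

```python
import re
import sys

# Captures the ID and Number attributes of a ##INFO header line.
INFO_HEADER = re.compile(r'##INFO=<ID=([^,]+),Number=([^,]+),')

def truncate_info(info, numbers):
    """Cut each INFO field down to the fixed Number declared for it in the header."""
    fixed = []
    for entry in info.split(';'):
        key, sep, value = entry.partition('=')
        declared = numbers.get(key, '.')
        if sep and declared.isdigit() and int(declared) > 0:
            # Keep only the first `declared` values; any later ones are dropped.
            value = ','.join(value.split(',')[:int(declared)])
            entry = key + sep + value
        fixed.append(entry)
    return ';'.join(fixed)

def fix_vcf(lines):
    """Yield VCF lines with over-long INFO fields truncated; header lines pass through unchanged."""
    numbers = {}
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('#'):
            match = INFO_HEADER.match(line)
            if match:
                numbers[match.group(1)] = match.group(2)
        elif line:
            columns = line.split('\t')
            if len(columns) >= 8:
                columns[7] = truncate_info(columns[7], numbers)  # INFO is the 8th column
            line = '\t'.join(columns)
        yield line

# Only runs when a single file name is given on the command line.
if __name__ == '__main__' and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for fixed_line in fix_vcf(handle):
            print(fixed_line)
```

Note that keeping the first value also discards values that differ from it, for example the second gene name in GENE=ATM_ENST00000278616,ATM, so check whether that loss is acceptable for your data.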

fin swimmer

modified 5 months ago • written 5 months ago by finswimmer6.2k

Thank you very much Fin!!!

written 4 months ago by daianagan10

Hello daianagan,

glad I could help. Please upvote posts that you find helpful, or mark answers as accepted, so everyone can easily see that this solved your problem.


fin swimmer

written 4 months ago by finswimmer6.2k