Question: Adding an Extra Column to a VCF File
Asked 9.5 years ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087):

Hi all, I've just written a tool that adds one or more extra columns to a VCF file. The header now looks like this:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    MY_COL1    MY_COL2    FORMAT    NA00001    NA00002    NA00003

Is there anything in the VCF spec saying that extra columns can't be added? I ask because when I run VCFtools, it says:

vcftools --vcf file.vcf 
Scanning file.vcf ... 
Ninth Header entry should be FORMAT: MY_COL1
Currently scanning CHROM: 19
Currently scanning CHROM: 20
Currently scanning CHROM: X
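VCFtools' message follows from the header layout defined by the VCF 4.x specification. The tab-separated #CHROM line must start with the eight mandatory columns #CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. When genotype data are present, the ninth column must be FORMAT, followed by one column per sample. Below is a minimal Python sketch of that check. The function name `check_header` is hypothetical, and this is not VCFtools' own code.

```python
from typing import List

# Mandatory leading columns of the #CHROM header line, in order (VCF 4.x).
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_header(header_line: str) -> List[str]:
    """Return the problems found in a tab-separated #CHROM header line."""
    cols = header_line.rstrip("\n").split("\t")
    problems = []
    # The first eight columns must match the fixed names exactly.
    for i, expected in enumerate(FIXED_COLUMNS):
        if i >= len(cols):
            problems.append(f"missing column {i + 1}: {expected}")
        elif cols[i] != expected:
            problems.append(f"column {i + 1} should be {expected}, found {cols[i]}")
    # If anything follows INFO, the ninth column must be FORMAT.
    if len(cols) > 8 and cols[8] != "FORMAT":
        problems.append(f"column 9 should be FORMAT, found {cols[8]}")
    return problems


if __name__ == "__main__":
    # The header from the question, with tabs as separators.
    header = "\t".join(FIXED_COLUMNS + ["MY_COL1", "MY_COL2", "FORMAT", "NA00001"])
    print(check_header(header))  # ['column 9 should be FORMAT, found MY_COL1']
```

Any tool that locates FORMAT and the sample columns by position will misread the file once MY_COL1 and MY_COL2 sit in columns 9 and 10.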

Pierre Lindenbaum replied, 9.4 years ago: I fixed this problem by creating a new file format :-)
Answer by Aaronquinlan (United States), 9.3 years ago:

BEDTools now supports VCF, and you can tack on as many extra columns as you want. That said, if you need specific functionality from VCFtools, this isn't helpful at all.

Answer by Giovanni M Dall'Olio (London, UK), 9.5 years ago:

What kind of information do you want to add? I don't think the VCF specification allows adding new columns. However, the format is still at an early stage of development, so you could contact its authors and propose the new functionality.

That said, VCF files should only describe the SNPs and their genotypes, and any other kind of information should go somewhere else. For example, if you have statistics associated with a SNP, consider a flat file or a database.
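One possible flat-file layout is a tab-separated side file keyed by CHROM, POS, REF and ALT, joined to the VCF when needed. The Python sketch below assumes that layout. The key choice and the names `load_stats` and `annotate` are illustrative only and do not come from any standard.

```python
import csv
from typing import Dict, Iterable, Iterator, Tuple

# Assumed key: a variant is identified by CHROM, POS, REF and ALT.
VariantKey = Tuple[str, str, str, str]


def load_stats(tsv_lines: Iterable[str]) -> Dict[VariantKey, Dict[str, str]]:
    """Load a tab-separated side file whose header starts with CHROM, POS, REF, ALT."""
    stats = {}
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        key = (row.pop("CHROM"), row.pop("POS"), row.pop("REF"), row.pop("ALT"))
        stats[key] = dict(row)  # the remaining columns are the per-variant statistics
    return stats


def annotate(
    vcf_lines: Iterable[str], stats: Dict[VariantKey, Dict[str, str]]
) -> Iterator[Tuple[VariantKey, Dict[str, str]]]:
    """Yield each VCF record's key with its statistics (an empty dict if none)."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        key = (chrom, pos, ref, alt)
        yield key, stats.get(key, {})


if __name__ == "__main__":
    side = ["CHROM\tPOS\tREF\tALT\tMY_COL1\tMY_COL2", "20\t14370\tG\tA\t0.5\tfoo"]
    vcf = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
           "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3"]
    for key, values in annotate(vcf, load_stats(side)):
        print(key, values)
```

The VCF itself stays untouched, so every VCF tool keeps working, and the side file can grow new columns freely.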

Answer by lh3 (United States), 8.8 years ago:

Instead of inserting new columns, which will break most tools, you should put your custom information in the INFO column (the eighth column). That is what this field is designed for. With Perl it is easy to extract a key-value pair from it, e.g.:

perl -F'\t' -lane 'next if /^#/; print "MYKEY=$1" if $F[7] =~ /(?:^|;)MYKEY=([^;]+)/' file.vcf
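The same INFO-column approach can be sketched in Python. It covers both sides: declaring the key in a ##INFO meta-information line, appending KEY=VALUE to the eighth column, and reading the entries back. The key MY_COL1 comes from the question. The `Number=1,Type=String` declaration and the function names `info_meta_line`, `add_info` and `parse_info` are assumptions for illustration, not part of any library.

```python
from typing import Dict


def info_meta_line(key: str, description: str) -> str:
    """Build a ##INFO meta-information line declaring one String value (assumed type)."""
    return f'##INFO=<ID={key},Number=1,Type=String,Description="{description}">'


def add_info(record: str, key: str, value: str) -> str:
    """Append KEY=VALUE to the INFO column (8th field) of a tab-separated VCF record."""
    # Reject characters that would break the ';'-separated KEY=VALUE layout.
    if any(c in value for c in ";=\t\n"):
        raise ValueError(f"illegal character in INFO value: {value!r}")
    fields = record.rstrip("\n").split("\t")
    info = fields[7]
    # "." marks an empty INFO column; otherwise entries are joined with ";".
    fields[7] = f"{key}={value}" if info == "." else f"{info};{key}={value}"
    return "\t".join(fields)  # FORMAT and sample columns are left untouched


def parse_info(record: str) -> Dict[str, str]:
    """Split the INFO column into a dict; flags without '=' map to an empty string."""
    info = record.rstrip("\n").split("\t")[7]
    if info == ".":
        return {}
    result = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        result[key] = value
    return result


if __name__ == "__main__":
    print(info_meta_line("MY_COL1", "Value formerly stored in the MY_COL1 column"))
    rec = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DB\tGT\t0|0"
    rec = add_info(rec, "MY_COL1", "0.5")
    print(parse_info(rec)["MY_COL1"])  # 0.5
```

Because only the eighth field changes, the column count stays the same, and the header check that VCFtools performs still passes.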

Furthermore, VCF is used not only for SNPs but also for INDELs and SVs. People from several major sequencing centers took part in designing the format, and in my opinion it is quite stable now. Small details may change in the future, but the number of columns will not.
