Question: Is there a program that allows you to add FORMAT/genotype tags to a VCF (_not_ INFO tags)?
w.gus.dunn wrote (3.2 years ago):

OK, I have searched for this everywhere, and I just can't seem to even figure out if it is possible/meaningful to annotate (add tags and associated data) VCFs with external 'genotype' (##FORMAT=<ID=VALUE,Number=VALUE,Type=VALUE,Description="VALUE">) fields.

I know that bcftools annotate and other tools can add INFO tags and can exclude FORMAT tags. However, the information that I need to add only makes sense when associated with a specific sample, whereas INFO tags apply to the variant regardless of the sample in which it occurs.

For example, I am comparing multiple family triplets (mother/father/affected child). I would like to add a tag in the FORMAT field that represents which triplet the individual belongs to. In addition, I would like to add information to each sample that indicates which 'mode of inheritance' the SNP appears to follow in each triplet.

This is information that is inherently tied to the sample and therefore ill-suited to an INFO tag; however, I cannot for the life of me find a tool that even mentions this. Am I missing some super-obvious reason that people don't ever need/want to be able to annotate VCFs in this fashion? Or is my google-fu simply too weak?

For your reference, here is what I have attempted using bcftools annotate (all compression and indexing of the related files has been omitted for brevity):

annots.tab.gz

CHROM   POS AGE_MO  BAM_OK  FAM_ID  MOI
1   12921499    30  0   youdontknowme   CmpHet
1   12921600    30  0   youdontknowme   CmpHet
1   12939476    30  0   youdontknowme   CmpHet
1   12939562    30  0   youdontknowme   CmpHet
1   12939747    30  0   youdontknowme   CmpHet
1   12942047    30  1   youdontknowme   CmpHet
1   12942138    30  1   youdontknowme   CmpHet
1   12942179    30  1   youdontknowme   CmpHet
...

annots.hdr

##FORMAT=<ID=AGE_MO,Number=1,Type=Float,Description="Age of associated proband in months.">
##FORMAT=<ID=FAM_ID,Number=1,Type=String,Description="Identification of family to which the individual belongs.">
##FORMAT=<ID=BAM_OK,Number=0,Type=Flag,Description="Manual inspection of the BAM file corroborates the MOI.">
##FORMAT=<ID=MOI,Number=1,Type=String,Description="Mode of Inheritance: HZR=recessive, DeNovo=de novo, XL=X-linked, CmpHet=Compound Het">

bcftools command

bcftools annotate -a annots.tab.gz -h annots.hdr -c CHROM,POS,AGE_MO,BAM_OK,FAM_ID,MOI data.vcf.bgz -Ou -o annotated.data.bcf

Resulting error

The tag "AGE_MO" is not defined in annots.tab.gz
d-cameron wrote (2.9 years ago):

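The current version of bcftools annotate appears to have this functionality. According to the documentation, it can annotate FORMAT (i.e. per-sample/genotype) fields, but you need to write each tag as FORMAT/TAG in the -c column list (for example, FORMAT/MOI rather than MOI). A bare TAG is interpreted as INFO/TAG, which would explain the error in the question: annots.hdr defines AGE_MO only as a FORMAT tag, so no INFO tag of that name exists.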
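If it helps to see what a FORMAT annotation amounts to at the text level, here is a minimal pure-Python sketch. It is not how bcftools does it, just an illustration of the VCF layout: the new tag names are appended to the FORMAT column and one value per tag is appended to each sample column. The function name `add_format_tags`, the sample names `child` and `mother`, and the example record are hypothetical; the FAM_ID and MOI values are taken from the question. It assumes an uncompressed, tab-delimited VCF that already has FORMAT and sample columns.

```python
def add_format_tags(vcf_lines, new_header_lines, annotations, tags):
    """Return VCF lines with extra per-sample FORMAT tags appended.

    annotations maps (chrom, pos, sample_name) -> {tag: value}, with pos as
    a string. Samples without an entry get "." (the VCF missing value).
    Assumes every record has a FORMAT column and at least one sample column.
    """
    out = []
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            out.append(line)
        elif line.startswith("#CHROM"):
            # New ##FORMAT definitions go just before the column header line.
            out.extend(new_header_lines)
            samples = line.split("\t")[9:]
            out.append(line)
        else:
            fields = line.split("\t")
            chrom, pos = fields[0], fields[1]
            n_keys = len(fields[8].split(":"))
            fields[8] = ":".join([fields[8]] + tags)
            for i, sample in enumerate(samples):
                existing = fields[9 + i].split(":")
                # VCF lets trailing sub-fields be dropped; pad them back so
                # the appended values line up with the new FORMAT keys.
                existing += ["."] * (n_keys - len(existing))
                values = annotations.get((chrom, pos, sample), {})
                extra = [str(values.get(tag, ".")) for tag in tags]
                fields[9 + i] = ":".join(existing + extra)
            out.append("\t".join(fields))
    return out


# Hypothetical two-sample VCF; only the child carries annotations here.
vcf = [
    "##fileformat=VCFv4.2",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\tmother",
    "1\t12921499\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0",
]
new_header = [
    '##FORMAT=<ID=FAM_ID,Number=1,Type=String,Description="Identification of family to which the individual belongs.">',
    '##FORMAT=<ID=MOI,Number=1,Type=String,Description="Mode of Inheritance: HZR=recessive, DeNovo=de novo, XL=X-linked, CmpHet=Compound Het">',
]
annotations = {
    ("1", "12921499", "child"): {"FAM_ID": "youdontknowme", "MOI": "CmpHet"},
}
result = add_format_tags(vcf, new_header, annotations, ["FAM_ID", "MOI"])
print("\n".join(result))
```

Keying the annotations by (chrom, pos, sample) rather than by (chrom, pos) alone is the point of the exercise: the same site can carry a different FAM_ID or MOI in each sample, which an INFO tag cannot express.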
