Question: Annovar result explanation
3.0 years ago, anu014 wrote:

Hello Biostars,

I was trying to annotate a VCF using ANNOVAR. I used the "" script to get these two files: myanno.hg19_multianno.txt and myanno.hg19_multianno.vcf. Firstly, the number of rows differs between the two files (after removing the headers from both). What is the reason for this? Secondly, the ANNOVAR VCF file looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample
chr1    779788  .       C       A       21.77   PASS    AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;QD=10.88;SOR=0.693;VQSLOD=11.18;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=ncRNA_intronic;Gene.refGene=LINC01128;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END    GT:AD:GQ:PL     1/1:0,2:6:49,6,0
chr1    834928  rs4422949       A       G       70.28   PASS    AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=23.43;SOR=2.833;VQSLOD=6.71;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22746\x3bdist\x3d17270;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,3:9:98,9,0
chr1    834999  rs28570054      G       A       82.28   PASS    AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=27.43;SOR=2.833;VQSLOD=6.75;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22817\x3bdist\x3d17199;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,3:9:110,9,0
chr1    835499  rs4422948       A       G       105.03  PASS    AC=2;AF=1.00;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=26.26;SOR=0.693;VQSLOD=7.40;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d23317\x3bdist\x3d16699;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,4:12:133,12,0

What is '\x3b' in Gene.refGene column?

Please help me out. Thank you!

Tags: annovar, snp, next-gen, annotation
3.0 years ago, cpad0112 wrote:

\x3b is a semicolon (hexadecimal character code 3B). The file is not formatted well, so \x3b has not been parsed; had it been parsed correctly, you would see ; instead of \x3b. Likewise, \x3d in your output is "=". Refer to the UTF reference table here.
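To make the mapping concrete, here is a minimal Python sketch that turns every `\xHH` sequence back into the character it stands for. The function name `decode_hex_escapes` is made up for illustration and is not part of ANNOVAR. The sample strings are taken from the Gene.refGene and GeneDetail.refGene values in the question.

```python
import re

# Matches a literal backslash, an "x", and two hex digits, e.g. \x3b
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_hex_escapes(text: str) -> str:
    """Replace every \\xHH escape with the character whose code is 0xHH."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Values copied from the question (raw strings keep the backslashes literal).
print(decode_hex_escapes(r"FAM41C\x3bLOC100130417"))          # FAM41C;LOC100130417
print(decode_hex_escapes(r"dist\x3d22746\x3bdist\x3d17270"))  # dist=22746;dist=17270
```

So 'dist\x3d22746' reads as "dist=22746", and the two flanking genes are separated by a semicolon.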


Thank you for the reply. What does 'dist\x3d' mean, then?

written 3.0 years ago by anu014

Refer to the reply above.

written 3.0 years ago by cpad0112

Thank you so much. It saved me time!

written 3.0 years ago by anu014
23 months ago, tannerkoomar wrote:

This is the intended behavior for ANNOVAR.

See this GitHub issue for details. In short, ; and = are delimiters in the INFO column of a VCF (; separates entries and = separates a key from its value), so they cannot appear inside an INFO value. ANNOVAR therefore encodes them as \x3b and \x3d to avoid confusing downstream utilities (e.g. bcftools).
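To see why the escaping matters, here is a rough Python sketch of how a simple parser might read an INFO string: split on `;`, then split each entry on the first `=`. The helper `parse_info` is hypothetical, not how bcftools or ANNOVAR is actually implemented. The sample string is a shortened version of the second record in the question.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flags (entries without '=') map to True."""
    fields = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        if sep:
            # Decode ANNOVAR's escapes only after the structural split is done.
            fields[key] = value.replace("\\x3b", ";").replace("\\x3d", "=")
        else:
            fields[key] = True
    return fields


# Shortened INFO string from the question (raw string keeps the backslashes).
info = (
    r"DB;DP=3;Func.refGene=intergenic;"
    r"Gene.refGene=FAM41C\x3bLOC100130417;"
    r"GeneDetail.refGene=dist\x3d22746\x3bdist\x3d17270"
)
parsed = parse_info(info)
print(parsed["Gene.refGene"])        # FAM41C;LOC100130417
print(parsed["GeneDetail.refGene"])  # dist=22746;dist=17270
```

Had ANNOVAR written a raw `;` between the two gene names, the first split would have cut the Gene.refGene value in half and turned LOC100130417 into a stray flag.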

The simplest "fix" for this is probably to use sed to recode these characters before you bgzip the VCF. To replace \x3b with - and \x3d with :, the piped command would look like this:

sed 's/\\x3b/-/g'  myanno.hg19_multianno.vcf | sed 's/\\x3d/:/g' | bgzip -c > myanno.hg19_multianno.vcf.gz
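If you would rather do the recoding in a script, a rough Python equivalent might look like the sketch below. It assumes a tab-delimited VCF with INFO in the eighth column, and unlike the sed command it touches only that column. The function name `recode_info_column` is invented for this example, and compression with bgzip would still be a separate step.

```python
def recode_info_column(line: str) -> str:
    """Replace \\x3b with '-' and \\x3d with ':' in the INFO column of one VCF line."""
    if line.startswith("#"):
        return line  # header lines are passed through unchanged
    newline = "\n" if line.endswith("\n") else ""
    columns = line.rstrip("\n").split("\t")
    if len(columns) >= 8:
        # INFO is the 8th tab-separated column (index 7).
        columns[7] = columns[7].replace("\\x3b", "-").replace("\\x3d", ":")
    return "\t".join(columns) + newline


# Example record built from values in the question (INFO shortened).
record = "\t".join([
    "chr1", "834928", "rs4422949", "A", "G", "70.28", "PASS",
    r"Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22746\x3bdist\x3d17270",
    "GT:AD:GQ:PL", "1/1:0,3:9:98,9,0",
]) + "\n"
print(recode_info_column(record), end="")
```

Applying this to each line of myanno.hg19_multianno.vcf and piping the result to bgzip should give the same output as the sed pipeline, provided the escapes occur only in the INFO column.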

This is the correct answer. Upvoting.

written 23 months ago by cpad0112