Question: Annovar result explanation
22 months ago by
anu014 wrote:

Hello Biostars,

I was trying to annotate a VCF using ANNOVAR. I used the "" script to get these 2 files: myanno.hg19_multianno.txt and myanno.hg19_multianno.vcf. Firstly, the number of rows differs between the two files (after removing the headers from both). What is the reason for this? Secondly, the ANNOVAR VCF file looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample
chr1    779788  .       C       A       21.77   PASS    AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;QD=10.88;SOR=0.693;VQSLOD=11.18;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=ncRNA_intronic;Gene.refGene=LINC01128;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END    GT:AD:GQ:PL     1/1:0,2:6:49,6,0
chr1    834928  rs4422949       A       G       70.28   PASS    AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=23.43;SOR=2.833;VQSLOD=6.71;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22746\x3bdist\x3d17270;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,3:9:98,9,0
chr1    834999  rs28570054      G       A       82.28   PASS    AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=27.43;SOR=2.833;VQSLOD=6.75;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22817\x3bdist\x3d17199;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,3:9:110,9,0
chr1    835499  rs4422948       A       G       105.03  PASS    AC=2;AF=1.00;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=26.26;SOR=0.693;VQSLOD=7.40;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d23317\x3bdist\x3d16699;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,4:12:133,12,0

What is '\x3b' in the Gene.refGene column?

Please help me out. Thank you!

annovar snp next-gen annotation • 1.6k views
modified 9 months ago by tannerkoomar • written 22 months ago by anu014
22 months ago by
cpad0112 wrote:

\x3b is a semicolon: 3b is the hexadecimal character code for ";". The file is not formatted well, hence \x3b is not parsed. Had it been parsed well, you would have ; instead of \x3b. Likewise, \x3d in your output is "=". Refer to the UTF reference table here.

modified 22 months ago • written 22 months ago by cpad0112

Thank you for the reply. What does 'dist\x3d' mean, then?

written 22 months ago by anu014

Refer to the reply above: \x3d is "=", so dist\x3d22746 reads as dist=22746.

written 22 months ago by cpad0112

Thank you so much. It saved my time!

written 22 months ago by anu014
9 months ago by
tannerkoomar wrote:

This is the intended behavior for ANNOVAR.

See this GitHub issue for details. In short, the ; and = characters are not valid inside INFO field values in a VCF, because they separate the INFO entries and their key=value pairs. ANNOVAR therefore encodes them as \x3b and \x3d to avoid confusing downstream utilities (e.g. bcftools).
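To see what the escapes stand for, here is a minimal sketch that decodes a single value for display only. The variable names are made up for illustration, and the sample value is copied from the Gene.refGene field in the question. Writing a raw ; or = back into the INFO column would break the VCF, so this only helps when reading one value by eye.

```shell
# Hypothetical sample: the Gene.refGene value from the question's VCF.
value='FAM41C\x3bLOC100130417'

# Turn the two ANNOVAR escapes back into the characters they stand for.
# Display only: do not write this result back into the INFO column.
decoded=$(printf '%s\n' "$value" | sed -e 's/\\x3b/;/g' -e 's/\\x3d/=/g')

printf '%s\n' "$decoded"
```

For an intergenic variant the two names are the flanking genes, and the matching GeneDetail.refGene entries (dist\x3d22746\x3bdist\x3d17270) decode to dist=22746;dist=17270.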

The simplest "fix" for this is probably to use sed to recode these characters before you bgzip the vcf. To replace \x3b with a - and \x3d with a :, the piped command would look like this:

sed 's/\\x3b/-/g'  myanno.hg19_multianno.vcf | sed 's/\\x3d/:/g' | bgzip -c > myanno.hg19_multianno.vcf.gz
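
If you want to check that nothing was missed, something like the sketch below may help. It assumes a throwaway one-line file standing in for myanno.hg19_multianno.vcf (the temp directory and file names are hypothetical), applies the same two substitutions in a single sed call, and then counts the lines that still contain an escape.

```shell
# Hypothetical stand-in for the annotated VCF: one INFO fragment from the question.
tmpdir=$(mktemp -d)
printf '%s\n' 'Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22746\x3bdist\x3d17270' > "$tmpdir/demo.vcf"

# Same replacements as the command above, done by one sed process.
sed -e 's/\\x3b/-/g' -e 's/\\x3d/:/g' "$tmpdir/demo.vcf" > "$tmpdir/demo.recoded.vcf"

# Count lines that still contain \x3b or \x3d; 0 means every escape was recoded.
# grep exits non-zero when it finds nothing, hence the "|| true".
remaining=$(grep -c '\\x3[bd]' "$tmpdir/demo.recoded.vcf" || true)
recoded=$(cat "$tmpdir/demo.recoded.vcf")
echo "lines with escapes left: $remaining"
```

On the real file you would run the same grep on the recoded VCF before compressing it.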
written 9 months ago by tannerkoomar

This is the correct answer. Upvoting...

written 9 months ago by cpad0112