Question: ANNOVAR result explanation
16 months ago by
anu014160
India
anu014160 wrote:

Hello Biostars,

I was trying to annotate a VCF using ANNOVAR. I used the "table_annovar.pl" script and got these two files: myanno.hg19_multianno.txt and myanno.hg19_multianno.vcf. First, the number of rows differs between the two (after removing the headers from both files). What is the reason for this? Second, the ANNOVAR VCF file looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample
chr1    779788  .       C       A       21.77   PASS    AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;QD=10.88;SOR=0.693;VQSLOD=11.18;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=ncRNA_intronic;Gene.refGene=LINC01128;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END    GT:AD:GQ:PL     1/1:0,2:6:49,6,0
chr1    834928  rs4422949       A       G       70.28   PASS    AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=23.43;SOR=2.833;VQSLOD=6.71;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22746\x3bdist\x3d17270;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,3:9:98,9,0
chr1    834999  rs28570054      G       A       82.28   PASS    AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=27.43;SOR=2.833;VQSLOD=6.75;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22817\x3bdist\x3d17199;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,3:9:110,9,0
chr1    835499  rs4422948       A       G       105.03  PASS    AC=2;AF=1.00;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=26.26;SOR=0.693;VQSLOD=7.40;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d23317\x3bdist\x3d16699;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,4:12:133,12,0

What is '\x3b' in the Gene.refGene field?

Please help me out. Thank you!

annovar snp next-gen annotation
16 months ago by
cpad011210k
India
cpad011210k wrote:

\x3b is a semicolon. The file is not formatted well, hence \x3b is not parsed. Had it been parsed well, you would have ; instead of \x3b. The \x3d in your output is "=". Refer to the UTF reference table here.
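To make the mapping concrete: 3b and 3d are the hexadecimal character codes of ; and =. The sketch below decodes the two escapes on values copied from the question. The function name decode_annovar_escapes is made up for illustration (it is not part of ANNOVAR), and it handles only the two escapes that appear in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ANNOVAR): turn the two hex escapes seen
# in this thread back into the characters they stand for.
decode_annovar_escapes() {
    # \x3b is hex 3B = ';' and \x3d is hex 3D = '='
    sed -e 's/\\x3b/;/g' -e 's/\\x3d/=/g'
}

# Values copied from the Gene.refGene and GeneDetail.refGene fields above.
printf '%s\n' 'FAM41C\x3bLOC100130417' | decode_annovar_escapes
# prints: FAM41C;LOC100130417
printf '%s\n' 'dist\x3d22746\x3bdist\x3d17270' | decode_annovar_escapes
# prints: dist=22746;dist=17270
```

So 'dist\x3d22746' reads as 'dist=22746'. For an intergenic variant, the two dist values appear to be the distances to the two neighboring genes listed in Gene.refGene.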


Thank you for the reply. What does 'dist\x3d' mean, then?

written 16 months ago by anu014

Please refer to the reply above.

written 16 months ago by cpad0112

Thank you so much. It saved me time!

written 16 months ago by anu014
3 months ago by
tannerkoomar30
tannerkoomar30 wrote:

This is the intended behavior for ANNOVAR.

See this GitHub issue for details. In short, ; and = are reserved as delimiters in the INFO column of a VCF (; separates entries, = separates a key from its value), so they cannot appear inside a value. ANNOVAR therefore encodes them as \x3b and \x3d to avoid confusing downstream utilities (e.g. bcftools).
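A small sketch of why the escaping matters, using a shortened INFO string built from the values in the question. The helper count_info_entries is hypothetical and simply splits on ; the way a naive parser would.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration: count the ';'-separated entries
# in an INFO string, the way a naive VCF parser would split it.
count_info_entries() {
    printf '%s\n' "$1" | tr ';' '\n' | grep -c .
}

# Shortened INFO strings: one as ANNOVAR writes it, one with a raw ';'.
escaped='Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417'
unescaped='Func.refGene=intergenic;Gene.refGene=FAM41C;LOC100130417'

count_info_entries "$escaped"     # prints 2: both keys keep their full values
count_info_entries "$unescaped"   # prints 3: LOC100130417 is split off as its own entry
```

With the raw semicolon, the second gene name is cut away from Gene.refGene and ends up looking like a separate key with no value.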

The simplest "fix" for this is probably to use sed to recode these characters before you bgzip the VCF. To replace \x3b with - and \x3d with :, the piped command would look like this:

sed -e 's/\\x3b/-/g' -e 's/\\x3d/:/g' myanno.hg19_multianno.vcf | bgzip -c > myanno.hg19_multianno.vcf.gz
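
If the goal is only to read the annotations, one alternative is to leave the VCF untouched and decode a single INFO key at extraction time. The sketch below assumes an uncompressed, tab-separated VCF on standard input. The function name info_value_decoded is made up for illustration, and only the two escapes from this thread are decoded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CHROM, POS and one decoded INFO value per variant.
# Reads an uncompressed VCF on stdin; the INFO key name is the first argument.
info_value_decoded() {
    awk -F'\t' -v key="$1" '
        /^#/ { next }                       # skip header lines
        {
            n = split($8, entries, ";")     # column 8 is INFO
            val = "."                       # "." if the key is absent
            for (i = 1; i <= n; i++) {
                eq = index(entries[i], "=")
                if (eq > 0 && substr(entries[i], 1, eq - 1) == key)
                    val = substr(entries[i], eq + 1)
            }
            gsub(/\\x3b/, ";", val)         # decode for display only
            gsub(/\\x3d/, "=", val)
            print $1 "\t" $2 "\t" val
        }'
}

# One shortened record from the question, fed in as a test line.
printf 'chr1\t834928\trs4422949\tA\tG\t70.28\tPASS\tAC=2;GeneDetail.refGene=dist\\x3d22746\\x3bdist\\x3d17270\n' \
    | info_value_decoded GeneDetail.refGene
```

On the real file this would be called as info_value_decoded Gene.refGene < myanno.hg19_multianno.vcf. Because the decoded ; and = go only into the printed table, the VCF itself stays parseable by tools such as bcftools.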

This is the correct answer. Upvoting.

written 3 months ago by cpad0112