Question: Annovar result explanation
anu014 (India), 2.2 years ago, wrote:

Hello Biostars,

I was trying to annotate a VCF using ANNOVAR. I used the "table_annovar.pl" script and got these two files: myanno.hg19_multianno.txt and myanno.hg19_multianno.vcf. Firstly, the number of rows differs between the two files (after removing the headers from both). What is the reason for this? Secondly, the ANNOVAR VCF file looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample
chr1    779788  .       C       A       21.77   PASS    AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;QD=10.88;SOR=0.693;VQSLOD=11.18;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=ncRNA_intronic;Gene.refGene=LINC01128;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END    GT:AD:GQ:PL     1/1:0,2:6:49,6,0
chr1    834928  rs4422949       A       G       70.28   PASS    AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=23.43;SOR=2.833;VQSLOD=6.71;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22746\x3bdist\x3d17270;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,3:9:98,9,0
chr1    834999  rs28570054      G       A       82.28   PASS    AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=27.43;SOR=2.833;VQSLOD=6.75;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22817\x3bdist\x3d17199;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,3:9:110,9,0
chr1    835499  rs4422948       A       G       105.03  PASS    AC=2;AF=1.00;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=26.26;SOR=0.693;VQSLOD=7.40;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d23317\x3bdist\x3d16699;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,4:12:133,12,0

What is '\x3b' in the Gene.refGene field?

Please help me out. Thank you!

Tags: annovar, snp, next-gen, annotation
cpad0112 (India), 2.2 years ago, wrote:

\x3b is a semicolon. The file is not formatted well, so \x3b is not parsed; had it been parsed properly, you would see ; instead of \x3b. Likewise, \x3d in your output is "=". Refer to a UTF-8 code table for the hex values (0x3B is ";" and 0x3D is "=").
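
To make the mapping concrete, here is a minimal shell sketch that decodes the two escapes in values copied from the question. The function name `decode_annovar` is hypothetical (it is not part of ANNOVAR), and the output is meant for reading only, since writing raw `;` or `=` back into an INFO value would break the VCF's key=value layout.

```shell
# Values copied from the Gene.refGene and GeneDetail.refGene fields in the question.
gene='FAM41C\x3bLOC100130417'
detail='dist\x3d22746\x3bdist\x3d17270'

# Hypothetical helper (not an ANNOVAR tool): turn the literal text \x3b
# into ';' and \x3d into '=' so the value is easier to read.
decode_annovar() {
    printf '%s\n' "$1" | sed -e 's/\\x3b/;/g' -e 's/\\x3d/=/g'
}

decode_annovar "$gene"      # prints FAM41C;LOC100130417
decode_annovar "$detail"    # prints dist=22746;dist=17270
```

Read this way, the second record is intergenic, between FAM41C (22746 bp away) and LOC100130417 (17270 bp away).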


Thank you for the reply. What does 'dist\x3d' mean, then?

written 2.2 years ago by anu014

Refer to the reply above: \x3d is "=", so dist\x3d22746 reads as dist=22746.

written 2.2 years ago by cpad0112

Thank you so much. It saved my time!

written 2.2 years ago by anu014
tannerkoomar, 13 months ago, wrote:

This is the intended behavior for ANNOVAR.

See this GitHub issue for details. In short, the ; and = characters are not valid within the INFO fields of VCFs, so ANNOVAR codes them as \x3b and \x3d to avoid confusing downstream utilities (e.g. bcftools).
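
A small sketch of why the escaping matters, assuming a downstream tool splits the INFO column on `;` (as the VCF layout implies). The helper `count_info_fields` is hypothetical, and the two strings are shortened from the second record in the question.

```shell
# Hypothetical helper: count the ';'-separated entries in an INFO string.
count_info_fields() {
    printf '%s\n' "$1" | awk -F';' '{ print NF }'
}

# As ANNOVAR writes it: the gene list keeps its separator escaped.
escaped='Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=.'
# What it would look like with a raw semicolon inside the value.
raw='Gene.refGene=FAM41C;LOC100130417;GeneDetail.refGene=.'

count_info_fields "$escaped"   # prints 2 (two key=value entries)
count_info_fields "$raw"       # prints 3 (LOC100130417 becomes a stray entry)
```

With the raw semicolon, `LOC100130417` would be read as a separate INFO key with no value, instead of as part of Gene.refGene.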

The simplest "fix" for this is probably to use sed to recode these characters before you bgzip the VCF. To replace \x3b with - and \x3d with :, the piped command would look like this:

sed 's/\\x3b/-/g'  myanno.hg19_multianno.vcf | sed 's/\\x3d/:/g' | bgzip -c > myanno.hg19_multianno.vcf.gz
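
The same recoding can also be done in a single sed call with two `-e` expressions. The sketch below wraps it in a hypothetical function, `recode_annovar_vcf`, and runs it on a shortened INFO string from the question so that it works without the real file or bgzip.

```shell
# Hypothetical helper: recode ANNOVAR's escapes in a VCF stream read from stdin.
# The replacements '-' and ':' follow the choice made in the command above.
recode_annovar_vcf() {
    sed -e 's/\\x3b/-/g' -e 's/\\x3d/:/g'
}

# Demo on a shortened INFO string from the question's second record.
info='Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22746\x3bdist\x3d17270'
printf '%s\n' "$info" | recode_annovar_vcf
# prints Func.refGene=intergenic;Gene.refGene=FAM41C-LOC100130417;GeneDetail.refGene=dist:22746-dist:17270

# On the real file (assumes bgzip is installed):
#   recode_annovar_vcf < myanno.hg19_multianno.vcf | bgzip -c > myanno.hg19_multianno.vcf.gz
```

Because `-` and `:` have no special meaning inside an INFO value, the recoded file keeps the same number of INFO entries per record.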

This is the correct answer. Upvoting.

written 13 months ago by cpad0112