ANNOVAR result explanation
6.7 years ago
anu014 ▴ 190

Hello Biostars,

I was trying to annotate a VCF using ANNOVAR. I used the "table_annovar.pl" script to get these two files: myanno.hg19_multianno.txt and myanno.hg19_multianno.vcf. Firstly, the number of rows differs between the two (after removing the headers from both files). What is the reason for this? Secondly, the ANNOVAR VCF file looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample
chr1    779788  .       C       A       21.77   PASS    AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;QD=10.88;SOR=0.693;VQSLOD=11.18;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=ncRNA_intronic;Gene.refGene=LINC01128;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END    GT:AD:GQ:PL     1/1:0,2:6:49,6,0
chr1    834928  rs4422949       A       G       70.28   PASS    AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=23.43;SOR=2.833;VQSLOD=6.71;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22746\x3bdist\x3d17270;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,3:9:98,9,0
chr1    834999  rs28570054      G       A       82.28   PASS    AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=27.43;SOR=2.833;VQSLOD=6.75;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d22817\x3bdist\x3d17199;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,3:9:110,9,0
chr1    835499  rs4422948       A       G       105.03  PASS    AC=2;AF=1.00;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=44.00;POSITIVE_TRAIN_SITE;QD=26.26;SOR=0.693;VQSLOD=7.40;culprit=MQ;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=FAM41C\x3bLOC100130417;GeneDetail.refGene=dist\x3d23317\x3bdist\x3d16699;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END        GT:AD:GQ:PL     1/1:0,4:12:133,12,0

What is '\x3b' in the Gene.refGene field?

Please help me out. Thank you!

annotation SNP next-gen ANNOVAR • 5.6k views
6.7 years ago

\x3b is a semicolon. The file is not well formatted, so \x3b was not parsed; had it been parsed properly, you would see ; instead of \x3b. Likewise, \x3d in your output is "=". Refer to a UTF-8 reference table: 3B and 3D are the hexadecimal codes of these two characters.
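The mapping between the two characters and their hexadecimal codes can be checked directly in a shell. This is only a small sketch: it relies on the POSIX `printf` convention that an argument starting with a quote character is converted to the numeric code of the character that follows.

```shell
# Print the hexadecimal code of ';' and of '='.
# A leading quote in the argument makes printf use the character's numeric value.
printf '%x %x\n' "';" "'="
```

The two values printed are the same codes that appear after `\x` in the annotated VCF.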

Thank you for the reply. What does 'dist\x3d' mean then?

Refer to the reply above: \x3d is "=", so dist\x3d22746 reads as dist=22746.

Thank you so much. It saved my time!

5.7 years ago
tannerkoomar ▴ 50

This is the intended behavior for ANNOVAR.

See this GitHub issue for details. In short, ; and = act as delimiters in the INFO column of a VCF and so are not valid inside INFO values; ANNOVAR therefore encodes them as \x3b and \x3d to avoid confusing downstream utilities (e.g. bcftools).

The simplest "fix" for this is probably to use sed to recode these characters before you bgzip the VCF. To replace \x3b with a - and \x3d with a :, the piped command would look like this:

sed 's/\\x3b/-/g'  myanno.hg19_multianno.vcf | sed 's/\\x3d/:/g' | bgzip -c > myanno.hg19_multianno.vcf.gz
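
If the goal is only to read an annotation value rather than to rewrite the VCF, the same sed idea can decode the escapes back to the original characters. The following is a rough sketch; `decode_annovar` is a hypothetical helper name, not part of ANNOVAR, and its output should not be written back into the INFO column, because a literal ; or = there would break VCF parsing.

```shell
# decode_annovar: hypothetical helper (not an ANNOVAR command).
# Reads text on stdin and turns the \x3b and \x3d escapes back into ';' and '='.
# Intended for display only; do not put the result back into a VCF INFO field.
decode_annovar() {
    sed -e 's/\\x3b/;/g' -e 's/\\x3d/=/g'
}

# Example: one GeneDetail.refGene value copied from the VCF in the question.
printf '%s\n' 'dist\x3d22746\x3bdist\x3d17270' | decode_annovar
```

For the example value this prints `dist=22746;dist=17270`.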

This is the correct answer. Upvoting...
