ANNOVAR doesn't recognize multiple variants at single locus
6 months ago
j.lunger18 ▴ 30

Hi there,

I'm using ANNOVAR to annotate variants from somatic sequencing data. I first annotate my VCF files with snpEff, which correctly picks up multiple variants at a single locus. I then annotate with ANNOVAR, which seems unable to handle the comma separating ALT alleles and only annotates the first allele in a comma-delimited list. Here is an example. The columns are, in order: chromosome, position, REF, ALT, snpEff_Allele, and REVEL score. The REVEL score should be 0.545 for row 1 and 0.698 for rows 2 and 3.

chr4    118705648   G   A,C   A   0.545
chr4    118705648   G   A,C   C   0.545
chr4    118705648   G   C     C   0.698

According to the documentation, ANNOVAR should be able to handle this format, but it does not seem to. This is the command I used to run ANNOVAR:

module load annovar/2018-04-16
perl $ANNOVAR_HOME/ /path/to/file.vcf  $ANNOVAR_DATA/hg38 \
-buildver hg38 -out /path/to/output.vcf \
-protocol exac03nontcga,gnomad_genome,gnomad_exome,esp6500siv2_all,dbnsfp33a,revel,clinvar_20170130,intervar_20180118 -operation f,f,f,f,f,f,f,f \
-nastring . -vcfinput
6 months ago
desouzareis.r ▴ 250


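You should decompose and normalize your VCF file before annotation. Decomposing splits each multi-allelic record (such as ALT A,C) into one record per ALT allele, so each allele gets its own annotation. You can use bcftools or vt. You can find more information here. In the second command below, pass the reference FASTA that matches your VCF's genome build (hg38 in your case); human_g1k_v37.fasta is only the example file name.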

bcftools norm -m-both -o ex1.step1.vcf ex1.vcf.gz
bcftools norm -f human_g1k_v37.fasta -o ex1.step2.vcf ex1.step1.vcf
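
To confirm that the split worked before running ANNOVAR, you can count the records that still have a comma in the ALT column. This is only a rough sketch: the function name count_multiallelic is made up for illustration, and it assumes a plain-text, tab-delimited VCF with ALT in column 5.

```shell
# Count VCF data records whose ALT column (field 5) still lists more than
# one allele, i.e. contains a comma. Header lines starting with '#' are skipped.
# Reads the file(s) given as arguments, or standard input if none are given.
# NOTE: count_multiallelic is a hypothetical helper, not part of bcftools or ANNOVAR.
count_multiallelic() {
    awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }' "$@"
}

# Example calls on the files from the commands above (left commented out):
# gzip -dc ex1.vcf.gz | count_multiallelic   # before splitting
# count_multiallelic ex1.step2.vcf           # after splitting and normalizing
```

If the second call prints anything other than 0, some multi-allelic records were not split and ANNOVAR would again annotate only the first allele at those sites.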

