Question

ANNOVAR zygosity status in case of multiple sample VCF processing

7.7 years ago

anilkanthi ▴ 10

After annotating a single-sample VCF file with table_annovar.pl, it is possible to get the zygosity status [Het or Hom], as well as quality and depth, by using the -otherinfo parameter. But when a single VCF file contains siblings, a trio, or multiple samples, the result is still the same, with no per-sample breakdown.

Is there an ANNOVAR option, or any other approach, that can be used to get the zygosity, read depth and quality of each variant for each of the samples in the resulting Excel file?

Desired output:

In case of trio [Het:Het:Het or Hom:Het:Het or Het:n/a:n/a]

In case of siblings [Het:Het or Hom:Het or Het:n/a]

annotation ANNOVAR NGS WES VCF • 3.1k views

updated 7.7 years ago by Biostar 20 • written 7.7 years ago by anilkanthi ▴ 10

score 0 · Answer 1 · 2018-09-03

You may find this useful: "A: How to get sample names and genotype for SNP in multi-sample VCF file" (adapt as needed).
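If you would rather do this outside ANNOVAR, here is a minimal Python sketch of the per-sample idea. It reads the GT value from each sample column of one VCF data line and joins the calls in the Het:Hom:n/a style asked for in the question. The function names and the example line are hypothetical. Reporting homozygous-reference and missing genotypes as n/a is an assumption based on the desired output above, so change that if you want them distinguished.

```python
def zygosity(gt):
    """Map a VCF GT string such as '0/1' or '1|1' to Het, Hom, or n/a."""
    alleles = gt.replace("|", "/").split("/")
    if not gt or "." in alleles:
        return "n/a"  # missing call
    if all(a == "0" for a in alleles):
        return "n/a"  # assumption: hom-ref (sample lacks the variant) shown as n/a
    if len(set(alleles)) == 1:
        return "Hom"  # all alleles identical and non-reference
    return "Het"      # anything else, including 0/1 and 1/2


def per_sample_zygosity(vcf_line):
    """Return colon-joined zygosity calls for every sample on one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")   # column 9 is FORMAT, e.g. GT:DP:GQ
    gt_index = format_keys.index("GT")
    calls = []
    for sample in fields[9:]:            # sample columns start at column 10
        values = sample.split(":")
        gt = values[gt_index] if gt_index < len(values) else ""
        calls.append(zygosity(gt))
    return ":".join(calls)


# Hypothetical trio line: proband het, mother hom-alt, father not called.
line = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:30:99\t1/1:25:80\t./.:.:."
print(per_sample_zygosity(line))  # Het:Hom:n/a
```

Per-sample depth and genotype quality can be pulled the same way by looking up DP and GQ in the FORMAT column, provided your variant caller writes those keys.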

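Regarding ANNOVAR specifically, take a look at the convert2annovar.pl options, in particular --allsample and --withzyg (the latter used together with --includeinfo) for vcf4 input: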

/Programs/annovar/convert2annovar.pl 
Usage:
     convert2annovar.pl [arguments] <variantfile>

     Optional arguments:
            -h, --help                      print help message
            -m, --man                       print complete documentation
            -v, --verbose                   use verbose output
                --format <string>           input format (default: pileup)
                --outfile <file>            output file name (default: STDOUT)
                --snpqual <float>           quality score threshold in pileup file (default: 20)
                --snppvalue <float>         SNP P-value threshold in GFF3-SOLiD file (default: 1)
                --coverage <int>            read coverage threshold in pileup file (default: 0)
                --maxcoverage <int>         maximum coverage threshold (default: none)
                --includeinfo               include supporting information in output
                --chr <string>              specify the chromosome (for CASAVA format)
                --chrmt <string>            chr identifier for mitochondria (default: M)
                --altcov <int>              alternative allele coverage threshold (for pileup format)
                --allelicfrac               print out allelic fraction rather than het/hom status (for pileup format)
                --fraction <float>          minimum allelic fraction to claim a mutation (for pileup format)
                --species <string>          if human, convert chr23/24/25 to X/Y/M (for gff3-solid format)
                --filter <string>           output variants with this filter (case insensitive, for vcf4 format)
                --allsample                 process all samples in file with separate output files (for vcf4 format)
                --withzyg                   print zygosity/coverage/quality when -includeinfo is used (for vcf4 format)
                --genoqual <float>          genotype quality score threshold (for vcf4 format)
                --varqual <float>           variant quality score threshold (for vcf4 format)
                --comment                   keep comment line in output (for vcf4 format)
                --dbsnpfile <file>          dbSNP file in UCSC format (for rsid format)
                --withfreq                  for --allsample, print frequency information instead (for vcf4 format)
                --seqdir <string>           directory with FASTA sequences (for region format)
                --inssize <int>             insertion size (for region format)
                --delsize <int>             deletion size (for region format)
                --subsize <int>             substitution size (default: 1, for region format)
                --context <int>             print context nucleotide for indels (for casava format)

     Function: convert variant call file generated from various software programs 
     into ANNOVAR input format

     Example: convert2annovar.pl -format pileup -outfile variant.query variant.pileup
              convert2annovar.pl -format cg -outfile variant.query variant.cg
              convert2annovar.pl -format cgmastervar variant.masterVar.txt
              convert2annovar.pl -format gff3-solid -outfile variant.query variant.snp.gff
              convert2annovar.pl -format soap variant.snp > variant.avinput
              convert2annovar.pl -format maq variant.snp > variant.avinput
              convert2annovar.pl -format casava -chr 1 variant.snp > variant.avinput
              convert2annovar.pl -format vcf4 variantfile > variant.avinput
              convert2annovar.pl -format vcf4 -filter pass variantfile -allsample -outfile variant
              convert2annovar.pl -format vcf4old input.vcf > output.avinput
              convert2annovar.pl -format rsid snplist.txt -dbsnpfile snp138.txt > output.avinput
              convert2annovar.pl -format region -seqdir humandb/hg19_seq/ chr1:2000001-2000003 -inssize 1 -delsize 2
              convert2annovar.pl -format transcript NM_022162 -gene humandb/hg19_refGene.txt -seqdir humandb/hg19_seq/

     Version: $Date: 2015-06-17 21:43:51 -0700 (Wed, 17 Jun 2015) $
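As a starting point, the vcf4 flags from the help text above could be combined roughly as follows. This is only a sketch that builds and prints the command line without running it. The input file name trio.vcf and the output prefix trio are hypothetical, and I have not confirmed the exact output file names or column layout for this combination of flags, so check the result against your own ANNOVAR version.

```python
import shlex


def build_convert_cmd(vcf_path, out_prefix):
    """Assemble a convert2annovar.pl call using only flags listed in its help text."""
    return [
        "convert2annovar.pl",
        "-format", "vcf4",
        "-allsample",      # separate output file per sample
        "-withzyg",        # zygosity/coverage/quality, needs -includeinfo
        "-includeinfo",
        "-outfile", out_prefix,
        vcf_path,
    ]


# Hypothetical file names; pass the list to subprocess.run() to execute it.
cmd = build_convert_cmd("trio.vcf", "trio")
print(shlex.join(cmd))
```

Each per-sample output could then be annotated separately and the zygosity columns merged back by variant position.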

Kevin