VDB Field in samtools

VCF files produced by samtools have a field "VDB", which I believe stands for "Variant Distance Bias".

Does anyone know exactly what this is and how to interpret it? Can it be both negative and positive? In other words, what do high and low VDB values mean?

Here are two examples from my VCF file.

DP=64;VDB=0.0398;AF1=1;AC1=4;DP4=0,0,9,48;MQ=44;FQ=-112

DP=447;VDB=0.0419;AF1=1;AC1=4;DP4=0,1,119,288;MQ=47;FQ=-286;PV4=1,0.12,0.32,1
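To pull VDB out of records like these (for example, to filter on it), a minimal Python sketch is shown below. It assumes the INFO column has already been isolated as a string; the function names are hypothetical. The DP4 unpacking follows the order given in the `##INFO` header quoted further down (ref-forward, ref-reverse, alt-forward, alt-reverse). For whole files, a dedicated VCF library such as pysam is more robust than hand parsing.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into key -> value (True for bare flags like INDEL)."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def vdb_and_alt_fraction(info: str):
    """Return (VDB, fraction of high-quality bases supporting ALT) for one record."""
    fields = parse_info(info)
    vdb = float(fields["VDB"])
    # DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse
    ref_f, ref_r, alt_f, alt_r = (int(x) for x in fields["DP4"].split(","))
    total = ref_f + ref_r + alt_f + alt_r
    return vdb, (alt_f + alt_r) / total if total else 0.0


examples = [
    "DP=64;VDB=0.0398;AF1=1;AC1=4;DP4=0,0,9,48;MQ=44;FQ=-112",
    "DP=447;VDB=0.0419;AF1=1;AC1=4;DP4=0,1,119,288;MQ=47;FQ=-286;PV4=1,0.12,0.32,1",
]
for info in examples:
    vdb, alt_frac = vdb_and_alt_fraction(info)
    print(f"VDB={vdb:.4f} alt_fraction={alt_frac:.3f}")
# VDB=0.0398 alt_fraction=1.000
# VDB=0.0419 alt_fraction=0.998
```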

Thanks a bunch!

pd3

VDB (Variant Distance Bias) tests whether variant bases occur at random positions within the aligned portion of the reads. It is mainly useful for RNA-seq reads aligned against a genomic reference sequence, where splice-site artefacts can produce apparent variants at non-random positions in the reads. Higher values indicate a higher likelihood that the variant is randomly distributed within the reads, so low values are the suspicious ones.
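The idea of testing whether variant positions are "random" within reads can be illustrated with a small Monte Carlo test. This is a hypothetical sketch of the general technique, not samtools' actual VDB calculation: it assumes every read has the same aligned length, uses distance to the nearest read end as the statistic, and the function names and parameters are invented for illustration. Like VDB, a small value flags non-random placement and a larger value is consistent with randomly placed variants.

```python
import random
from statistics import mean


def end_distance(pos: int, read_len: int) -> int:
    """Distance (in bases) from a 0-based position to the nearest end of the read."""
    return min(pos, read_len - 1 - pos)


def distance_bias_pvalue(positions, read_len, n_sim=2000, seed=0):
    """Hypothetical Monte Carlo test: are variant bases closer to read ends than chance?

    positions: 0-based offset of the variant base within each aligned read
    (assumes all reads share the same aligned length, read_len).
    Returns the fraction of simulated datasets, with positions drawn uniformly
    at random, whose mean end distance is <= the observed one (add-one smoothed).
    Small values: variant hugs the read ends. Larger values: consistent with
    random placement.
    """
    observed = mean(end_distance(p, read_len) for p in positions)
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(positions)
    hits = 0
    for _ in range(n_sim):
        sim = mean(end_distance(rng.randrange(read_len), read_len) for _ in range(n))
        if sim <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical data: 100 bp reads, 20 reads carrying the variant.
clustered = [1, 2, 0, 3, 97, 98, 1, 2, 96, 99] * 2   # variant near read ends
spread = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95] * 2  # variant spread across reads
print(distance_bias_pvalue(clustered, 100))  # small: suspicious placement
print(distance_bias_pvalue(spread, 100))     # larger: looks random
```

Add-one smoothing keeps the estimate away from exactly 0, which would overstate certainty from a finite number of simulations.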

One response at http://seqanswers.com/forums/showthread.php?t=14582 suggests that the VDB field indicates potential misalignment caused by a nearby SNP.

The following, from VCF Tools at SourceForge.net, seems to agree: the "end distance alignment" test checks whether variant bases tend to occur at a fixed distance from the ends of reads, which usually indicates misalignment.

So do higher values indicate greater bias, and should they be flagged as "suspicious"? Or is it the other way around, i.e., should lower values be discarded? Thanks a lot!

Marina Manrique

A brief description of every field in the INFO and FORMAT columns of the VCF file can be found in the header lines at the top of the file (those starting with ##):

##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
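To look these descriptions up programmatically rather than by eye, a minimal Python sketch can scan the `##INFO`/`##FORMAT` lines with a regular expression. It assumes descriptions contain no embedded double quotes (true for the lines above); the function name is hypothetical. For a real file, feed it the lines starting with `##` (for example the output of `bcftools view -h`).

```python
import re

# Captures the section (INFO/FORMAT), the ID, and the Description of a header line.
HEADER_RE = re.compile(r'^##(INFO|FORMAT)=<ID=([^,]+),.*?Description="(.*)">$')


def header_descriptions(lines):
    """Map (section, ID) -> Description for every ##INFO/##FORMAT header line."""
    out = {}
    for line in lines:
        m = HEADER_RE.match(line.rstrip("\n"))
        if m:
            section, field_id, desc = m.groups()
            out[(section, field_id)] = desc
    return out


sample = [
    '##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">',
    '##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
]
descs = header_descriptions(sample)
print(descs[("INFO", "VDB")])  # Variant Distance Bias
```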