Question: VDB Field In Samtools
learnerforever wrote, 7.9 years ago:

samtools VCF files have a field "VDB", which I believe stands for "Variant Distance Bias".

Does anyone know exactly what this is and how to interpret it? Can the values be both negative and positive? That is, what does a high or low VDB value mean?

Here are two examples from my VCF file.

DP=64;VDB=0.0398;AF1=1;AC1=4;DP4=0,0,9,48;MQ=44;FQ=-112

DP=447;VDB=0.0419;AF1=1;AC1=4;DP4=0,1,119,288;MQ=47;FQ=-286;PV4=1,0.12,0.32,1

Thanks a bunch!

pd3 wrote, 7.5 years ago:

VDB (variant distance bias) checks if variant bases occur at random positions in the aligned portion of the reads. It is useful mainly for RNA-seq reads which are aligned against a genomic reference sequence. Higher values indicate higher likelihoods that the variant is distributed within the reads randomly.
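
As a rough illustration of working with this field, the sketch below pulls VDB out of an INFO string and flags values under a cutoff. The helper names `parse_info` and `vdb_is_low` and the cutoff `min_vdb=0.04` are hypothetical: the cutoff is arbitrary, picked only so that the two example records from the question fall on different sides of it, and is not a recommended threshold.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def vdb_is_low(info, min_vdb=0.04):
    """Return True when VDB is present and below min_vdb.

    min_vdb is an arbitrary, illustrative cutoff (an assumption).
    Lower VDB means the variant bases look less randomly placed in the reads.
    """
    vdb = parse_info(info).get("VDB")
    return vdb is not None and float(vdb) < min_vdb


# The two INFO strings quoted in the question.
examples = [
    "DP=64;VDB=0.0398;AF1=1;AC1=4;DP4=0,0,9,48;MQ=44;FQ=-112",
    "DP=447;VDB=0.0419;AF1=1;AC1=4;DP4=0,1,119,288;MQ=47;FQ=-286;PV4=1,0.12,0.32,1",
]
for info in examples:
    print(parse_info(info)["VDB"], vdb_is_low(info))
```

Records without a VDB entry are never flagged here; whether that is the right default depends on the pipeline.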

Reply from Galaxy, 7.1 years ago:

http://sourceforge.net/mailarchive/message.php?msg_id=28831076

Larry_Parnell (Boston, MA USA) wrote, 7.9 years ago:

One response at http://seqanswers.com/forums/showthread.php?t=14582 guesses that the VDB field indicates a potential misalignment due to a nearby SNP.

The following, from VCFtools at SourceForge.net, seems to agree: The "end distance alignment" tests if variant bases tend to occur at a fixed distance from the end of reads, which is usually an indication of misalignment.

Reply from learnerforever, 7.9 years ago:

Thanks for the links.

So do higher values indicate greater bias, meaning they should be flagged as "suspicious"? Or is it the other way around, i.e. should lower values be discarded? Thanks a lot!

Marina Manrique (Granada) wrote, 6.5 years ago:

A brief description of every tag used in the INFO and FORMAT fields can be found in the header lines at the top of the VCF file (the lines that start with ##):

##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
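
To look these descriptions up programmatically, something like the following sketch could work. It assumes header lines laid out exactly as above (ID, Number, Type, Description, in that order); headers written by other tools may carry extra keys and would need a more tolerant parser. The names `HEADER_RE` and `header_descriptions` are made up for this example.

```python
import re

# Matches ##INFO / ##FORMAT lines in the ID,Number,Type,Description layout shown above.
HEADER_RE = re.compile(
    r'^##(INFO|FORMAT)=<ID=([^,]+),Number=([^,]+),Type=([^,]+),Description="(.*)">$'
)


def header_descriptions(lines):
    """Map (section, tag ID) to its Description text."""
    descriptions = {}
    for line in lines:
        match = HEADER_RE.match(line.strip())
        if match:
            section, tag_id, _number, _type, text = match.groups()
            descriptions[(section, tag_id)] = text
    return descriptions


sample = [
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">',
    '##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">',
]
descriptions = header_descriptions(sample)
print(descriptions[("INFO", "VDB")])
```

Keying on the section as well as the ID matters because the same ID (DP here) can appear under both INFO and FORMAT with different meanings.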