Question: INFO interpretation in VCF File
cvu130 wrote:

I've generated a VCF file using samtools mpileup. I understood all the fields in the VCF file except the INFO column.

DP=55;VDB=2.813188e-01;AF1=1;AC1=2;DP4=0,0,15,15;MQ=44;FQ=-117

From this I can interpret DP, MQ and FQ, but what are VDB, AF1, AC1 and DP4?

Can anyone explain them to me?

modified 4.8 years ago by Ashutosh Pandey11k • written 4.8 years ago by cvu130

I've changed the category of this post from 'Blog' to 'Question', because it is a question and not a blog post. Have a look at the VCF format specification to get an idea of how to interpret the INFO field. You should also look at how these elements are defined in the header of the VCF file itself.

written 4.8 years ago by Giovanni M Dall'Olio26k

Sorry to ask such a basic question.

Please explain the following to me:

AF1: Max-likelihood estimate of the first ALT allele frequency (assuming HWE)

DP4: Number of high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases

VDB: Variant Distance Bias (v2) for filtering splice-site artefacts in RNA-seq data

PV4: P-values for strand bias, baseQ bias, mapQ bias and tail distance bias
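
To make those definitions concrete, here is a minimal Python sketch that splits the INFO string from the question into key/value pairs and unpacks DP4 into its four strand counts. The helper names `parse_info` and `alt_fraction` are made up for illustration and are not part of samtools or bcftools.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def alt_fraction(dp4):
    """Fraction of high-quality bases supporting ALT, from a DP4 string."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4.split(","))
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    # Guard against records with no high-quality bases at all.
    return (alt_fwd + alt_rev) / total if total else 0.0


# INFO string copied from the question.
info = parse_info("DP=55;VDB=2.813188e-01;AF1=1;AC1=2;DP4=0,0,15,15;MQ=44;FQ=-117")
hq_depth = sum(int(x) for x in info["DP4"].split(","))
print(hq_depth, alt_fraction(info["DP4"]))  # 30 1.0
```

For this record, 30 of the 55 raw reads (DP) count as high quality in DP4 and all of them support the ALT allele, which would be consistent with AF1=1 and AC1=2 for a single diploid sample.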

written 4.8 years ago by cvu130

Which parts of those don't you understand?

written 4.8 years ago by Devon Ryan89k

How does samtools count the values for AF1, AC1, VDB, PV4 and DP4?

written 4.8 years ago by cvu130

By "count" do you mean "calculate"? Some of this is described in a PDF from Heng Li. VDB is described schematically in this PDF. You should be able to find anything else with google.

modified 4.8 years ago • written 4.8 years ago by Devon Ryan89k

I want to know how significant they are when finding SNPs in humans for clinical purposes.

written 4.8 years ago by cvu130

Well, it's probably a good idea to perform some filtering according to them and check if any SNPs of particular interest show any particular bias. That's basically the point of these various metrics.
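
As a rough illustration of that kind of filtering, here is a small Python sketch that checks a single INFO string against a few of these metrics. The function name and all thresholds (`min_alt_per_strand`, `min_mq`, `min_pv4`) are arbitrary placeholders chosen for the example, not recommended or validated cut-offs, and certainly not clinical ones. PV4 is treated as optional because the record in the question does not carry it.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def passes_filters(info, min_alt_per_strand=2, min_mq=30.0, min_pv4=1e-4):
    """Return True if the record clears some illustrative bias checks.

    All thresholds are hypothetical defaults for demonstration only.
    """
    fields = parse_info(info)

    # DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse counts.
    dp4 = [int(x) for x in fields["DP4"].split(",")]
    alt_fwd, alt_rev = dp4[2], dp4[3]
    if alt_fwd < min_alt_per_strand or alt_rev < min_alt_per_strand:
        return False  # ALT allele seen (almost) only on one strand

    if float(fields["MQ"]) < min_mq:
        return False  # covering reads are poorly mapped

    # PV4 = p-values for strand, baseQ, mapQ and tail distance bias.
    if "PV4" in fields:
        if min(float(p) for p in fields["PV4"].split(",")) < min_pv4:
            return False  # at least one bias test is strongly significant

    return True


record = "DP=55;VDB=2.813188e-01;AF1=1;AC1=2;DP4=0,0,15,15;MQ=44;FQ=-117"
print(passes_filters(record))
```

In practice you would run something like this over every record and then look by hand at any SNP of interest that fails, rather than trusting fixed cut-offs blindly.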

written 4.8 years ago by Devon Ryan89k

Thanks, Devon Ryan!

written 4.8 years ago by cvu130

Ashutosh Pandey11k wrote:

Though the question has already been answered, I'll paste the header of the VCF file that samtools + bcftools generates, as it might be useful.

 

##samtoolsVersion=0.1.18 (r982:295)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##INFO=<ID=PASS,Number=0,Type=Flag,Description="Variants that passed filtering (step1) at Samtool level">
##INFO=<ID=COMMON_PASS_FAIL,Number=0,Type=Flag,Description="Variants that passed filtering at Samtool level (step1) but failed at GATK level (step2)">
##INFO=<ID=COMMON_PASS_PASS,Number=0,Type=Flag,Description="Variants that passed filtering at Samtool level (step1)and GATK level (step2)">
##INFO=<ID=DUAL_VARIANT_PASS_FAIL,Number=0,Type=Flag,Description="Position called as both SNP and Indel and post-filtering eliminated one effect based on some criteria">
##INFO=<ID=DUAL_VARIANT_PASS_PASS,Number=0,Type=Flag,Description="Position called as both SNP and Indel and post-filtering approved both of them. Must be used with caution.">
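
If you want to look these definitions up programmatically rather than by eye, a sketch along the following lines might help. It assumes the `##INFO` lines carry exactly the keys ID, Number, Type and Description in that order, as in the header above; headers written by other tools may include extra keys and would need a more tolerant parser. The names `INFO_RE` and `info_definitions` are invented for this example.

```python
import re

# Matches lines shaped like the ##INFO entries pasted above.
INFO_RE = re.compile(
    r'##INFO=<ID=([^,]+),Number=([^,]+),Type=([^,]+),Description="(.*)">'
)


def info_definitions(header_lines):
    """Map each INFO ID to its Number, Type and Description."""
    defs = {}
    for line in header_lines:
        match = INFO_RE.match(line.strip())
        if match:
            key, number, value_type, description = match.groups()
            defs[key] = {
                "Number": number,
                "Type": value_type,
                "Description": description,
            }
    return defs


# A few lines copied from the header above.
header = [
    '##samtoolsVersion=0.1.18 (r982:295)',
    '##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">',
    '##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">',
]
defs = info_definitions(header)
for key in ("DP4", "VDB"):
    print(key, "->", defs[key]["Description"])
```

A regular expression is used instead of splitting on commas because the Description text itself can contain commas.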

written 4.8 years ago by Ashutosh Pandey11k