Question: Confusing Definition In Vcfv4.1: Info=<Id=Dp....>
3
gravatar for Lds
6.5 years ago by
Lds380
Lds380 wrote:

Take one VCF file for example,

##fileformat=VCFv4.1
##samtoolsVersion=0.1.16 (r963:234)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the site allele frequency of the first ALT allele">
....
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SLVi33.16       SLVi33.25       SLVi33.26
chr1    85691   .       C       T       6.26    .       DP=2;AF1=1;CI95=0.3333,1;DP4=0,0,2,0;MQ=40;FQ=-28.9     GT:PL:GQ        0/1:0,0,0:3     1/1:20,3,0:4    1/1:18,3,0:4
chr1    88338   .       G       A       6.95    .       DP=3;AF1=1;CI95=0.1667,1;DP4=0,0,2,1;MQ=30;FQ=-28.1     GT:PL:GQ        0/1:0,0,0:3     0/1:0,0,0:3     1/1:38,9,0:8

The DP=2 at chr1:85691, however, there are three individuals in the file, why does every individual has genotype? As AF1 means allele frequency of ALT allele, it supposed to be AF1=5/6 at chr1:85691, why it's truely AF1=1? I'm so confused, anybody can help me? Thanks in advance.

vcf • 2.3k views
ADD COMMENTlink modified 7 days ago by Biostar ♦♦ 20 • written 6.5 years ago by Lds380
4
gravatar for Damian Kao
6.5 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

DP is just the raw read depth at that position. DP of 2 just means 2 reads out of your 3 individual samples mapped to that position and passes the quality filter. It doesn't say whether it is 2 reads from one of your individual or 2 read from 2 individuals..etc.

AF1 of 1 pretty much just means the 2 reads (2/2) both have a T instead of an C at that position.

Why are you expecting it to be 5/6? Are you trying to compare this to some other data?

edit **

Samtools is really not great at outputting and calculating the genotype field. Here is another biostar questions about it.

The PL fields showing 0,0,0 pretty much means there are no reads covering it. It looks like samtools falsely calculates 0,0,0 as heterozygous because the probabilities are similar (all zeroes in this case).

ADD COMMENTlink modified 6.5 years ago • written 6.5 years ago by Damian Kao15k
1

Thanks so much. You've got the point. When I add flag -D in samtools to call the variants, it keep read depth for each individual. What I get for chr1:85691 is,

chr1 85691 . C T 6.26 . DP=2;AF1=1;CI95=0.3333,1;DP4=0,0,2,0;MQ=40;FQ=-28.9 GT:PL:DP:GQ 0/1:0,0,0:0:3 1/1:20,3,0:1:4 1/1:18,3,0:1:4

That's exactly what you said

ADD REPLYlink written 6.5 years ago by Lds380
1

So if I want to extract the genotype from VCF generated by samtools, I must be very careful about the DP, I should mark the genotype as missing data for samples with DP=0.

ADD REPLYlink written 6.5 years ago by Lds380

If DP of 2 means 2 reads out of 3 individuals, for example, if the 2 reads come from SLVi33.16 and SLVi33.25 instead of SLVi33.26, the genotype symbol for SLVi33.26 should be "./.". But in the file, the genotype symbols are 0/1, 1/1 and 1/1, which means the genotype for these 3 individuals is CT, TT and TT(T5+C1), respectively (and that why I said before AF1=5/6). In other word,we know the genotypes for 3 individuals, the DP should be greater than 6, but it's DP=2 here. How to explain this discordance? Thanks

ADD REPLYlink written 6.5 years ago by Lds380

Yeah I see what you mean. I think this maybe an error on samtools part. Samtools is pretty bad at outputting the genotype information. The fact that the PL fields show 0,0,0 pretty much means there is no read covering it.

ADD REPLYlink written 6.5 years ago by Damian Kao15k

Thanks so much. You've got the point. When I add flag -D in samtools to call the variants, it keep read depth for each individual. What I get for chr1:85691 is, chr1 85691 . C T 6.26 . DP=2;AF1=1;CI95=0.3333,1;DP4=0,0,2,0;MQ=40;FQ=-28.9 GT:PL:DP:GQ 0/1:0,0,0:0:3 1/1:20,3,0:1:4 1/1:18,3,0:1:4

That's exactly what you said

ADD REPLYlink written 6.5 years ago by Lds380
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 582 users visited in the last hour