Entering edit mode
10.3 years ago
Jordan
★
1.3k
Hi,
I'm having trouble understanding read depth in vcf's called by somatic-sniper
.
Here is a sample vcf file:
##fileformat=VCFv4.1
##fileDate=20140428
##phasing=none
##reference=human_hg19.fa
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=IGT,Number=1,Type=String,Description="Genotype when called independently (only filled if called in joint prior mode)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Total read depth">
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##FORMAT=<ID=BCOUNT,Number=4,Type=Integer,Description="Occurrence count for each base at this site (A,C,G,T)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##FORMAT=<ID=JGQ,Number=1,Type=Integer,Description="Joint genotype quality (only filled if called in join prior mode)">
##FORMAT=<ID=VAQ,Number=1,Type=Integer,Description="Variant allele quality">
##FORMAT=<ID=BQ,Number=.,Type=Integer,Description="Average base quality">
##FORMAT=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality across all reads">
##FORMAT=<ID=AMQ,Number=.,Type=Integer,Description="Average mapping quality for each allele present in the genotype">
##FORMAT=<ID=SS,Number=1,Type=Integer,Description="Variant status relative to non-adjacent Normal, 0=wildtype,1=germline,2=somatic,3=LOH,4=unknown">
##FORMAT=<ID=SSC,Number=1,Type=Integer,Description="Somatic Score">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Normal Tumor
chr1 871092 . G C . . . GT:IGT:DP:DP4:BCOUNT:GQ:JGQ:VAQ:BQ:MQ:AMQ:SS:SSC 0/0:0/0:6:5,1,0,0:0,0,6,0:45:.:0:27:38:38:0:. 0/1:0/1:12:8,1,6,0:0,5,9,0:25:.:25:18,23:47:45,46:2:22
chr1 892349 . G T . . . GT:IGT:DP:DP4:BCOUNT:GQ:JGQ:VAQ:BQ:MQ:AMQ:SS:SSC 0/0:0/0:10:9,0,4,0:0,0,9,4:30:.:0:19:36:36:0:. 0/1:0/1:29:24,0,17,0:0,0,24,17:19:.:19:17,10:35:36,33:2:16
The depth of the first variant in tumor is 12. But based on the BCOUNT
, it has 15 depth (8+1+6+0). Why this discrepancy? Does anyone know?
And how can I accurately calculate the allele frequency
in this case.
For the first variant, the alt allele is C. As per the BCOUNT there is just one 1 C. So, is it 1/15 or 1/12?
It is also a bit concerning that C was called a variant considering that it only has 1 read supporting it, where as A has 6 reads supporting it.
Any help on this is much appreciated. Thanks!
I'm a bit baffled why this might be happening. I do think it likely this is a bug. The tumor depth should match the
DP4
counts. I do see what I think is a bug in how theBCOUNT
is calculated. My best guess forDP4
is that there might be some Ns in your reads? Would it be possible to seesamtools mpileup
orbam-readcount
output for these issues?Yes, my first guess was that the reads would have had
N
s in it as well. But shouldn't theDP
be higher thanBCOUNT
in that case? AsBCOUNT
gives only ACGTs notN
s. Though that does not seem to be the case here.My reads do have
N
s in them btw.My guess would be 3 reads with Ns for
871092
? Is that right?One reason the
BCOUNT
seems to make no sense is that you're actually looking at theDP4
counts. It still won't make sense because there are bugs relating to how the counts are generated. Ns are being double counted in all categories and are thus inflating the non-DP
counts. I'll try to get a fix out before the end of the week.I looked at this position. So, here are the results:
So, I think the depth is working right. And like you said, there are 3
N
s. But theBCOUNT
is way off.The
BCOUNT
should be:(0, 2, 6, 1)
.BCOUNT
for your example is(0,5,9,0)
. There are two bugs going on here.BCOUNT
is being limited to only bases seen in called tumor and normal genotypes.Thus, your
T
isn't being counted (it isn't in the called genotypes) and your 3N
s are being added to both yourG
s and yourC
s.I see. You said you would fix by the end of the week. Can you let me know when u r done? I will probably re-run the tool then on my samples after the update.
I realize it's been about a year, which is sad. In the event that you are still interested, the latest version (1.0.5.0) should fix the bug(s) you discovered.