Question: Need help understanding read depth for somaticsniper vcf's
1
gravatar for Jordan
3.4 years ago by
Jordan1.0k
Pittsburgh
Jordan1.0k wrote:

Hi,

I'm having trouble understanding read depth in vcf's called by somatic-sniper.

Here is a sample vcf file:

##fileformat=VCFv4.1
##fileDate=20140428
##phasing=none
##reference=human_hg19.fa
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=IGT,Number=1,Type=String,Description="Genotype when called independently (only filled if called in joint prior mode)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Total read depth">
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##FORMAT=<ID=BCOUNT,Number=4,Type=Integer,Description="Occurrence count for each base at this site (A,C,G,T)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##FORMAT=<ID=JGQ,Number=1,Type=Integer,Description="Joint genotype quality (only filled if called in join prior mode)">
##FORMAT=<ID=VAQ,Number=1,Type=Integer,Description="Variant allele quality">
##FORMAT=<ID=BQ,Number=.,Type=Integer,Description="Average base quality">
##FORMAT=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality across all reads">
##FORMAT=<ID=AMQ,Number=.,Type=Integer,Description="Average mapping quality for each allele present in the genotype">
##FORMAT=<ID=SS,Number=1,Type=Integer,Description="Variant status relative to non-adjacent Normal, 0=wildtype,1=germline,2=somatic,3=LOH,4=unknown">
##FORMAT=<ID=SSC,Number=1,Type=Integer,Description="Somatic Score">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Normal    Tumor
chr1    871092    .    G    C    .    .    .    GT:IGT:DP:DP4:BCOUNT:GQ:JGQ:VAQ:BQ:MQ:AMQ:SS:SSC    0/0:0/0:6:5,1,0,0:0,0,6,0:45:.:0:27:38:38:0:.    0/1:0/1:12:8,1,6,0:0,5,9,0:25:.:25:18,23:47:45,46:2:22
chr1    892349    .    G    T    .    .    .    GT:IGT:DP:DP4:BCOUNT:GQ:JGQ:VAQ:BQ:MQ:AMQ:SS:SSC    0/0:0/0:10:9,0,4,0:0,0,9,4:30:.:0:19:36:36:0:.    0/1:0/1:29:24,0,17,0:0,0,24,17:19:.:19:17,10:35:36,33:2:16

The depth of the first variant in tumor is 12. But based on the BCOUNT, it has 15 depth (8+1+6+0). Why this discrepancy? Does anyone know?

And how can I accurately calculate the allele frequency in this case.

For the first variant, the alt allele is C. As per the BCOUNT there is just one 1 C. So, is it 1/15 or 1/12?

It is also a bit concerning that C was called a variant considering that it only has 1 read supporting it, where as A has 6 reads supporting it.

Any help on this is much appreciated. Thanks!

 

ADD COMMENTlink written 3.4 years ago by Jordan1.0k

I'm a bit baffled why this might be happening. I do think it likely this is a bug. The tumor depth should match the DP4 counts. I do see what I think is a bug in how the BCOUNT is calculated. My best guess for DP4 is that there might be some Ns in your reads? Would it be possible to see samtools mpileup or bam-readcount output for these issues?

ADD REPLYlink written 3.4 years ago by ernfrid380

Yes, my first guess was that the reads would have had N's in it as well. But shouldn't the DP be higher than BCOUNT in that case? As BCOUNT gives only ACGTs not N's. Though that does not seem to be the case here.

My reads do have N's in them btw.

ADD REPLYlink written 3.4 years ago by Jordan1.0k

My guess would be 3 reads with Ns for 871092? Is that right?

One reason the BCOUNT seems to make no sense is that you're actually looking at the DP4 counts. It still won't make sense because there are bugs relating to how the counts are generated. Ns are being double counted in all categories and are thus inflating the non-DP counts. I'll try to get a fix out before the end of the week.

ADD REPLYlink written 3.4 years ago by ernfrid380

I looked at this position. So, here are the results:

Total reads at that position: 12
Total N's: 3
Total G's: 6
Total C's: 2
Total T's: 1
Total A's: 0

So, I think the depth is working right. And like you said, there are 3 N's. But the BCOUNT is way off.

The BCOUNT should be: (0, 2, 6, 1).

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by Jordan1.0k
1

BCOUNT for your example is (0,5,9,0). There are two bugs going on here.

  1. BCOUNT is being limited to only bases seen in called tumor and normal genotypes.

  2. The limitation in 1 is only being applied in a way that non-ACTG bases can count in multiple categories.

Thus, your T isn't being counted (it isn't in the called genotypes) and your 3 Ns are being added to both your Gs and your Cs.

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by ernfrid380

I see. You said you would fix by the end of the week. Can you let me know when u r done? I will probably re-run the tool then on my samples after the update.

 

ADD REPLYlink written 3.4 years ago by Jordan1.0k
1

I realize it's been about a year, which is sad. In the event that you are still interested, the latest version (1.0.5.0) should fix the bug(s) you discovered.

ADD REPLYlink written 2.3 years ago by ernfrid180
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1445 users visited in the last hour