Question: How to differentiate Homozygous reference calls from No calls in VCF
0
gravatar for ramesh.8v
2.7 years ago by
ramesh.8v200
United States
ramesh.8v200 wrote:

I'm using samtools multisample SNP calling to call SNPs from different samples. The command I'm using is,

samtools mpileup -uf ref.fa *.bam | bcftools view -vcg - > out.vcf

Then, I'm using vcf-to-tab tool to convert vcf to tab file.

The problem is, No calls (missed in sequencing) are called as homozygous reference calls. How to differentiate No calls from homozygous reference calls.

I appreciate any sort of help.

Edit: My vcf file looks like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  samp1.bam   samp2.bam 000000F   634700  .   T   C   3.55    .   DP=1;AF1=1;AC1=4;DP4=0,0,0,1;MQ=60;FQ=-27.4 GT:PL:GQ    0/1:31,3,0:5    0/1:0,0,0:3
samtools snps • 1.8k views
ADD COMMENTlink modified 2.7 years ago by genomax71k • written 2.7 years ago by ramesh.8v200

Generally, you'd do that by depth. If the depth for that sample is below your criteria for safely considering something homozygous-ref at a specific position, then consider it a no-call.

ADD REPLYlink written 2.7 years ago by Brian Bushnell16k

Thanks, Brain. That's a very good idea. But, raw read depth is not printed for individual samples, how to force bcftools to print raw read depth?

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by ramesh.8v200

Ah, that's kind of unfortunate. I wonder if there's an option for that somewhere? I'm not all that familiar with bcftools, but for some reason depth and allele fraction do not seem to be default fields for each sample. Maybe you're supposed to guess from the PL field ("phred-scaled genotype likelihoods"); since they are all zero for the second sample, the implication is that the depth is zero, even though it was called 0/1 (het). Well, that doesn't make much sense!

One alternative is to use a different tool to calculate coverage at that location for that sample. Bedtools probably allows that, and BBMap's pileup.sh does as well. Not a convenient option, though, for sure.

ADD REPLYlink written 2.7 years ago by Brian Bushnell16k
1
gravatar for d-cameron
2.7 years ago by
d-cameron2.1k
Australia
d-cameron2.1k wrote:

No calls (missed in sequencing) are called as homozygous reference calls. How to differentiate No calls from homozygous reference calls.

According to the bcftools documentation:

-g, --gvcf INT  output also gVCF blocks of homozygous REF calls. The parameter INT is the minimum per-sample depth required to include a site in the non-variant block.

Looking at your command line, I notice that you haven't specified a minimum per-sample depth for the -g parameter. Try setting that parameter to a sane, non-zero value and you should be able to differentiate between no-call and homozygous REF.

ADD COMMENTlink written 2.7 years ago by d-cameron2.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 540 users visited in the last hour