How heterozygous GTs were defined in SNP calling vcf result
1
2
Entering edit mode
7.5 years ago
bill-zt ▴ 50

I use samtools-bcftools pipeline to call variations among a sample (30 individuals) with low coverage (~5X each).

samtools mpileup -ugf genome.fa -b bam.list -t AD,INFO/AD,DP,SP  | bcftools call -vcO z -o out.vcf.gz

In the result, I found some strange heterozygous GTs:

(1) some heterozygous GTs has: DP=0, AD=0, SP=0

GT:PL:DP:SP:AD  0/1:0,0,0:0:0:0,0

(2) many heterozygous GTs have very low DPs:

GT:PL:DP:SP:AD 0/1:28,0,61:4:0:3,1

I don't think it is acceptable to deduce a heterozygous GT if the allele number of REF/ALT is only 3 and 1. The main problem is: How do I avoid such heterozygous GTs ? To add some parameters in samtools mpileup / bcftools call ? or use what software to filter them?

WGS SNP samtools bcftools vcf • 2.0k views
ADD COMMENT
0
Entering edit mode

Well, I've used the traditional bcftools call -c model. When I switched to bcftools call -m, the GT:PL:DP:SP:AD 0/1:0,0,0:0:0:0,0 has been disappeared

ADD REPLY
0
Entering edit mode
7.4 years ago
Gabriel R. ★ 2.9k

If you have a coverage of 4, there are two possibilities. Either you have a het site generating a (4 choose 3) or a homozygous site and the one base is an error (with probability that depends on the base quality). In that case, the het site is probably more likely if the base quality is now. The trick is the ratio of the likelihood. So

log(homo ref)-log(het) = 28

based on the PL and the probability of error is 1/630 so it is quite high. You can use the PL values to make sure the desired state is many fold more likely than remaining models.

ADD COMMENT

Login before adding your answer.

Traffic: 3085 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6