Wrong SNP calling in my bcftools output
0
2
Entering edit mode
6 months ago
pablo ▴ 350

Hi,

I have 384 samples I want to genotype with the GBS technology. This is the command I use :

sbatch -p gdec -c 128 --mem=128G --wrap='bcftools mpileup \
 --threads 128 \
 -q 10 -Q 20 \ 
 --skip-indels \ 
 --max-depth 500 \
 -f reference.fasta \
 *.bam \
 --annotate FORMAT/AD,FORMAT/DP,INFO/AD | \
   bcftools call \
   --threads 128 \
   -mv \
   -Oz -o calls.denovo.vcf.gz'

I had a look on some genotypes, that I verified in the corresponding BAM files. Here are the alignments for a single sample at a specific position, here are four SNPs. I sorted the alignment by MAPQ in IGV (in red is the MAPQ threshold at 10) , and the depth at this locus is around 580X (I can't show all the screen)

gbs

It seems logical, as I applied the -q 10 filter, only the first and last variants are detected. The first variant is heterozygous in reality (but considered as homozygous with the >MAPQ10 reads). The two middle variants are only supported by <MAPQ10 reads, and then, it appears as reference here (although some reads carry the snp).

I did a new calling only with -q 1 and --max-depth 1000 to consider all the reads, and then, to detect the four variants. Here is my problem : the two middle snps are still not detected, the last one is fine (1/1), but the first one is :

    h2tg000001l;385471;A;G;1/1:79,6,0:577:277,300

The AD is "277,300" which is heterozygous. Why do I get a 1/1 genotype? Also, why the two middle variants are not detected whereas they are 0/1 if I consider all the reads (-q 1)?

Do I miss something? Do I do something wrong? I get a bit lost..

Best

snp bcftools vcf vcftools • 1.0k views
ADD COMMENT
0
Entering edit mode

Why you are lowering the quality score to force the result you expect? Never eye-ball alignments, the variant callers use a bunch of different statistics and heuristics to call variants.

Are you expecting four variants because its an experimentally defined mixture? I think if you are trying to fudge the data to get what you expect, you should change these parameters in the mpileup command:

 -q 10 -Q 20 \  
 --max-depth 500 \

set the max depth to 10,000 and -q and -Q to zero.

I don't suggest doing that, because it is the equivalent of P-hacking your variants.

ADD REPLY
0
Entering edit mode

I usually use those parameters -q 10 -Q 20 when I run bcftools mpileup , to get significant variants. I just try to trick the parameters to understand the situation.I did what you (don't) suggest but the results are the same, but I got exactly the same results.

The first variant appears as 1/1 wheareas I should get a 0/1 genotype (and I still have the two not detected variants). One of the main problem is that the AD for the 1/1 genotype is 282,306 which is heterozygous, right? I should get a 0/1 genotype

ADD REPLY
0
Entering edit mode

Any suggestion?

ADD REPLY
0
Entering edit mode

I have no idea why you can not force the output. But at the same time, its (IMO) a fools errand (no offense) and a waste of time.

Maybe try not using -v, --variants-only flag?

ADD REPLY
0
Entering edit mode

I do not have idea too actually. I just ask because there's a difference between what I get from bcftools and the IGV output. Which means bcftools gives wrong output? I already tried without the -v flag. The main question is : why GT:PL:DP:AD 1/1:79,6,0:577:277,300 outputs as 1/1 ? And not as 0/1 ?

ADD REPLY

Login before adding your answer.

Traffic: 3441 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6