Hi everyone! I'm carrying out variant calling using bcftools
. First, I created .mpileup file using bcftools mpileup
, then run bcftools calls
for variant calling. In some positions .mpileup file contains information about both alleles of diploid heterozygous (e.g. A,T while reference only T), but when I run bcftools calls
this information erased and only one allele remained.
Also, I used IGV-browser, it showed that it is heterozygous, but final .vcf file does not contain both alleles, only one of them. See the attachments. Who knows why it is happening, I tried to use most arguments of bcftools call, but the result is the same. By the way, it only happened in a bunch of positions, most positions worked fine as expected.
Picture 1: Interested position circled in blue
Picture 2: dac.calls.vcf contains only G, but must contain both A and G
you want to have a VCF with ALT alleles A and T ? but T looks like a small sequencing error to me....
no, I want to have A and G (gray, not highlighted). REF is G and here I want to have both A, G.
I still don't understand. The G is the REF allele. That's the 4th column of your VCF. It cannot be in the ALT column.
This is a weird task, I want to write both A and G in this column, and bcftools allows you to do this, just not to add -v (variant-only) flag. In most cases it worked fine, but some cases like this don't show both alleles in ALT column. May be something in INFO column, I can not understand
you can't have G in the ALT column if G is the REF allele. Your problem sounds like a XY problem: https://xyproblem.info/
Ok, thank you! Now I see I was mistaken, I thought about it because it is even called VARIANT calling, but G here is not a variant. May be you know how to collect both G and A? I think it is not a variant calling at all, but I think manipulating .mpileup we can get it?
You need to parse the VCF file and do the combination yourself. Pay attention to the REF and ALT (not "ALT" not "VARIANT"!) columns, along with Genotype GT.
Here we'll have REF=G, ALT=A (alternate allele) and GT=0/1 indicating one allele is REF and the other is ALT. If it was homozygous it'd be GT=1/1. If we had REF=G and our alingment showing A/T as the two alleles then ALT=A,T and GT=1/2.
While it's maybe not the best format on the planet, to put it mildly, VCF is a standard and bcftools is simply following that.