Question: Under what circumstances does "REF | REF" (or REF / REF) show up in a VCF/BCF file?
6 days ago, rightmirem wrote:

Potentially a very stupid question, but it wouldn't be my first ever :)

So, looking at the sample VCF file from this Biostars post (subset_hg19.vcf), I see a bunch of lines with 0|0 in them.

While I realize this means it has identified that position as matching REF | REF - why would reference matching regions be in the variants file at all? I thought only variants (positions that DID NOT match the REF in one or both alleles) showed up.

When I run bcftools call I don't get any 0/0 or 0|0. Does one have to specifically set the call to report positions that match the reference, and why would one?


19      416254  rs192385198     T       C       100     PASS    AC=0;AF=0.000599042;AN=12;NS=2504;DP=18252;EAS_AF=0;AMR_AF=0.0043;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP     GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416335  rs545360745     G       C       100     PASS    AC=0;AF=0.000199681;AN=12;NS=2504;DP=18554;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP     GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416389  rs564036507     G       C       100     PASS    AC=0;AF=0.000199681;AN=12;NS=2504;DP=17931;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP     GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416406  rs185752424     G       A       100     PASS    AC=0;AF=0.000199681;AN=12;NS=2504;DP=17410;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP      GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416449  rs929834        C       T       100     PASS    AC=0;AF=0.0924521;AN=12;NS=2504;DP=16305;EAS_AF=0;AMR_AF=0.0216;AFR_AF=0.3336;EUR_AF=0.002;SAS_AF=0.0051;AA=C|||;VT=SNP GT      0|0     0|0     0|0     0|0     0|0
6 days ago, WouterDeCoster wrote:

You would have this if you create a subset of a bigger vcf.

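Say that your VCF contains 100 individuals, of which one has a 1/0 call. The rest of the samples will have 0/0. If you then subset the file to samples that don't include that one carrier, the record is still there, but every remaining genotype is 0/0.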

It looks like subset_hg19.vcf is a, hm, subset, of such a bigger vcf ;-)
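
To make the subsetting effect concrete, here is a minimal Python sketch. It is not how bcftools or any other tool does it; the function name `subset_record`, the sample names S1 to S4, and the toy record are all made up for illustration. It keeps only the chosen sample columns of one VCF data line and recounts the non-reference alleles (AC) and called alleles (AN) among them.

```python
def subset_record(line, header_samples, keep):
    """Keep only the samples in `keep` and recount AC/AN from their GT fields.

    Hypothetical helper for illustration; assumes a tab-separated VCF data
    line whose FORMAT column contains GT.
    """
    fields = line.rstrip("\n").split("\t")
    fmt, genotypes = fields[8], fields[9:]
    gt_index = fmt.split(":").index("GT")

    # Pick the sample columns in the order requested.
    kept = [genotypes[header_samples.index(name)] for name in keep]

    # Collect every called allele; "." (missing) alleles are skipped.
    alleles = []
    for sample in kept:
        gt = sample.split(":")[gt_index]
        alleles.extend(a for a in gt.replace("|", "/").split("/") if a != ".")

    ac = sum(1 for a in alleles if a != "0")  # non-reference alleles
    an = len(alleles)                         # all called alleles
    return {"genotypes": kept, "AC": ac, "AN": an}


# Toy record: four samples, only S2 carries the ALT allele.
samples = ["S1", "S2", "S3", "S4"]
record = "19\t416254\trs192385198\tT\tC\t100\tPASS\t.\tGT\t0|0\t0|1\t0|0\t0|0"

full = subset_record(record, samples, samples)
sub = subset_record(record, samples, ["S1", "S3"])
print(full)
print(sub)
```

In the full set the site is a real variant (AC is 1), but once S2 is left out the same record has AC=0 and nothing but 0|0 genotypes. That matches the pasted lines above, where AC=0 while AF, presumably still taken from the full cohort, is non-zero.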

written 6 days ago by WouterDeCoster

Valid point. I wonder why my files have ./.:. instead of "0/0" in the merged file? Maybe because bcftools didn't know if the SNPs were 0/0 or just missing data?

written 6 days ago by rightmirem

0/0 means "this is twice a reference allele".
./. means "I don't know what this is".
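
The distinction can be spelled out in a few lines of Python. This is only a sketch: `classify_genotype` and its return labels are invented names, not part of any library, and a genotype with any missing allele is simply treated as missing. It relies on GT being the first key in the sample field, which the VCF specification requires whenever GT is present.

```python
def classify_genotype(sample_field):
    """Classify one VCF sample field such as "0|0", "0/1" or "./.:.".

    Hypothetical helper for illustration; assumes GT is the first
    colon-separated value in the field.
    """
    gt = sample_field.split(":")[0]
    alleles = gt.replace("|", "/").split("/")  # phased and unphased alike

    if any(a == "." for a in alleles):
        return "missing"   # no genotype could be called
    if all(a == "0" for a in alleles):
        return "hom_ref"   # every allele matches REF
    if len(set(alleles)) == 1:
        return "hom_alt"   # same ALT allele on every copy
    return "het"           # mixed alleles


for field in ["0|0", "0/0", "./.:.", "0/1", "1/1"]:
    print(field, classify_genotype(field))
```

So "0|0" and "0/0" both come out as `hom_ref`, while "./.:." comes out as `missing`: the first is a statement about the sample, the second is the absence of one.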

written 6 days ago by WouterDeCoster

Yes, Wouter is correct for 0/0, whilst ./. means that no genotype could be called at this position. It is a missing value. There may have been no or insufficient reads. I have also explained this in your other question earlier today: Unusual reports from "bcftools stats" (making me question my data)

BCFTools could easily call 0/0 genotypes, but why would it? The VCF file would then grow by a factor of thousands or millions, because it would report each and every reference base.

modified 6 days ago • written 6 days ago by Kevin Blighe

"BCFTools could easily call 0/0 genotypes, but why would it?" Well, exactly my thought actually :D

All my BCF files have "./.:." for samples that are REF/REF. But this one had "0|0" meaning it's not a variant, but a reference.

So I couldn't figure out why they were there :D

written 5 days ago by rightmirem