Under what circumstances does "REF | REF" (or REF / REF) show up in the VCF/BCF file
6.4 years ago
rightmirem ▴ 70

Potentially a very stupid question, but it wouldn't be my first ever :)

So looking at the sample VCF files from this Biostars (subset_hg19.vcf), I see a bunch of lines with 0|0 in them.

While I realize this means it has identified that position as matching REF | REF - why would reference matching regions be in the variants file at all? I thought only variants (positions that DID NOT match the REF in one or both alleles) showed up.

When I run bcftools call I DON'T get any 0/0 or 0|0. Does one have to specifically set the call to identify the matching references - and why would one?

Thanks!

19      416254  rs192385198     T       C       100     PASS    AC=0;AF=0.000599042;AN=12;NS=2504;DP=18252;EAS_AF=0;AMR_AF=0.0043;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP     GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416335  rs545360745     G       C       100     PASS    AC=0;AF=0.000199681;AN=12;NS=2504;DP=18554;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP     GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416389  rs564036507     G       C       100     PASS    AC=0;AF=0.000199681;AN=12;NS=2504;DP=17931;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP     GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416406  rs185752424     G       A       100     PASS    AC=0;AF=0.000199681;AN=12;NS=2504;DP=17410;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP      GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416449  rs929834        C       T       100     PASS    AC=0;AF=0.0924521;AN=12;NS=2504;DP=16305;EAS_AF=0;AMR_AF=0.0216;AFR_AF=0.3336;EUR_AF=0.002;SAS_AF=0.0051;AA=C|||;VT=SNP GT      0|0     0|0     0|0     0|0     0|0     0|0
SNP next-gen software error • 1.6k views
6.4 years ago

You would have this if you create a subset of a bigger vcf.

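Say that your vcf contains 100 individuals, one of whom has a 0/1 call at some site. The rest of the samples will have 0/0 at that site, and if you then keep only some of those samples, the site stays in the file even though nobody left in it carries the variant.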

It looks like subset_hg19.vcf is a, hm, subset, of such a bigger vcf ;-)
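
To make the subsetting point concrete, here is a minimal Python sketch using a made-up four-sample record (the sample names S1..S4 and the genotypes are assumptions for illustration, not data from subset_hg19.vcf). It keeps only some genotype columns, the way something like `bcftools view -s` would, and shows that a site that is variant in the full cohort can end up with nothing but 0|0 in the subset. That would also fit the posted lines, where AC=0 and AN=12 match six diploid samples while NS=2504 appears to carry over from the larger cohort.

```python
# Toy illustration: subsetting samples from a multi-sample VCF record.
# The record and the sample names are invented for this example.

def subset_record(fields, header_samples, keep):
    """Keep only the genotype columns for the samples named in `keep`."""
    fixed = fields[:9]  # CHROM .. FORMAT
    genotypes = dict(zip(header_samples, fields[9:]))
    return fixed + [genotypes[s] for s in keep]

def alt_allele_count(fields):
    """Count ALT alleles across the genotype columns of a GT-only record."""
    count = 0
    for gt in fields[9:]:
        for allele in gt.replace("|", "/").split("/"):
            if allele not in (".", "0"):
                count += 1
    return count

samples = ["S1", "S2", "S3", "S4"]
record = ["19", "416254", ".", "T", "C", "100", "PASS", ".", "GT",
          "0|0", "0|1", "0|0", "0|0"]

# Drop S2, the only carrier of the ALT allele.
subset = subset_record(record, samples, keep=["S1", "S3", "S4"])

print(alt_allele_count(record))  # ALT alleles in the full cohort
print(subset[9:])                # genotypes left after subsetting
print(alt_allele_count(subset))  # ALT alleles in the subset
```

The site survives the subsetting unless you explicitly filter out records whose remaining samples carry no ALT allele.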


Valid point. I wonder why my files have ./.:. instead of "0/0" in the merged file? Maybe because bcftools didn't know if the SNPs were 0/0 or just missing data?


0/0 means "this is twice a reference allele".
./. means "I don't know what this is".
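
If it helps, the distinction can be written down as a small Python helper. This is only a sketch: `classify_gt` is a hypothetical name, it assumes GT is the first FORMAT subfield, and it treats a partially missing genotype such as ./1 as missing, which is a simplification.

```python
def classify_gt(sample_field):
    """Classify one VCF sample column by its GT subfield.

    Assumes GT is the first colon-separated subfield, e.g. './.:.' or '0|0'.
    """
    gt = sample_field.split(":")[0]
    # Phased (|) and unphased (/) genotypes are treated the same here.
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "missing"   # no genotype could be called
    if all(a == "0" for a in alleles):
        return "hom_ref"   # every allele matches REF
    if len(set(alleles)) == 1:
        return "hom_alt"   # every allele is the same ALT
    return "het"

for field in ["0|0", "0/0", "./.:.", "0/1", "1|1"]:
    print(field, classify_gt(field))
```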


Yes, Wouter is correct for 0/0, whilst ./. means that no genotype could be called at this position. It is a missing value. There may have been no or insufficient reads. I have also explained this in your other question earlier today: Unusual reports from "bcftools stats" (making me question my data)
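
A rough Python sketch of why a merge of variants-only call sets ends up with ./. rather than 0/0: a site that is absent from one sample's file carries no information about that sample, so the only safe placeholder is a missing genotype. The function `merge_calls`, the sample names, and the genotypes are hypothetical and only mimic the idea; this is not how bcftools merge is implemented. (As far as I know, bcftools merge has a `-0`/`--missing-to-ref` option that assumes 0/0 instead, but that is an assumption about your data, not a call.)

```python
def merge_calls(per_sample_calls):
    """Merge per-sample {(chrom, pos): genotype} dicts from variants-only calling.

    Sites absent from a sample's dict are filled with './.', because absence
    does not tell us whether the sample was hom-ref or simply had no coverage.
    """
    samples = list(per_sample_calls)
    sites = sorted(set().union(*(calls.keys() for calls in per_sample_calls.values())))
    merged = {}
    for site in sites:
        merged[site] = [per_sample_calls[s].get(site, "./.") for s in samples]
    return samples, merged

# Invented calls for two samples; each file lists only that sample's variants.
calls = {
    "sampleA": {("19", 416254): "0/1"},
    "sampleB": {("19", 416335): "1/1"},
}

samples, merged = merge_calls(calls)
print(samples)
for site, genotypes in merged.items():
    print(site, genotypes)
```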

BCFTools could easily call 0/0 genotypes, but why would it? The VCF file would then grow by a factor of thousands or millions, because it would be reporting each and every reference base in the VCF.


"BCFTools could easily call 0/0 genotypes, but why would it?" Well, exactly my thought actually :D

All my BCF files have "./.:." for samples that are REF/REF. But this one had "0|0", meaning the sample is not a variant at that position, but matches the reference.

So I couldn't figure out why they were there :D
