Question: Under what circumstances does "REF | REF" (or REF / REF) show up in a VCF/BCF file?
rightmirem30 wrote:

Potentially a very stupid question, but it wouldn't be my first ever :)

So, looking at the sample VCF file from this Biostars post (subset_hg19.vcf), I see a bunch of lines with 0|0 in them.

While I realize this means it has identified that position as matching REF | REF - why would reference matching regions be in the variants file at all? I thought only variants (positions that DID NOT match the REF in one or both alleles) showed up.

When I run bcftools call I DON'T get any 0/0 or 0|0. Does one have to specifically set the call to identify the matching references - and why would one?

Thanks!

19      416254  rs192385198     T       C       100     PASS    AC=0;AF=0.000599042;AN=12;NS=2504;DP=18252;EAS_AF=0;AMR_AF=0.0043;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP     GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416335  rs545360745     G       C       100     PASS    AC=0;AF=0.000199681;AN=12;NS=2504;DP=18554;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP     GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416389  rs564036507     G       C       100     PASS    AC=0;AF=0.000199681;AN=12;NS=2504;DP=17931;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP     GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416406  rs185752424     G       A       100     PASS    AC=0;AF=0.000199681;AN=12;NS=2504;DP=17410;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP      GT      0|0     0|0     0|0     0|0     0|0     0|0
19      416449  rs929834        C       T       100     PASS    AC=0;AF=0.0924521;AN=12;NS=2504;DP=16305;EAS_AF=0;AMR_AF=0.0216;AFR_AF=0.3336;EUR_AF=0.002;SAS_AF=0.0051;AA=C|||;VT=SNP GT      0|0     0|0     0|0     0|0     0|0     0|0
written 6 days ago by rightmirem
WouterDeCoster23k wrote:

You would have this if you create a subset of a bigger vcf.

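Say that your vcf contains 100 individuals, of which one has a 1/0 call. The rest of the samples will have 0/0. The site is in the file because of that one individual; if you then keep only some of the other samples, you are left with a line full of 0/0.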

It looks like subset_hg19.vcf is a, hm, subset, of such a bigger vcf ;-)

written 6 days ago by WouterDeCoster
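
To make the subsetting argument concrete, here is a minimal sketch in plain Python. The site, the sample names S1..S5 and the genotypes are made-up toy values (not taken from subset_hg19.vcf), and the helper functions are hypothetical, written only for this illustration. It assumes GT is the only FORMAT field. Note that the lines in the question show the same pattern: AC=0 and AN=12 (six diploid samples) next to NS=2504 and a non-zero AF, which is what you would expect if six samples were extracted from a 2504-sample call set without recalculating AF.

```python
# Toy record: one site, five samples, only S3 carries the ALT allele.
# All values here are invented for illustration.
FULL_HEADER = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
               "S1", "S2", "S3", "S4", "S5"]
FULL_ROW = ["19", "1000", ".", "T", "C", "100", "PASS", ".", "GT",
            "0|0", "0|0", "0|1", "0|0", "0|0"]


def subset_samples(header, row, keep):
    """Keep the 9 fixed VCF columns plus the requested sample columns."""
    idx = [i for i, name in enumerate(header) if i < 9 or name in keep]
    return [header[i] for i in idx], [row[i] for i in idx]


def alt_allele_count(row):
    """Count non-reference, non-missing alleles over the sample columns (AC).

    Assumes GT is the only FORMAT field, as in the toy record above.
    """
    count = 0
    for gt in row[9:]:
        for allele in gt.replace("|", "/").split("/"):
            if allele not in ("0", "."):
                count += 1
    return count


sub_header, sub_row = subset_samples(FULL_HEADER, FULL_ROW, {"S1", "S2"})

print(alt_allele_count(FULL_ROW))  # 1: the site is variant in the full cohort
print(sub_row[9:])                 # ['0|0', '0|0']
print(alt_allele_count(sub_row))   # 0: the line survives, but nobody left is variant
```

The record is not dropped by the subsetting itself; it only disappears if you additionally filter on the recomputed allele count.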

Valid point. I wonder why my files have ./.:. instead of "0/0" in the merged file? Maybe because bcftools didn't know if the SNPs were 0/0 or just missing data?

written 6 days ago by rightmirem

0/0 means "this is twice a reference allele".
./. means "I don't know what this is".

written 6 days ago by WouterDeCoster
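
The distinction between 0/0 and ./. can be spelled out as a small classifier. This is only a sketch: classify_genotype is a hypothetical helper, not part of bcftools or any library, and it assumes GT is the first colon-separated field of the sample column (as in "./.:.").

```python
def classify_genotype(sample_field):
    """Classify a VCF sample field by its GT value.

    Assumes GT is the first colon-separated subfield; phased (|) and
    unphased (/) separators are treated the same way.
    """
    gt = sample_field.split(":")[0]
    alleles = gt.replace("|", "/").split("/")
    if all(a == "." for a in alleles):
        return "missing"            # no genotype could be called
    if any(a == "." for a in alleles):
        return "partially missing"
    if all(a == "0" for a in alleles):
        return "hom-ref"            # twice the reference allele
    if len(set(alleles)) == 1:
        return "hom-alt"
    return "het"


for field in ["0/0", "0|0", "./.:.", "0/1", "1|1"]:
    print(field, "->", classify_genotype(field))
```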

Yes, Wouter is correct for 0/0, whilst ./. means that no genotype could be called at this position. It is a missing value. There may have been no or insufficient reads. I have also explained this in your other question earlier today: Unusual reports from "bcftools stats" (making me question my data)

BCFtools could easily call 0/0 genotypes, but why would it? The VCF file would then grow by a factor of thousands or millions, because it would be reporting each and every reference base.

modified 6 days ago • written 6 days ago by Kevin Blighe
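
As a rough illustration of why a merged file ends up with ./. rather than 0/0: if each sample was called separately and only variant sites were written, a merge has no record at all for a sample that was not variant at a given site, so it cannot tell "reference" from "no data". The sketch below mimics that with dictionaries; merge_calls, the sample names A and B, and the positions are all hypothetical. As far as I know, bcftools merge has a --missing-to-ref (-0) option that makes the "absent means 0/0" assumption for you, but check the documentation of your installed version, and keep in mind that the assumption is only safe where the sample actually had adequate coverage.

```python
def merge_calls(per_sample_calls, missing="./."):
    """Merge variant-only call sets into one table of genotypes per site.

    per_sample_calls: {sample_name: {(chrom, pos): genotype}}
    A sample with no record at a site gets the `missing` placeholder.
    """
    samples = sorted(per_sample_calls)
    sites = sorted({site for calls in per_sample_calls.values() for site in calls})
    return {
        site: [per_sample_calls[s].get(site, missing) for s in samples]
        for site in sites
    }


# Toy input: each sample has a record only where it is variant.
calls = {
    "A": {("19", 100): "0/1"},
    "B": {("19", 200): "1/1"},
}

merged = merge_calls(calls)
print(merged[("19", 100)])  # ['0/1', './.']
print(merged[("19", 200)])  # ['./.', '1/1']

# Same merge, but assuming every absent record is homozygous reference.
merged_ref = merge_calls(calls, missing="0/0")
print(merged_ref[("19", 100)])  # ['0/1', '0/0']
```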

"BCFTools could easily call 0/0 genotypes, but why would it?" Well, exactly my thought actually :D

All my BCF files have "./.:." for samples that are REF/REF. But this one had "0|0" meaning it's not a variant, but a reference.

So I couldn't figure out why they were there :D

written 5 days ago by rightmirem