Non Variant Sites in GVCF File
8 months ago
saamhasan55 ▴ 10

Hi, I have a gVCF file produced by GATK. Many of the sites in the file have "NON_REF" in the ALT allele column. It is a multi-sample, joint-genotyped VCF, and at some of the sites with NON_REF as the alt allele, some of the samples have a 0/0 genotype call. What do these NON_REF or non-variant sites actually refer to? Are they sites that are homozygous for the ref allele? And if so, why are they in a VCF file?

Cheers

GATK VCF SNP • 1.6k views
8 months ago

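<NON_REF> is a symbolic placeholder allele in gVCFs: it stands for any possible alternate allele that was not actually observed at that position, and its presence means there is coverage in that region. If <NON_REF> is the only ALT allele, the record is a reference block: the region looks homozygous reference, with some remaining possibility of variation. In the example below, the records with an END= tag are such blocks, while the records with "A,<NON_REF>" are real variant sites.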

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NA12878
20  10001567    .   A   <NON_REF>   .   .   END=10001616    GT:DP:GQ:MIN_DP:PL  0/0:38:99:34:0,101,1114
20  10001617    .   C   A,<NON_REF> 493.77  .   BaseQRankSum=1.632;ClippingRankSum=0.000;DP=38;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQ=136800.00;ReadPosRankSum=0.170    GT:AD:DP:GQ:PL:SB   0/1:19,19,0:38:99:522,0,480,578,538,1116:11,8,13,6
20  10001618    .   T   <NON_REF>   .   .   END=10001627    GT:DP:GQ:MIN_DP:PL  0/0:39:99:37:0,105,1575
20  10001628    .   G   A,<NON_REF> 1223.77 .   DP=37;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQ=133200.00   GT:AD:DP:GQ:PL:SB   1/1:0,37,0:37:99:1252,111,0,1252,111,1252:0,0,21,16
20  10001629    .   G   <NON_REF>   .   .   END=10001660    GT:DP:GQ:MIN_DP:PL  0/0:43:99:38:0,102,1219
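As a rough illustration of the distinction in the records above, here is a minimal Python sketch that labels a gVCF data line as a reference block or a variant site by looking only at the ALT column. The function name `classify_gvcf_record` is made up for this example, the lines are assumed to be tab-separated, and the INFO field of the second record is shortened to `DP=38` for readability. For real work a proper VCF parser would be a better choice.

```python
def classify_gvcf_record(line):
    """Label one tab-separated gVCF data line (hypothetical helper).

    Returns 'ref_block' when <NON_REF> is the only ALT allele,
    otherwise 'variant'.
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")  # column 5 is ALT
    if alts == ["<NON_REF>"]:
        return "ref_block"
    return "variant"


# Two records adapted from the example above (INFO shortened in the second).
records = [
    "20\t10001567\t.\tA\t<NON_REF>\t.\t.\tEND=10001616\t"
    "GT:DP:GQ:MIN_DP:PL\t0/0:38:99:34:0,101,1114",
    "20\t10001617\t.\tC\tA,<NON_REF>\t493.77\t.\tDP=38\t"
    "GT:AD:DP:GQ:PL:SB\t0/1:19,19,0:38:99:522,0,480,578,538,1116:11,8,13,6",
]

labels = [classify_gvcf_record(r) for r in records]
print(labels)
```

The check is deliberately strict: a record counts as a reference block only when <NON_REF> is the sole ALT allele, which matches the description above.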

Thanks Jeremy, I am clearer on this now. One more question: does this mean that every site where reads from my sequenced genome mapped to the reference, but carried only the reference allele, is written to the gVCF as NON_REF?


Yes, <NON_REF> by itself means a span of homozygous reference.
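To make "a span" concrete: a reference-block record starts at POS and runs through the END value in its INFO column. The sketch below reads both and reports how many bases the block covers. The helper name `block_span` is an assumption for illustration, and the line is assumed to be tab-separated.

```python
def block_span(line):
    """Return (start, end, length) for one gVCF record (hypothetical helper).

    start is POS; end is INFO/END when present, otherwise POS itself,
    so a single-site record has length 1.
    """
    fields = line.rstrip("\n").split("\t")
    start = int(fields[1])
    # Parse key=value pairs from the INFO column; flags without '=' are skipped.
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    end = int(info.get("END", start))
    return start, end, end - start + 1


# First reference block from the example earlier in the thread.
record = (
    "20\t10001567\t.\tA\t<NON_REF>\t.\t.\tEND=10001616\t"
    "GT:DP:GQ:MIN_DP:PL\t0/0:38:99:34:0,101,1114"
)

start, end, length = block_span(record)
print(start, end, length)
```

For this record the block covers positions 10001567 through 10001616, all summarized by a single 0/0 call.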

8 months ago

You're looking at a GVCF file, not a "VCF" file.

https://gatk.broadinstitute.org/hc/en-us/articles/360035531812-GVCF-Genomic-Variant-Call-Format

True variants must be called with gatk GenotypeGVCFs.


Hi Pierre, thank you for the input. Yes, you are right, I am looking at a GVCF. But I just wanted to know what the exact definition of a non-variant site is. The GATK documentation page has only a one-line mention, which says that non-variant sites represent the possibility of there being an alt allele at that position.
