Question: Can you assume variants not in VCF are all monomorphic for the reference allele?
3 months ago, wangdavid758 wrote:

I have a VCF file, let's call it file A, which was created by running GATK variant calling on whole-genome sequencing data. For SNPs that do not appear in the VCF, can I assume that the SNP is monomorphic for the reference allele (i.e., all samples encoded as 0)? Note that sites that are monomorphic for the alternate allele do appear in the file. I ask because I need to merge this file (A) with another VCF (file B), and I'm not sure how to handle the variants that appear in B but not in A. Can I just fill in the genotypes with all 0 for the variants that do not appear in A, or do I have to impute first to account for the possibility of low coverage in the region? It would be great if answers cited an external source as well, so that I have a point of reference when I consult my advisor, because he thinks that sequencing data should never require imputation. Thanks!

3 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

I wrote a tool to fix this problem: http://lindenb.github.io/jvarkit/FixVcfMissingGenotypes.html

(but it is slow)

After a VCF merge, the tool reads the merged VCF and looks back at the BAMs to tell whether each missing genotype was homozygous-ref or simply not called. If the number of reads at the site is greater than min.depth, the missing genotype is set to hom-ref.
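The decision rule described above can be sketched in a few lines of Python. This is only an illustration of the heuristic, not the tool's actual code: the function names, the `"./."` and `"0/0"` genotype strings, and the default threshold of 10 are assumptions made for the example, and the per-sample depths are assumed to have been counted from the BAMs beforehand.

```python
from typing import Dict, Optional

MISSING = "./."   # assumed encoding of a missing diploid genotype
HOM_REF = "0/0"   # assumed encoding of a homozygous-reference genotype


def backfill_genotype(genotype: Optional[str], depth: int, min_depth: int = 10) -> str:
    """Return the genotype to use for one sample at one site after a merge.

    `depth` is the number of reads covering the site in that sample's BAM.
    The default `min_depth` is an arbitrary illustrative value.
    """
    # Keep any genotype that was actually called.
    if genotype is not None and genotype not in (MISSING, ".", ".|."):
        return genotype
    # Missing genotype: set hom-ref only if strictly more than min_depth reads cover the site.
    if depth > min_depth:
        return HOM_REF
    # Otherwise leave it missing rather than guess.
    return MISSING


def backfill_site(
    genotypes: Dict[str, Optional[str]],
    depths: Dict[str, int],
    min_depth: int = 10,
) -> Dict[str, str]:
    """Apply the rule to every sample at one site; samples without a depth count as 0."""
    return {
        sample: backfill_genotype(gt, depths.get(sample, 0), min_depth)
        for sample, gt in genotypes.items()
    }
```

For example, `backfill_site({"A": "./.", "B": "1/1"}, {"A": 30})` fills in sample A as hom-ref and leaves the called genotype of sample B untouched, while a sample with no reads at the site stays missing.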

See also: "Back-filling missing genotypes in merged VCF"; "How to get sequencing depths from VCF with Rsamtools"; "Call missing variants in VCF as reference allele".

Nice program, Pierre

Reply written 3 months ago by Kevin Blighe

It's worth noting that although assuming hom-ref when coverage exceeds some threshold is a reasonable heuristic in most cases, the real situations can be more subtle. For example, the alignments at a site may have predominantly low MAPQ, or they may contain enough mismatches that a confident hom-ref call is inappropriate.

Reply written 3 months ago by Len Trigg

"the real situations can be more subtle."

Definitely.

"may have predominantly low MAPQ,"

My tool can filter the reads on MAPQ or on the NM tag.
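As a rough sketch of what read-level filtering before the depth check could look like (this is not the tool's implementation): the `Read` class, the function names, and all three thresholds below are hypothetical, chosen only for illustration. In practice the two fields would come from each BAM record, namely its MAPQ value and its NM (edit distance) tag.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Read:
    """Minimal stand-in for an aligned read overlapping the site (hypothetical type)."""
    mapq: int  # mapping quality of the alignment
    nm: int    # edit distance to the reference (NM tag)


def usable_depth(reads: Iterable[Read], min_mapq: int = 20, max_nm: int = 3) -> int:
    """Count only reads that are confidently mapped and have few mismatches."""
    return sum(1 for r in reads if r.mapq >= min_mapq and r.nm <= max_nm)


def confident_hom_ref(
    reads: Iterable[Read],
    min_depth: int = 10,
    min_mapq: int = 20,
    max_nm: int = 3,
) -> bool:
    """True if the number of reads passing the filters is greater than min_depth."""
    return usable_depth(reads, min_mapq, max_nm) > min_depth
```

With these illustrative defaults, a site covered by 50 reads that all have MAPQ 5 would not be back-filled as hom-ref, even though its raw depth is well above the threshold.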

Reply written 3 months ago by Pierre Lindenbaum
3 months ago, JC (Mexico) wrote:

To be sure, you definitely need to check coverage across all regions. If coverage at a position is good enough that variant calling would have had no problems there, you can treat the position as homozygous reference.
