Question: Merging Vcf Files And Filling In Missing Genotypes
7
gravatar for Allpowerde
8.4 years ago by
Allpowerde1.2k
Allpowerde1.2k wrote:

I have vcf files with genomic variants (e.g. SNPs and Indels) from several patients. I now want to merge them into one vcf file e.g. using vcftools' merge-vcf. Doing that I get a file that holds for the union of all variants observed amongst the merged files the data for each individual in the columns:

#CHROM  POS  ID  REF   ALT     QUAL    FILTER  INFO    FORMAT  Ind1 Ind2 Ind3
chr1    583  .   G     A       8.44    .       <..>    <..>    1/1  1/0  ./.

However, if there was no variant observed for an individual (because it was in agreement with the reference), it won't have information for this particular position and a "./." is written instead. Now, instead of "./." I want to have the actual genotype printed for the specific variant:

#CHROM  POS  ID  REF   ALT     QUAL    FILTER  INFO    FORMAT  Ind1 Ind2 Ind3
chr1    583  .   G     A       8.44    .       <..>    <..>    1/1  1/0  0/1

So all I need to do is go back to the bam file and lookup the genotype of this SNP or Indel.

Is anyone of you aware of a tool that fills in the missing genotypes by looking up the alignment file (e.g. GATK does it for SNPs but not Indels).

vcf indel merge snp • 8.9k views
ADD COMMENTlink modified 8.3 years ago by Jashapiro230 • written 8.4 years ago by Allpowerde1.2k
6
gravatar for Jorge Amigo
8.3 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

correct me if I am wrong, but the VCF format definition states that an allele matching the reference is represented by '0', so if no variant was observed for an individual it means that both alleles would match the reference, hence they should be presented as '0/0' (the '/' stands for unphased genotype).

I guess that the merge-vcf tool does not want to make you think that these positions were typed, and for that reason it uses the './.' code when merging instead of the appropriate '0/0'. it should be then straightforward to replace all './.' by '0/0' by simple plain text pattern replacement.

ADD COMMENTlink written 8.3 years ago by Jorge Amigo11k
4
gravatar for Jashapiro
8.3 years ago by
Jashapiro230
United States
Jashapiro230 wrote:

The problem is that a simple replacement with the reference calls may not be accurate. The SNP caller may have decided that a given call was too low quality/uncertain to report (a het in the example above), but you might want to keep that info if you know there is a SNP at the position from other individuals. mpileup in samtools can call specific bases with the -l or -r options, and it can be used to recall the entire set of individuals, incorporating the information from all of them to decide which variants are most likely to be real.

ADD COMMENTlink written 8.3 years ago by Jashapiro230
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1596 users visited in the last hour