Dear all,
I currently have two large single-sample VCF files that include calls that are homozygous for the reference allele. For easier analysis, I would like to remove the sites that are homozygous-ref in both VCF files (in VCF-speak, "0/0" for both samples at the same locus). I can't be the first to want to do this, but I wasn't able to find anything useful.
INPUT
sample1.vcf
20      60055   .       A       .       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/0:25:25:99:PASS
20      60056   .       G       A       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/1:12,13:25:99:PASS,PASS
20      60057   .       T       .       35      PASS    DP=26;PF=20;MF=6;MQ=60;SB=0.769 GT:AD:DP:GQ:FL  0/0:26:26:99:PASS
20      60058   .       C      T       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  1/1:25:25:99:PASS
sample2.vcf
20      60055   .       A       .       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/0:25:25:99:PASS
20      60056   .       G       .       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/0:25:25:99:PASS
20      60057   .       T       .       35      PASS    DP=26;PF=20;MF=6;MQ=60;SB=0.769 GT:AD:DP:GQ:FL  0/0:26:26:99:PASS
20      60058   .       C      T       35      PASS    DP=26;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  1/1:26:26:99:PASS
to:
OUTPUT
sample1.vcf
20      60056   .       G       A       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/1:12,13:25:99:PASS,PASS
20      60058   .       C      T       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  1/1:25:25:99:PASS
sample2.vcf
20      60056   .       G       .       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/0:25:25:99:PASS
20      60058   .       C      T       35      PASS    DP=26;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  1/1:26:26:99:PASS
Some notes:
- The files are around 260 GB in size
- I would like to keep the files separate (not joined together)
- There is about 128 GB of memory available
- The files are sorted on position (fortunately)
 
Does anyone have experience with something like this, or could point me in a useful direction? Many thanks.
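To make the request concrete: because both files are sorted on position, one possible approach is to read them in lockstep (a merge join) and drop a record only when the same locus is "0/0" in both. Memory use stays constant regardless of file size. Below is a minimal Python sketch, not a tested pipeline. It assumes tab-separated, uncompressed, single-sample VCFs, the same contigs in the same order in both files, and at most one record per position per file. The function names are made up for illustration.

```python
def is_hom_ref(fields):
    """Return True if the single sample's genotype is 0/0 (or phased 0|0)."""
    # GT is the first FORMAT key when present, so it is the first
    # colon-separated value of the sample column (column 10).
    return fields[9].split(":", 1)[0] in ("0/0", "0|0")


def filter_shared_hom_ref(in1, in2, out1, out2):
    """Stream two sorted VCFs; drop loci that are hom-ref in BOTH files.

    in1/in2 and out1/out2 are open text handles. Header lines are copied
    to the matching output. Records found in only one file are kept,
    since they cannot be hom-ref in both.
    """
    rank = {}  # contig name -> order of first appearance (assumed shared order)

    def parse(handle, out):
        for line in handle:
            if line.startswith("#"):
                out.write(line)  # pass header lines through unchanged
                continue
            fields = line.rstrip("\n").split("\t")
            key = (rank.setdefault(fields[0], len(rank)), int(fields[1]))
            yield key, line, is_hom_ref(fields)

    a, b = parse(in1, out1), parse(in2, out2)
    ra, rb = next(a, None), next(b, None)
    while ra is not None or rb is not None:
        if rb is None or (ra is not None and ra[0] < rb[0]):
            out1.write(ra[1])            # locus only in file 1: keep
            ra = next(a, None)
        elif ra is None or rb[0] < ra[0]:
            out2.write(rb[1])            # locus only in file 2: keep
            rb = next(b, None)
        else:
            if not (ra[2] and rb[2]):    # same locus: keep unless both are 0/0
                out1.write(ra[1])
                out2.write(rb[1])
            ra, rb = next(a, None), next(b, None)
```

It would be called with four open file handles, for example the two inputs opened for reading and two new files opened for writing. Compressed input would need to be opened in text mode through something like the standard gzip module, and the outputs would still need to be compressed and indexed afterwards.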
The reasons for not wanting to merge are that 1) bcftools merge seems to output a file that tabix can no longer index (maybe because of its size?), and 2) the script for the next analysis step is already written and takes single-sample VCFs as input.
I ended up splitting the files by chromosome with tabix (this turned out to be necessary anyway) and doing a temporary merge using GNU join.
Your answer does the job though (and is the most logical approach in almost all cases), so I have accepted it as the answer. Thanks!