I have two VCF files from different individuals. They are both mapped and have a number of non-overlapping positions. Is there a way to filter then so only the overlapping POS remain in the file?
File 1:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P1
NW_014726.1 404 . A G 225 . DP=15;VDB=0.310396;SGB=-0.686358;MQSB=0.640197;MQ0F=0;AC=2;AN=2;DP4=0,0,7,7;MQ=43 GT:PL 1/1:255,42,0
NW_014726.1 495 . T C 225 . DP=16;VDB=0.351783;SGB=-0.689466;MQSB=0.948436;MQ0F=0;AC=2;AN=2;DP4=0,0,6,10;MQ=39 GT:PL 1/1:255,48,0
NW_014726.1 542 . T G 225 . DP=16;VDB=0.909865;SGB=-0.688148;MQSB=0.959011;MQ0F=0;AC=2;AN=2;DP4=0,0,8,7;MQ=41 GT:PL 1/1:255,45,0
NW_014726.1 722 . G A 225 . DP=24;VDB=0.680279;SGB=-0.692352;MQSB=0.769328;MQ0F=0;AC=2;AN=2;DP4=0,0,8,13;MQ=40 GT:PL 1/1:255,63,0
NW_014726.1 750 . G C 225 . DP=22;VDB=0.501443;SGB=-0.692352;MQSB=0.95645;MQ0F=0;AC=2;AN=2;DP4=0,0,7,14;MQ=39 GT:PL 1/1:255,63,0
File 2:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT M2
NW_014726.1 404 . A G 225 . DP=14;VDB=0.237864;SGB=-0.683931;MQSB=0.679965;MQ0F=0;AC=2;AN=2;DP4=0,0,5,8;MQ=43 GT:PL 1/1:255,39,0
NW_014726.1 495 . T C 225 . DP=15;VDB=0.67567;SGB=-0.686358;MQSB=0.939413;MQ0F=0;AC=2;AN=2;DP4=0,0,4,10;MQ=39 GT:PL 1/1:255,42,0
NW_014726.1 530 . T G 225 . DP=16;VDB=0.650373;SGB=-0.686358;MQSB=0.411112;MQ0F=0;AC=2;AN=2;DP4=0,0,9,5;MQ=42 GT:PL 1/1:255,42,0
NW_014726.1 745 . G A 225 . DP=18;VDB=0.178812;SGB=-0.690438;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,6,11;MQ=41 GT:PL 1/1:255,51,0
NW_014726.1 750 . G C 206 . DP=17;VDB=0.787081;SGB=-0.680642;RPB=0.893597;MQB=0.031048;MQSB=1;BQB=0.965874;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=2,3,4,8;MQ=41 GT:PL 0/1:239,0,115
The output I am looking for would remove the POS that are not in common and would look like this:
File 1:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P1
NW_014726.1 404 . A G 225 . DP=15;VDB=0.310396;SGB=-0.686358;MQSB=0.640197;MQ0F=0;AC=2;AN=2;DP4=0,0,7,7;MQ=43 GT:PL 1/1:255,42,0
NW_014726.1 495 . T C 225 . DP=16;VDB=0.351783;SGB=-0.689466;MQSB=0.948436;MQ0F=0;AC=2;AN=2;DP4=0,0,6,10;MQ=39 GT:PL 1/1:255,48,0
NW_014726.1 750 . G C 225 . DP=22;VDB=0.501443;SGB=-0.692352;MQSB=0.95645;MQ0F=0;AC=2;AN=2;DP4=0,0,7,14;MQ=39 GT:PL 1/1:255,63,0
File 2:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT M2
NW_014726.1 404 . A G 225 . DP=14;VDB=0.237864;SGB=-0.683931;MQSB=0.679965;MQ0F=0;AC=2;AN=2;DP4=0,0,5,8;MQ=43 GT:PL 1/1:255,39,0
NW_014726.1 495 . T C 225 . DP=15;VDB=0.67567;SGB=-0.686358;MQSB=0.939413;MQ0F=0;AC=2;AN=2;DP4=0,0,4,10;MQ=39 GT:PL 1/1:255,42,0
NW_014726.1 750 . G C 206 . DP=17;VDB=0.787081;SGB=-0.680642;RPB=0.893597;MQB=0.031048;MQSB=1;BQB=0.965874;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=2,3,4,8;MQ=41 GT:PL 0/1:239,0,115
I tried using
bcftools merge
with the hope of just filtering a single file, but i keep running into problems, so no I am looking for a way to filter the individuals files based on the positions in the other.
Looking for any solution in R, BCFtools/VCFtools, or whatever combination of awk, grep, ... works.
use
bcftools isec