I have read this post: How bcftools isec works ? to further understand the output generated by bcftools isec.
I'm trying to find variants that are common by location in at least 2 files:
##bcftools_isecCommand=isec -p isec_out -n +2 UNC2FT198_Imtechella_halotolerans.fltq.vcf.gz UNC2FT2114_Imtechella_halotolerans.fltq.vcf.gz UNC2FT4154_Imtechella_halotolerans.fltq.vcf.gz UNC2FT55199_Imtechella_halotolerans.fltq.vcf.gz UNC2FT7534A_Imtechella_halotolerans.fltq.vcf.gz UNC2lu13_Imtechella_halotolerans.fltq.vcf.gz; Date=Wed Jun 23 15:57:36 2021
After generating the intersections of multiple vcf files I noticed in the last file containing the intersections only one sample name is present. How can I know from which samples the variants are common for?
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  UNC2lu13_vs_combined.sorted.bam
Imtechella_halotolerans_length_3113269  169317  .   TG  TGTAAG  148 PASS    INDEL;IDV=10;IMF=0.625;DP=16;VDB=0.999554;SGB=-0.670168;MQSB=0.0871515;MQ0F=0;AC=1;AN=1;DP4=1,5,10,0;MQ=24  GT:PL   1:175,0
Imtechella_halotolerans_length_3113269  1450920 .   G   T   141 PASS    DP=46;VDB=0.0516381;SGB=-0.692562;RPB=0.369907;MQB=0.00470369;MQSB=0.000385694;BQB=0.917285;MQ0F=0;AC=1;AN=1;DP4=17,3,7,15;MQ=12    GT:PL   1:168,0
Imtechella_halotolerans_length_3113269  1455993 .   A   T   195 PASS    DP=30;VDB=0.63518;SGB=-0.693021;RPB=1;MQB=1;MQSB=0.0374987;BQB=1;MQ0F=0;AC=1;AN=1;DP4=0,1,25,2;MQ=28    GT:PL   1:222,0
Imtechella_halotolerans_length_3113269  1633169 .   T   C   60  PASS    DP=34;VDB=0.0688269;SGB=-0.693127;MQSB=0.346427;MQ0F=0;AC=1;AN=1;DP4=0,0,5,28;MQ=9  GT:PL   1:90,0
Imtechella_halotolerans_length_3113269  2005523 .   C   T   195 PASS    DP=64;VDB=0.0037203;SGB=-0.693144;RPB=0.727216;MQB=7.61251e-07;MQSB=9.95686e-06;BQB=0.86319;MQ0F=0;AC=1;AN=1;DP4=2,19,25,14;MQ=17   GT:PL   1:222,0
If it was such a case where I was looking to find variants common across all samples I would run the command with n =6 in this instance but that isn't the case.
I'm just wondering if there's a more illuminating way to identify common variants by position across multiple samples. Any insight greatly appreciated, thanks!
Brilliant thanks! I'm new to snp calling so am glad someone has had this dilemma before.
Cool, let me know if it works. The code for part 3 is well-tested but is quite an eye-sore.