Hello everyone!
I'm trying to intersect several VCF files like this:
bcftools isec -n +2 main-vcf.vcf.gz subset-1.vcf.gz subset-2.vcf.gz
So a variant can be present in the first file and in one or both of the subsets.
Basically I'm looking for, in binary:
111
101
100
110
Both subset-1.vcf.gz and subset-2.vcf.gz are subsets of main-vcf.vcf.gz. They may or may not share variants with each other, but I'm not interested in that. I want to annotate my main VCF based on these subsets, to know which variants from file 1 are also present in subsets 1 and 2.
When I look at my sites.txt output, the last column sometimes has three digits and sometimes only two:
chr19 603747 C T 110
chr5 150124275 G T 11
I get that 110 should mean this site is present in files 1 and 2 but not in file 3.
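To make that reading concrete, here is a minimal Python sketch of how I'm decoding the last column. It assumes the column holds one character per input file ('1' = present, '0' = absent), in the same order as the files on the command line. The `decode_site` helper and the `FILES` list are hypothetical names made up for illustration, not part of bcftools.

```python
# Sketch only: assumes the last column of sites.txt holds one character per
# input file ('1' = present, '0' = absent), in command-line order.
FILES = ["main-vcf.vcf.gz", "subset-1.vcf.gz", "subset-2.vcf.gz"]


def decode_site(line, files=FILES):
    """Return (chrom, pos, ref, alt, names of the files containing the site)."""
    chrom, pos, ref, alt, mask = line.split()
    if len(mask) != len(files) or set(mask) - {"0", "1"}:
        # A string such as '11' for three files cannot be decoded safely,
        # so refuse to guess which files it refers to.
        raise ValueError(
            f"unexpected presence string {mask!r} for {len(files)} files"
        )
    present = [name for name, bit in zip(files, mask) if bit == "1"]
    return chrom, int(pos), ref, alt, present


print(decode_site("chr19 603747 C T 110"))
# ('chr19', 603747, 'C', 'T', ['main-vcf.vcf.gz', 'subset-1.vcf.gz'])
```

Under that assumption the three-digit rows decode cleanly, while a two-digit string raises an error instead of being silently assigned to some pair of files.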
But what does the 11 mean in this case? Which files is bcftools comparing for that site? I can't find any explanation in the bcftools manual of the sites.txt results for comparisons across multiple files. This gets even worse when comparing more than 3 files.
Any ideas?
isec is pretty awful for these set operations, especially since individual samples present alleles, not lines in a VCF file. If you can cook up some example VCFs and show us what you consider the target output, we can help.