Intersection of multiple vcf files
7 weeks ago
avelarbio46 ▴ 30

Hello everyone!

I'm trying to intersect several VCF files like this:

bcftools isec -n +2 main-vcf.vcf.gz subset-1.vcf.gz subset-2.vcf.gz

That is, a variant should be present in the first file and in one or more of the subsets.

Basically, I'm looking for, in binary:


Both subset-1.vcf.gz and subset-2.vcf.gz are subsets of main-vcf.vcf.gz. They may or may not share variants with each other, but I'm not interested in that. I want to annotate my main VCF based on these subsets, to know which variants from file 1 are also present in subsets 1 and 2.
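A minimal Python sketch of that selection, assuming each line of the sites output ends with a presence string that has one character per input file, in command-line order, with the main VCF first. The function name `in_main_and_any_subset` is hypothetical, and only the first sample row comes from the post; the other two are made up for illustration.

```python
def in_main_and_any_subset(flags: str) -> bool:
    """True if the first file (main) has the site and at least one later file has it too."""
    return flags[:1] == "1" and "1" in flags[1:]

# Assumed layout: CHROM, POS, REF, ALT, presence string (whitespace-separated).
sites = [
    "chr19\t603747\tC\tT\t110",  # row quoted in the post
    "chr19\t700000\tA\tG\t101",  # hypothetical row
    "chr19\t800000\tG\tA\t011",  # hypothetical row: subsets only, not in main
]

# Keep only the sites found in the main file and in one or more subsets.
kept = [line for line in sites if in_main_and_any_subset(line.split()[-1])]
for line in kept:
    print(line)
```

The check reads only the last column, so it should work unchanged with more than two subsets, as long as the main file is listed first.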

When I look at my sites.txt output, the last column sometimes has three digits and sometimes two:

chr19   603747  C   T   110 
chr5   150124275  G   T   11

I get that 110 should mean the site is present in files 1 and 2 but not in file 3.

But what does the 11 mean in this case? Which files is bcftools comparing for that site? I can't find any explanation in the bcftools manual of the sites output for comparisons of multiple files. It gets even harder to read when comparing more than 3 files.
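For reading these strings with any number of files, here is a small Python sketch. It assumes the last column holds one character per input file, in the order the files were given on the command line, with 1 meaning the site is present. The helper name `decode_presence` is hypothetical. It deliberately refuses strings whose length does not match the file count, so a value like 11 with three inputs is flagged rather than guessed at.

```python
from typing import List, Sequence

def decode_presence(flags: str, files: Sequence[str]) -> List[str]:
    """Return the files that a presence string such as '110' marks as containing the site."""
    if len(flags) != len(files):
        # Assumption: one flag per input file; anything else is reported, not interpreted.
        raise ValueError(
            f"{flags!r} has {len(flags)} flags but {len(files)} files were given"
        )
    return [name for bit, name in zip(flags, files) if bit == "1"]

# File order taken from the command in the post.
files = ["main-vcf.vcf.gz", "subset-1.vcf.gz", "subset-2.vcf.gz"]
print(decode_presence("110", files))  # ['main-vcf.vcf.gz', 'subset-1.vcf.gz']
```

Calling `decode_presence("11", files)` raises `ValueError` under this assumption, which points to either a different number of input files for that run or a row that was altered after bcftools wrote it.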

Any ideas?

isec vcf bcftools • 220 views

isec is pretty awful for these set operations, especially since individual samples present alleles, not lines in a VCF file. If you can cook up some example VCFs and show us what you consider a target, we can help.

