Intersection of multiple vcf files
6 months ago
avelarbio46 ▴ 30

Hello everyone!

I'm trying to intersect several VCF files like this:

bcftools isec -n +2 main-vcf.vcf.gz subset-1.vcf.gz subset-2.vcf.gz

That way, a variant can be present in the first file and in one or more of the subsets.

Basically, I'm looking for these presence patterns, in binary:

111
101
100
110

Both subset-1.vcf.gz and subset-2.vcf.gz are subsets of main-vcf.vcf.gz. They may or may not share variants with each other, but I'm not interested in that. I want to annotate my main VCF based on these subsets, so that I know which variants from file 1 are also present in subsets 1 and 2.

When I look at my sites.txt output, the last column sometimes has three digits and sometimes two:

chr19   603747  C   T   110 
chr5    150124275  G   T   11

I understand that 110 should mean the site is present in files 1 and 2 but not in file 3.
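To make that reading concrete, here is a minimal Python sketch of how I am decoding the last column. It assumes the mask carries one character per input file, in the same order as the files on the command line. The helper names `decode_mask` and `parse_site` are hypothetical, not part of bcftools, and the sketch rejects any mask whose length does not match the number of input files.

```python
from typing import List, Tuple

# Input files in the same order as on the bcftools isec command line.
FILES = ["main-vcf.vcf.gz", "subset-1.vcf.gz", "subset-2.vcf.gz"]


def decode_mask(mask: str, files: List[str]) -> List[str]:
    """Return the files whose position in the mask is '1'.

    Assumption: one character per input file, in command-line order.
    """
    if len(mask) != len(files) or set(mask) - {"0", "1"}:
        raise ValueError(
            f"mask {mask!r} does not match {len(files)} input files"
        )
    return [name for bit, name in zip(mask, files) if bit == "1"]


def parse_site(line: str, files: List[str]) -> Tuple[str, int, str, str, List[str]]:
    """Split one whitespace-separated sites.txt row: CHROM POS REF ALT MASK."""
    chrom, pos, ref, alt, mask = line.split()
    return chrom, int(pos), ref, alt, decode_mask(mask, files)


if __name__ == "__main__":
    print(parse_site("chr19\t603747\tC\tT\t110", FILES))
```

Under that assumption, a two-character mask such as 11 cannot be mapped onto three input files, so the sketch raises an error for it instead of guessing.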

But what does the 11 mean in this case? Which files is bcftools comparing for that site? I can't find any explanation in the bcftools manual of the sites.txt output for comparisons of multiple files. It gets even harder to interpret when comparing more than 3 files.

Any ideas?

isec vcf bcftools • 349 views

isec is pretty awful for these set operations, especially since alleles are carried by individual samples, not by lines in a VCF file. If you can cook up some example VCFs and show us what you consider the target output, we can help.
