How to merge VCF files, and consider variants within a set distance as one
10.0 years ago
roddy_p ▴ 10

I'm trying to merge several VCF files, each containing insertions from a different individual. Some insertions appear in only one individual, yet other individuals have insertions very close by (a few nucleotides away). I suspect these are the same insertion. Is there a way to merge these VCF files while treating variants located within a set distance of each other (e.g. 100 nucleotides) as the same variant?

Thanks!

VCF variant • 4.0k views
10.0 years ago
DG 7.3k

There are several options for merging VCF files. GATK has CombineVariants, and vcftools ships a vcf-merge script. I generally use CombineVariants.

For your second problem, I suspect there are a variety of ways to approach it, but I'm not sure offhand which is "best". You could fairly readily script something that filters out everything except indels and then reports indels that lie close to each other in different samples. If you are using Python, for instance, you could use PyVCF to iterate over variants and samples; it has fairly extensive documentation.
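As a rough illustration of the scripting approach, here is a minimal sketch in plain Python (standard library only, so it does not use PyVCF). The function names `read_insertions` and `cluster_variants`, the 100 bp default, and the simple "ALT longer than REF" test for an insertion are all assumptions made for this example, not part of any existing tool. It also chains neighbours together, so a cluster can end up spanning more than the chosen distance if several insertions sit in a row.

```python
from collections import defaultdict


def read_insertions(lines, sample):
    """Yield (chrom, pos, sample) for each insertion record in VCF text lines.

    Assumption: a record is an insertion if any ALT allele is longer than REF
    or is a symbolic <INS...> allele. Real data may need a stricter test.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        alts = fields[4].split(",")
        if any(
            alt.startswith("<INS")
            or (not alt.startswith("<") and len(alt) > len(ref))
            for alt in alts
        ):
            yield chrom, pos, sample


def cluster_variants(records, max_distance=100):
    """Group records on the same chromosome that lie within max_distance
    of the previous record in the cluster (single-linkage chaining)."""
    by_chrom = defaultdict(list)
    for chrom, pos, sample in records:
        by_chrom[chrom].append((pos, sample))

    clusters = []
    for chrom in sorted(by_chrom):
        current = None
        for pos, sample in sorted(by_chrom[chrom]):
            if current is not None and pos - current["end"] <= max_distance:
                # Close enough to the last variant: extend the cluster.
                current["end"] = pos
                current["samples"].add(sample)
            else:
                # Too far away (or first record): start a new cluster.
                current = {"chrom": chrom, "start": pos, "end": pos,
                           "samples": {sample}}
                clusters.append(current)
    return clusters
```

To use it, you would read each individual's file (for example `read_insertions(open(path), sample_name)`, where `path` and `sample_name` are whatever you call your files and individuals), concatenate the records from all files, and pass them to `cluster_variants`. Clusters whose `samples` set contains more than one individual are candidates for being the same insertion.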
