I have three VCF files for a non-model species that I'd like to merge into a single VCF file. Each file corresponds to a unique set of individuals and contains a subset of SNPs that partially overlaps among files.
Based on other posts, a popular tool for this task seems to be bcftools merge. I first compress my VCF files in bgzip format with:
bgzip -ci infileA.vcf > outfileA.vcf.gz
This also creates an index file (outfileA.vcf.gz.gzi).
I next attempt to merge the files with
bcftools merge outfileA.vcf.gz outfileB.vcf.gz outfileC.vcf.gz
But I get an error message that the index files cannot be found, even though they are in the same directory:
Failed to open outfileA.vcf.gz: could not load index
I've seen posts that discuss this error message when attempting to use other tools in bcftools. Some of these posts describe the issue as resulting from the system attempting to open too many files at once. But the system I am working on allows over 4,000 files to be open at once, and in any case I am working with only six files (the three compressed VCF files plus their index files).
When I check the format of my files with htsfile I get the following:
htsfile outfileA.vcf.gz
outfileA.vcf.gz: VCF version 4.2 BGZF-compressed variant calling data
htsfile outfileA.vcf.gz.gzi
outfileA.vcf.gz.gzi: unknown data
This suggests there may be some problem with the index file?
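For context, bcftools merge looks for a tabix (.tbi) or CSI (.csi) index next to each input, whereas bgzip -i writes a .gzi index, which is a different kind of index (it records offsets within the compressed file). A minimal sketch for checking which index files are actually present; list_indexes is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: print the extension of every index file found
# next to the given bgzipped VCF (one per line, in the order tbi, csi, gzi).
list_indexes() {
    for ext in tbi csi gzi; do
        if [ -e "$1.$ext" ]; then
            echo "$ext"
        fi
    done
}
```

Running list_indexes outfileA.vcf.gz and seeing only gzi would be consistent with the "could not load index" message.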
I've also attempted to generate index files outside of the bgzip command, using bcftools index as follows:
bcftools index -t outfileA.vcf.gz -o outfileA.vcf.gz.tbi
But I get an error about sorting:
[E::hts_idx_push] Unsorted positions on sequence #1.
However, when I then attempt to sort prior to indexing:
bcftools sort outfileA.vcf.gz -o outfileA.sort.vcf.gz
I get another error message:
[W::vcf_parse] Contig 'NW_083863.1' is not defined in the header. (Quick workaround: index the file with tabix.)
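The warning above refers to the ##contig lines in the VCF header. A minimal sketch, using only awk, for listing contig names that appear in records but have no matching ##contig=<ID=...> header line; missing_contigs is a hypothetical helper name, and the sketch assumes an uncompressed, tab-delimited VCF:

```shell
# Hypothetical helper: print each contig that is used in a record
# but never declared in a ##contig=<ID=...> header line.
missing_contigs() {
    awk -F'\t' '
        /^##contig=<ID=/ {
            id = $0
            sub(/^##contig=<ID=/, "", id)   # drop the prefix
            sub(/[,>].*$/, "", id)          # drop everything after the ID
            defined[id] = 1
            next
        }
        /^#/ { next }                       # skip all other header lines
        !($1 in defined) && !($1 in reported) {
            print $1
            reported[$1] = 1                # report each contig only once
        }
    ' "$1"
}
```

For example, missing_contigs infileA.vcf would print NW_083863.1 if that contig has no header entry.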
When I then attempt to index with tabix, I get another error, which seems to hint at a need for sorting:
[ti_index_core] the file out of order at line 45
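Both indexing errors point at record order. A minimal sketch, using only standard Unix tools, for locating the first out-of-order record and for sorting an uncompressed VCF by contig and position; first_unsorted and sort_vcf are hypothetical helper names, and the sketch assumes a tab-delimited file whose header lines all start with #:

```shell
# Hypothetical helper: print "LINE CHROM POS" for the first record whose
# position is lower than the previous record on the same contig, or whose
# contig reappears after a different contig. Exits non-zero if one is found.
first_unsorted() {
    awk -F'\t' '
        /^#/ { next }                       # skip header lines
        {
            if ($1 != chrom) {
                if ($1 in seen) { print NR, $1, $2; bad = 1; exit }
                seen[$1] = 1; chrom = $1; pos = 0
            }
            if ($2 + 0 < pos) { print NR, $1, $2; bad = 1; exit }
            pos = $2 + 0
        }
        END { exit bad + 0 }
    ' "$1"
}

# Hypothetical helper: keep the header as-is, then sort records by
# contig name (byte order) and numeric position.
# $1: input VCF (uncompressed), $2: output VCF
sort_vcf() {
    {
        grep '^#' "$1"
        grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
    } > "$2"
}
```

After something like sort_vcf infileA.vcf infileA.sorted.vcf, the sorted file could then be compressed with bgzip -c and indexed with tabix -p vcf before retrying the merge.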
My main issue is getting bcftools merge (or an analogous tool) to work for merging my VCF files. Has anyone else run into this same issue? Is there something obvious I'm doing wrong? Thanks for any tips!