I'm trying to use vcf-merge to combine two exome capture VCF files (~250K and ~330K in size) before trying it on all 96 samples. I'd appreciate any advice on the best way to do that! I've detailed what I've tried below. My issue seems to be with using tabix to build .tbi indexes for the files.
Step 1: BGZIP
So far, I've zipped the files without issue:
bgzip sample1.vcf
bgzip sample2.vcf
Which produces:
sample1.vcf.gz
sample2.vcf.gz
Step 2: TABIX
When I try to use this command:
tabix -h -p vcf sample1.vcf.gz
the stderr is:
Region 536999277..536999278 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6.
tbx_index_build failed: sample1.vcf.gz
Using the -C option works:
tabix -C -h -p vcf sample1.vcf.gz
tabix -C -h -p vcf sample2.vcf.gz
Which produces:
sample1.vcf.gz.csi
sample2.vcf.gz.csi
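The tabix error in Step 2 is consistent with the coordinate limit of the .tbi format, which is 2^29 = 536870912 bases; the reported region, 536999277..536999278, lies just past it. As a rough way to check which samples are affected, the sketch below finds the largest POS value in a VCF and compares it with that limit. The function names `max_pos` and `fits_tbi` are made up for illustration, and the check looks only at the POS column (not at the end of long REF alleles), so treat it as an approximation rather than what tabix does internally.

```shell
#!/bin/sh
# Coordinate limit of a .tbi index: 2^29 (min_shift = 14, 5 levels).
TBI_MAX=536870912

# Hypothetical helper: print the largest POS (column 2) among the
# non-header lines of a VCF read from stdin.
max_pos() {
    awk -F '\t' '!/^#/ && $2 + 0 > max { max = $2 + 0 } END { printf "%d\n", max }'
}

# Hypothetical helper: print "tbi" if the given position is within the
# limit above, otherwise "csi".
fits_tbi() {
    if [ "$1" -le "$TBI_MAX" ]; then
        echo "tbi"
    else
        echo "csi"
    fi
}

# Usage with the file from this post (bgzip output is readable by gzip):
#   fits_tbi "$(gzip -dc sample1.vcf.gz | max_pos)"
```

If this prints `csi` for a file, the `-C` route shown above is the one that can index it.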
Step 3: VCF-MERGE
When I use this command:
vcf-merge sample1.vcf.gz.csi sample2.vcf.gz.csi > out.vcf.gz
the merge fails, and I get this stderr:
Broken VCF header, no column names?
at /usr/share/perl5/Vcf.pm line 172, <__ANONIO__> line 1.
Vcf::throw(Vcf4_2=HASH(0x5645a760bc38), "Broken VCF header, no column names?") called at /usr/share/perl5/Vcf.pm line 867
VcfReader::_read_column_names(Vcf4_2=HASH(0x5645a760bc38)) called at /usr/share/perl5/Vcf.pm line 602
VcfReader::parse_header(Vcf4_2=HASH(0x5645a760bc38)) called at /usr/bin/vcf-merge line 183
main::init_cols(HASH(0x5645a761f438), Vcf4_2=HASH(0x5645a760b248)) called at /usr/bin/vcf-merge line 279
main::merge_vcf_files(HASH(0x5645a761f438)) called at /usr/bin/vcf-merge line 12
If I exclude the tabix files, the merge still fails and the stderr says: The column names not tab-separated? Could not load .tbi index of sample1.vcf.gz. The command exited with an error. Is the file tabix indexed?
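One thing the two error messages in Step 3 suggest: the first command passes the .csi index files as if they were VCFs (hence "Broken VCF header"), and the second message shows vcf-merge looking for a .tbi index next to each .vcf.gz. A minimal sketch of a pre-flight check along those lines follows. `check_inputs` is a hypothetical helper, not part of vcftools, and whether vcf-merge can work with a .csi index at all is not something this sketch assumes.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every argument is a .vcf.gz data
# file (not an index file) with a .tbi index sitting beside it.
check_inputs() {
    for f in "$@"; do
        case "$f" in
            *.vcf.gz) ;;
            *) echo "not a .vcf.gz data file: $f" >&2; return 1 ;;
        esac
        if [ ! -f "$f.tbi" ]; then
            echo "missing index: $f.tbi" >&2
            return 1
        fi
    done
}

# Usage with the files from this post; the output is piped through bgzip
# so that out.vcf.gz is actually compressed:
#   check_inputs sample1.vcf.gz sample2.vcf.gz &&
#       vcf-merge sample1.vcf.gz sample2.vcf.gz | bgzip -c > out.vcf.gz
```

For the files here the check would fail with "missing index", since only .csi indexes could be built for them.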
Instead of using the bold attribute, please use the formatting bar (especially the code option) to present your post better. Thank you!
Sorry about that! Fixed it.
Thanks! Easy to read this way.
Hello brallen!
It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/13282/vcf-merge-fails-due-to-tabix-not-producing-tbi-files
This is typically not recommended as it runs the risk of annoying people in both communities.
I answered your question there and Kevin answered it here with a similar answer. In effect, the time and effort of one of us has been wasted.
Hi there, RamRS
Truly sorry! Didn't think about how selfish that was. I'm new to this and having to teach myself, so was trying to reach as wide an audience as possible. Won't happen again.