bcftools index error: [E::bgzf_read_block] Invalid BGZF header at offset 12889964510
3 days ago
biostars ▴ 10

I downloaded a 13 GB VCF file for a set of chimpanzees from here:

https://eichlerlab.gs.washington.edu/greatape/data/VCFs/SNPs/Pan_troglodytes.vcf.gz

I am trying to index this vcf with

bcftools index -f Pan_troglodytes.vcf.gz

But I get the error:

[E::bgzf_read_block] Invalid BGZF header at offset 12889964510
index: failed to create index for "Pan_troglodytes.vcf.gz"

If I try without the -f flag (bcftools index Pan_troglodytes.vcf.gz), the error I get is:

index: the input is probably truncated, use -f to index anyway

I deleted the file and re-downloaded it, but I get the same problem.

The other VCF files I downloaded from this site, e.g. https://eichlerlab.gs.washington.edu/greatape/data/VCFs/SNPs/Gorilla.vcf.gz, work absolutely fine.

Does anyone know what is causing this error, and how to solve it?

Versions: bcftools 1.14-48-g58f886f Using htslib 1.14-22-g3f7e13e

bcftools • 263 views
ADD COMMENT

It is not guaranteed that the file you downloaded is bgzipped. It may be plain gzipped, which cannot be indexed properly. You can try:

gunzip -c Pan_troglodytes.vcf.gz | bgzip -c > Pan_troglodytes.vcf.bgz
bcftools index -f Pan_troglodytes.vcf.bgz
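
One way to tell plain gzip from BGZF without trusting the file extension is to look at the first bytes of the file. The sketch below is a hypothetical helper (looks_like_bgzf is not a bcftools or htslib command). It assumes the "BC" extra subfield comes first in the gzip extra field and that only the FEXTRA flag is set, which is how bgzip writes its blocks as far as I know. The format allows other layouts, so a negative result is a hint rather than proof.

```shell
# Hypothetical helper: return 0 if the file starts with a BGZF-style gzip header.
looks_like_bgzf() {
    # Render the first 16 bytes as one lowercase hex string (32 characters).
    case "$(head -c 16 "$1" | od -An -v -tx1 | tr -d ' \n')" in
        # Bytes 0-3: gzip magic 1f 8b, deflate 08, FEXTRA flag 04.
        # Bytes 4-11: skipped (16 hex characters).
        # Bytes 12-13: subfield identifier 'B' 'C' (assumed to be the first subfield).
        1f8b0804????????????????4243*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   looks_like_bgzf Pan_troglodytes.vcf.gz && echo "BGZF header found"
```

This only inspects the first block, so it cannot say anything about damage further into the file.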

ADD REPLY

Actually, I checked with file Pan_troglodytes.vcf.gz and it reported "Blocked GNU Zip Format (BGZF; gzip compatible)", so I am unsure. The file could be legitimately corrupt. The above command did seem to help somewhat, but it is hard to verify the integrity of the data (gunzip reported "trailing garbage ignored").
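
A partial integrity check is to confirm that the recompressed file decompresses to the same bytes as the original. The sketch below is a hypothetical helper (same_payload is an illustrative name) that uses only gzip and cksum. It assumes gzip can read both files, which should hold for BGZF since it is gzip compatible. A match shows that recompression preserved whatever gunzip could read. It does not show that the original download was complete.

```shell
# Hypothetical helper: return 0 if two gzip-compatible files decompress to the same bytes.
same_payload() {
    # Read via stdin so the file suffix does not matter; stderr is silenced so
    # warnings such as "trailing garbage ignored" do not clutter the output.
    sum_a=$(gzip -dc < "$1" 2>/dev/null | cksum)
    sum_b=$(gzip -dc < "$2" 2>/dev/null | cksum)
    # cksum prints a CRC and a byte count; both must agree.
    [ "$sum_a" = "$sum_b" ]
}

# Usage, with the file names from the command above:
#   same_payload Pan_troglodytes.vcf.gz Pan_troglodytes.vcf.bgz && echo "payloads match"
```

On a 13 GB file this decompresses both files in full, so expect it to take a while. cksum catches accidental corruption but is not a cryptographic hash.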

ADD REPLY

Test whether the file is corrupted:

gunzip -t Pan_troglodytes.vcf.gz
ADD REPLY

This command produces:

gzip: Pan_troglodytes.vcf.gz: decompression OK, trailing garbage ignored
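
A file written by a BGZF writer normally ends with a fixed 28-byte empty block that serves as an end-of-file marker, so checking the last 28 bytes is a quick way to see whether the file ends where bgzip would have ended it. The sketch below is a hypothetical helper (ends_with_bgzf_eof is an illustrative name, not an htslib tool). If it fails on this file, that would be consistent with extra or damaged bytes at the end, which may be what gzip reports as trailing garbage. That reading is an assumption, not something the messages above confirm.

```shell
# 28-byte BGZF end-of-file marker (an empty BGZF block), written as hex.
BGZF_EOF_HEX="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# Hypothetical helper: return 0 if the file ends with the BGZF EOF marker.
ends_with_bgzf_eof() {
    # Render the final 28 bytes as one lowercase hex string and compare.
    [ "$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')" = "$BGZF_EOF_HEX" ]
}

# Usage:
#   ends_with_bgzf_eof Pan_troglodytes.vcf.gz || echo "no BGZF EOF marker at end of file"
```

The check says nothing about how much valid data precedes the end of the file, so it complements gunzip -t rather than replacing it.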

ADD REPLY
