Hello,
From a VCF file, I am trying to pull the GT information (in the format of 0/1, 0/0, 2/0 etc.) for certain positions per chromosome per sample. When I run the following code I get the resulting line for each chromosome and contig:
bcftools query -t 4:58000000-59000000 -f '[ %GT]\n' myfile.vcf
[W::vcf_parse] Contig 'scaffold_869' is not defined in the header.
(Quick workaround: index the file with tabix.)
So I compressed the file with bgzip and indexed it with tabix:
bgzip myfile.vcf
tabix myfile.vcf.gz
When I re-run the above query with the new indexed file, I get:
bcftools query -t 4:58000000-59000000 -f '[ %GT]\n' myfile.vcf.gz.tbi
[E::hts_hopen] Failed to open file myfile.vcf.gz.tbi
[E::hts_open_format] Failed to open file "myfile.vcf.gz.tbi" : Exec format error
Failed to read from myfile.vcf.gz.tbi: Exec format error
I have checked the myfile.vcf.gz.tbi permissions and changed them to -rwxrwxrwx via chmod. I have also checked the file format via htsfile and get:
myfile.vcf.gz.tbi: Tabix compressed index data
Also, the input VCF data is position sorted. I am using samtools version 1.10-98 and htslib 1.10.2-135 on a university server.
Can someone please suggest what is going wrong and why my file cannot be read?
Thanks in advance
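On the `[W::vcf_parse]` warning above: it usually means the VCF header has no `##contig` line for that sequence. A minimal sketch of one way to add them, assuming a hypothetical FASTA index `ref.fa.fai` for the reference (not mentioned in the post) and an uncompressed VCF. The function names are made up for illustration.

```shell
# Assumption: $1 is a samtools faidx index (.fai), tab-separated,
# with the sequence name in column 1 and its length in column 2.
make_contig_lines() {
    # Print one ##contig header line per sequence in the .fai file
    awk -F '\t' '{ printf "##contig=<ID=%s,length=%s>\n", $1, $2 }' "$1"
}

# $1 = .fai file, $2 = uncompressed VCF; patched VCF goes to stdout.
add_contig_lines() {
    grep '^##' "$2"            # existing meta-information lines
    make_contig_lines "$1"     # new ##contig lines
    grep -v '^##' "$2"         # #CHROM column header and all records
}
```

Usage would be something like `add_contig_lines ref.fa.fai myfile.vcf > file_header.vcf`, followed by bgzip and tabix as before. The contig names in the `.fai` have to match the names in the CHROM column exactly (for example `4` versus `chr4`).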
Hi Pierre, Thanks for your help. I have added the chromosome lengths into the header using:
After compressing and indexing file_header.vcf as before, I still get an Exec error message:
I have full permissions and am using samtools 1.10-98-gfaab8b0 + htslib 1.10.2-135-gf4f7f24. Any other suggestions on why samtools can't read my file?
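One thing that stands out in the failing command: it is given the index (`myfile.vcf.gz.tbi`) as input rather than the compressed VCF itself. As far as I know, bcftools expects the data file and finds the `.tbi` sitting next to it on its own. A hedged sketch of that idea follows. `strip_index_suffix` is a hypothetical helper written for this example, not part of bcftools.

```shell
# Hypothetical helper: map an index path back to the data file it indexes.
strip_index_suffix() {
    case "$1" in
        *.tbi) printf '%s\n' "${1%.tbi}" ;;
        *.csi) printf '%s\n' "${1%.csi}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

vcf=$(strip_index_suffix "myfile.vcf.gz.tbi")   # -> myfile.vcf.gz

# Run the query only if bcftools and the data file are actually present.
# -r uses the index to jump to the region; -t (as in the post) also works
# but streams through the file instead.
if command -v bcftools >/dev/null 2>&1 && [ -f "$vcf" ]; then
    bcftools query -r 4:58000000-59000000 -f '[ %GT]\n' "$vcf"
fi
```

In other words, the same query pointed at `myfile.vcf.gz` instead of `myfile.vcf.gz.tbi` may be all that is needed. The index only has to exist alongside the data file.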