4.1 years ago
vctrm67
I am trying to sort a particular VCF file using bcftools for use with another software tool (MuSE). I tried running bcftools sort file.vcf > newfile.vcf, which ran without errors, but I get this message from the software tool: [E::hts_idx_push] Unsorted positions on sequence #24: 13474350 followed by 9950583. I'm not sure where these numbers are coming from, since I tried running grep 13474350 file.vcf and grep 9950583 file.vcf but don't see anything there. Am I missing something?
Which version of bcftools are you using? Is there a dictionary ('##contig' lines in the header)?
Try grepping for 13474351 and 9950584: VCF coordinates are 1-based while bcftools uses 0-based coordinates internally.
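A plain grep matches the number anywhere on the line, so it may be cleaner to compare against the POS column directly and test both the reported values and their off-by-one neighbours in one pass. The following is only a minimal sketch: the function name find_positions is hypothetical, and it assumes an uncompressed, tab-delimited VCF.

```python
def find_positions(vcf_lines, positions):
    """Return (line_number, CHROM, POS) for records whose POS column matches."""
    wanted = set(positions)
    hits = []
    for line_number, line in enumerate(vcf_lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        try:
            pos = int(fields[1])  # POS is the second VCF column (1-based)
        except ValueError:
            continue
        if pos in wanted:
            hits.append((line_number, fields[0], pos))
    return hits
```

It could be called as find_positions(open("file.vcf"), [13474350, 13474351, 9950583, 9950584]); an empty list would suggest the file being searched is not the one MuSE is reading.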
@Pierre Yes there is a dictionary, and it's version 1.9.
@chrchang Tried it but didn't output anything either...
Ok, just took a quick look at the relevant htslib source code (hts.c line 1851 in the current develop branch) and it does convert back to 1-based coordinates when printing an error message.
You may need to post an example file and command which can be used to reproduce the error to get useful help at this point.
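To narrow down a reproducible example, it may help to locate the first pair of adjacent records on the same contig where the position goes down. This is a rough sketch, not the htslib check itself: first_unsorted is a hypothetical helper, and it assumes an uncompressed, tab-delimited VCF with integer POS values.

```python
def first_unsorted(vcf_lines):
    """Return (line_number, CHROM, previous_POS, POS) for the first
    out-of-order adjacent pair on the same contig, or None if none is found."""
    prev_chrom = None
    prev_pos = None
    for line_number, line in enumerate(vcf_lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        chrom, pos = fields[0], int(fields[1])
        # A drop in position within the same contig means the file is unsorted here.
        if chrom == prev_chrom and pos < prev_pos:
            return (line_number, chrom, prev_pos, pos)
        prev_chrom, prev_pos = chrom, pos
    return None
```

Running it on both file.vcf and newfile.vcf would show whether the sorted output still contains an out-of-order pair, and the few lines around the reported line number would make a small example to post.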
This is really odd. Why would bcftools sort throw an error that says that the input file is unsorted? Is it not the job of bcftools sort to do the sorting? I have a feeling that the file you're trying to sort and the file you're grep-ing are not the same file. Are you piping stuff to bcftools sort, perhaps?
Sorry, I can see how the post is confusing. Edited for clarification.
Do you see any results when grep-ing for those positions in newfile.vcf? Does MuSE say anything else?
Did you get the file.vcf file by subsetting a different VCF file using a BED file, perhaps? If so, overlapping regions in the BED file could cause duplicate entries occurring in different positions in the resultant VCF file.
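If a BED file was involved, a quick way to test the overlap hypothesis is to list regions that start before an earlier region on the same chromosome has ended. This is a sketch under assumptions: overlapping_regions is a hypothetical helper, and it expects standard BED lines with chrom, start and end in the first three whitespace-separated columns (0-based, half-open).

```python
def overlapping_regions(bed_lines):
    """Return BED intervals that overlap an earlier interval on the same chromosome."""
    intervals = []
    for line in bed_lines:
        # Skip blank lines, comments, and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    intervals.sort()

    overlapping = []
    last_chrom, furthest_end = None, -1
    for chrom, start, end in intervals:
        if chrom != last_chrom:
            # First interval on a new chromosome: nothing to overlap with yet.
            last_chrom, furthest_end = chrom, end
            continue
        # Half-open intervals overlap only if this start is before the furthest end seen.
        if start < furthest_end:
            overlapping.append((chrom, start, end))
        furthest_end = max(furthest_end, end)
    return overlapping
```

An empty result would make the overlapping-regions explanation less likely; a non-empty one points at regions worth merging before subsetting again.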