Question: Bcftools unsorted positions?
Asked 8 months ago by vctrm67 (20):

I am trying to sort a VCF file with bcftools so that I can use it with another software tool (MuSE). I ran bcftools sort file.vcf > newfile.vcf, which completed without errors, but MuSE then reports: [E::hts_idx_push] Unsorted positions on sequence #24: 13474350 followed by 9950583. I'm not sure where these numbers come from: I ran grep 13474350 file.vcf and grep 9950583 file.vcf, and neither returned anything. Am I missing something?


Which version of bcftools are you using? Is there a sequence dictionary ('##contig' lines in the header)?

Reply written 8 months ago by Pierre Lindenbaum (131k)

Try grepping for 13474351 and 9950584: VCF coordinates are 1-based while bcftools uses 0-based coordinates internally.

Reply written 8 months ago by chrchang523 (7.3k)

@Pierre Yes, there is a dictionary, and the bcftools version is 1.9.
@chrchang I tried that, but grep didn't output anything either.

Reply written 8 months ago by vctrm67 (20), modified 8 months ago by RamRS (30k)

OK, I just took a quick look at the relevant htslib source code (hts.c, line 1851 in the current develop branch), and it does convert back to 1-based coordinates when printing the error message, so the numbers in the message should match the POS values as written in the file.

You may need to post an example file and command which can be used to reproduce the error to get useful help at this point.

Reply written 8 months ago by chrchang523 (7.3k)
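
Since the error message reports 1-based coordinates, the reported numbers can be compared directly against the POS column. A plain grep matches the number anywhere on the line (including as part of a longer number), so an exact match on the POS column is a slightly stricter check. The following is a minimal Python sketch of that idea, not part of the original thread; the function name find_records_at is hypothetical, and it assumes an uncompressed, tab-delimited VCF.

```python
def find_records_at(lines, positions):
    """Return (line_number, CHROM, POS) for records whose POS column
    exactly equals one of `positions` (hypothetical helper)."""
    wanted = set(positions)
    hits = []
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # header lines carry no record position
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip blank or malformed lines
        pos = int(fields[1])
        if pos in wanted:
            hits.append((line_number, fields[0], pos))
    return hits


# Small in-memory example; with a real file, pass an open file handle
# (for example open("newfile.vcf")) instead of this list.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chrY\t13474350\t.\tA\tG\t.\t.\t.\n",
    "chrY\t113474350\t.\tC\tT\t.\t.\t.\n",  # contains the digits, but is a different POS
]
print(find_records_at(example, [13474350, 9950583]))
```

If this returns nothing for either the input or the sorted output, the positions in the message probably come from a different file than the one being searched.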

This is really odd. Why would bcftools sort throw an error that says that the input file is unsorted? Is it not the job of bcftools sort to do the sorting? I have a feeling that the file you're trying to sort and the file you're grep-ing are not the same file.

Are you piping stuff to bcftools sort, perhaps?

Reply written 8 months ago by RamRS (30k)
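
One way to test whether a given file is really sorted is to scan it and report the first record whose position is smaller than the previous one on the same contig, which is roughly the condition the "Unsorted positions" message describes. The sketch below is an illustration under assumptions rather than the actual htslib check: first_unsorted is a hypothetical name, the input is assumed to be an uncompressed tab-delimited VCF, and contigs are tracked by name, whereas the error message refers to a sequence number that presumably reflects header order.

```python
def first_unsorted(lines):
    """Return (line_number, CHROM, previous_POS, POS) for the first record
    that goes backwards on its contig, or None if none is found."""
    last_pos = {}  # CHROM -> last POS seen on that contig
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip blank or malformed lines
        chrom, pos = fields[0], int(fields[1])
        previous = last_pos.get(chrom)
        if previous is not None and pos < previous:
            return (line_number, chrom, previous, pos)
        last_pos[chrom] = pos
    return None


# In-memory example; with a real file, pass an open file handle instead.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chrY\t100\t.\tA\tG\n",
    "chrY\t250\t.\tC\tT\n",
    "chrY\t90\t.\tG\tA\n",  # goes backwards relative to the previous record
]
print(first_unsorted(example))
```

Running this on both file.vcf and newfile.vcf would show whether the sorted output still contains a backwards step, or whether the tool is reading some other file.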

Sorry, I can see how the post is confusing. Edited for clarification.

ADD REPLYlink written 8 months ago by vctrm6720

Do you see any results when grep-ing for those positions in newfile.vcf? Does MuSE say anything else?

Did you perhaps create file.vcf by subsetting a different VCF file with a BED file? If so, overlapping regions in the BED file could cause the same record to be written more than once, at different places in the resulting VCF file.

Reply written 8 months ago, modified 8 months ago, by RamRS (30k)
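
The duplicate-entry hypothesis can be checked by grouping records on their site and alleles and listing any that appear more than once. This is a minimal Python sketch, assuming an uncompressed tab-delimited VCF and treating records with identical CHROM, POS, REF and ALT as duplicates; the function name find_duplicates is hypothetical.

```python
from collections import defaultdict


def find_duplicates(lines):
    """Map (CHROM, POS, REF, ALT) to the line numbers where it occurs,
    keeping only keys that occur more than once."""
    seen = defaultdict(list)
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1], fields[3], fields[4])
        seen[key].append(line_number)
    return {key: numbers for key, numbers in seen.items() if len(numbers) > 1}


# In-memory example; with a real file, pass an open file handle instead.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chrY\t100\t.\tA\tG\n",
    "chrY\t250\t.\tC\tT\n",
    "chrY\t100\t.\tA\tG\n",  # same site and alleles as the first record
]
print(find_duplicates(example))
```

An empty result suggests that duplicated records from overlapping regions are not the cause.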