Unable to Create Index file of VCF
21 months ago
anasjamshed ▴ 140

I have a VCF file named "trio_example.vcf". I am trying the following commands:

bgzip -c trio_example.vcf > trio_example.vcf.gz
htsfile trio_example.vcf.gz
tabix -p vcf trio_example.vcf.gz

But it is showing the following error:

[E::hts_idx_push] Unsorted positions on sequence #1: 60066 followed by 60043
tbx_index_build failed: trio_example.vcf.gz

How can I solve this? Why is indexing necessary?


By my count, that's already two questions unrelated to your original problem. You are yet to acknowledge the help you received, either verbally or by upvoting the answers. Instead, it is like "I need help, so it doesn't matter how many unrelated questions I ask in the same thread."

One would think this is your first time asking a question rather than having done it 100+ times. I guess there is some truth to the saying that if one doesn't learn something properly the first 3-4 times they do it, chances are they will never learn.

21 months ago
ATpoint 85k

Sort the file, as the error message indicates: positions within each sequence must be in ascending order before tabix can index it.

Sort VCF File by Position?

Indexing enables random access.
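To see where a file first breaks the ordering that the error complains about, a small awk check can help. This is a minimal sketch, not how tabix itself works: the file demo.vcf and its records are made up for illustration, and the check only verifies that positions do not decrease between consecutive records on the same contig.

```shell
# Hypothetical demo file: the second record goes backwards (60066, then 60043).
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n' > "$tmpdir/demo.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$tmpdir/demo.vcf"
printf '1\t60066\t.\tA\tG\t.\t.\t.\n' >> "$tmpdir/demo.vcf"
printf '1\t60043\t.\tC\tT\t.\t.\t.\n' >> "$tmpdir/demo.vcf"
printf '2\t100\t.\tG\tA\t.\t.\t.\n' >> "$tmpdir/demo.vcf"

# Report the first pair of consecutive records that is out of order.
first_unsorted=$(awk -F'\t' '
  /^#/ { next }                      # skip header lines
  $1 == chrom && $2 + 0 < pos {      # same contig, position went backwards
    print chrom ":" pos " followed by " $1 ":" $2
    exit
  }
  { chrom = $1; pos = $2 + 0 }       # remember the previous record
' "$tmpdir/demo.vcf")

echo "$first_unsorted"
rm -rf "$tmpdir"
```

An empty result means no backwards step was found; it does not prove that each contig forms a single contiguous block, which indexing also expects.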


Now I tried:

cat trio_example.vcf | awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k2,2n"}' > out_sorted.vcf
bgzip -c out_sorted.vcf > trio_example.vcf.gz
tabix -p vcf trio_example.vcf.gz

and it works well. But what is the purpose of creating an index file?


cat trio_example.vcf | awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k2,2n"}' > out_sorted.vcf

whoa.

But please use dedicated tools:

bcftools sort -O z -o sorted.vcf.gz in.vcf
bcftools index sorted.vcf.gz

But what is the purpose of creating index file

Fast random access to the VCF. Imagine you have a 100 GB VCF file and want to know whether it contains a variant at chrY:10000. There is no need to scan the whole file for hours: bcftools uses the index to extract the variants in that region.

bcftools view sorted.vcf.gz "chrY:10000-10000"
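The sort, index, and region query steps can be tried end to end on a toy file. The following is a rough sketch that assumes bcftools is installed and on the PATH; the file names and the three chrY records are invented for illustration, and the script simply skips itself when bcftools is missing.

```shell
if command -v bcftools >/dev/null 2>&1; then
  tmpdir=$(mktemp -d)
  vcf="$tmpdir/demo.vcf"             # hypothetical unsorted input
  gz="$tmpdir/demo.sorted.vcf.gz"    # sorted, bgzip-compressed output

  # Three made-up records, deliberately out of order.
  printf '##fileformat=VCFv4.2\n##contig=<ID=chrY>\n' > "$vcf"
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$vcf"
  printf 'chrY\t20000\t.\tA\tG\t.\t.\t.\n' >> "$vcf"
  printf 'chrY\t10000\t.\tC\tT\t.\t.\t.\n' >> "$vcf"
  printf 'chrY\t5000\t.\tG\tA\t.\t.\t.\n' >> "$vcf"

  bcftools sort -O z -o "$gz" "$vcf"   # sort and compress in one step
  bcftools index "$gz"                 # write the index next to the file

  # -H drops the header, so only records overlapping the region are counted.
  hits=$(bcftools view -H "$gz" chrY:10000-10000 | wc -l | tr -d ' ')
  echo "records at chrY:10000: $hits"
else
  echo "bcftools not found; skipping"
fi
```

On a file this small the index makes no visible difference; the benefit appears on large files, where the region query reads only the compressed blocks that can contain the requested interval.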

Can the sex of each individual be verified (even if imperfectly) from this VCF file? I am trying this command:

bcftools +guess-ploidy trio_example.vcf.gz

and it gives me the following results:

Warning: PL tag not found in header, switching to GL
Warning: GL tag not found in header, switching to GT
N1      F
N2      F
N3      F

it's unrelated to your original question.


Yes, but I need help.


Agreed on using dedicated tools; I updated my linked answer with bcftools sort.


thanks for helping me


Hi @anasjamshed, ATpoint & Pierre Lindenbaum

I encountered the same sorting issue and tried bcftools sort / bcftools index many times, but the problem was not resolved. However, this script worked for me:

cat trio_example.vcf | awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k2,2n"}' > out_sorted.vcf

Thank you.

Best, CK
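The awk one-liner can also be written so that the header and the sorted records are emitted by two separate commands in a fixed order, rather than relying on how awk interleaves its own output with that of the sort child process. A minimal sketch using only grep and sort, with a made-up demo.vcf as input:

```shell
tmpdir=$(mktemp -d)
in_vcf="$tmpdir/demo.vcf"            # hypothetical unsorted input
out_vcf="$tmpdir/demo.sorted.vcf"    # header first, then sorted records

printf '##fileformat=VCFv4.2\n' > "$in_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$in_vcf"
printf '2\t100\t.\tG\tA\t.\t.\t.\n' >> "$in_vcf"
printf '1\t60066\t.\tA\tG\t.\t.\t.\n' >> "$in_vcf"
printf '1\t60043\t.\tC\tT\t.\t.\t.\n' >> "$in_vcf"

{
  # Header lines keep their original order.
  grep '^#' "$in_vcf"
  # Records: contig as a byte-wise string key, then position as a number.
  grep -v '^#' "$in_vcf" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
} > "$out_vcf"

# Summarise the resulting record order as contig:position pairs.
sorted_positions=$(grep -v '^#' "$out_vcf" | cut -f1,2 | tr '\t' ':' | paste -sd ' ' -)
echo "$sorted_positions"
```

The contig key sorts as plain text, so 10 comes before 2; as far as I know that is acceptable for indexing, which needs each contig in one block with ascending positions, but it may differ from the contig order in the header.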
