Question

Can't index .bed file with tabix

0

Entering edit mode

2.8 years ago

zjardyn • 0

Hello,

I am trying to index .bed like files from Leafcutter output as described here: Leafcutter sQTL

Leafcutter generates an .sh script to do this for each output file in perind.counts.gz_prepare.sh which looks like this:

bgzip -f test_perind.counts.gz.qqnorm_chr1A \ tabix -p bed test_perind.counts.gz.qqnorm_chr1A.gz

The error I am getting is the following:

[E::hts_idx_push] Region 536959441..536959528 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6 tbx_index_build failed: test_perind.counts.gz.qqnorm_chr1A.gz

I am not sure how to deal with this error, and I need .tbi files to use as input for fastQTL. I have tried to use the -m, --min-shift INT flag which works but generates .csi files which don't work with fastQTL.

Any help would be great thanks!

bed tabix leafcutter • 1.4k views

ADD COMMENT • link updated 2.8 years ago by cmdcolin ★ 3.8k • written 2.8 years ago by zjardyn • 0

0

Entering edit mode

Have you tried making a .csi index instead? It seems likely your downstream tool would accept that if it accepts .tbi.

ADD REPLY • link 2.8 years ago by 4galaxy77 2.8k

0

Entering edit mode

I would probably want to ask the leafcutter developer(s) about how to use CSI https://github.com/davidaknowles/leafcutter/

ADD REPLY • link 2.8 years ago by cmdcolin ★ 3.8k

score 0 · Answer 1 · 2021-06-19

I have read the tabix manual page here: tabix(1) and now see that I can only index up to 512 Mbp (2^29 bases) in length. This is unfortunate because I am dealing with wheat genome data which is large in general. Is there no other way to generate .tbi indexs? Should I just remove anything that is larger than 512 Mbp?

Interestingly, some of the output files have .tbi indexs were generated:

test_perind.counts.gz.qqnorm_chr1D.gz.tbi test_perind.counts.gz.qqnorm_chr4D.gz.tbi test_perind.counts.gz.qqnorm_chr6D.gz.tbi test_perind.counts.gz.qqnorm_chrUn.gz.tbi

I looked into this by checking the header for my accompanying .vcf file and my suspicions have been confirmed:

contig=<ID=1D,length=495453186> contig=<ID=4D,length=509857067> contig=<ID=6D,length=473592718> contig=<ID=Un,length=480980714>

These are the only chromosome IDs that are smaller than 512 Mbp.

Side note: I am new to sites like biostars, stackoverflow, etc and was wondering how I could get my above code chunks to just print one line at a team instead of being squished together.

Thanks!

score 0 · Answer 2 · 2021-06-19

0

Entering edit mode

2.8 years ago

Pierre Lindenbaum 161k

bcftools index your.file.vcf.gz

ADD COMMENT • link 2.8 years ago by Pierre Lindenbaum 161k