Can't index .bed file with tabix
2
0
Entering edit mode
4 months ago
zjardyn • 0

Hello,

I am trying to index .bed like files from Leafcutter output as described here: Leafcutter sQTL

Leafcutter generates an .sh script to do this for each output file in perind.counts.gz_prepare.sh which looks like this:

bgzip -f test_perind.counts.gz.qqnorm_chr1A \ tabix -p bed test_perind.counts.gz.qqnorm_chr1A.gz

The error I am getting is the following:

[E::hts_idx_push] Region 536959441..536959528 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6 tbx_index_build failed: test_perind.counts.gz.qqnorm_chr1A.gz

I am not sure how to deal with this error, and I need .tbi files to use as input for fastQTL. I have tried to use the -m, --min-shift INT flag which works but generates .csi files which don't work with fastQTL.

Any help would be great thanks!

bed tabix leafcutter • 271 views
ADD COMMENT
0
Entering edit mode

Have you tried making a .csi index instead? It seems likely your downstream tool would accept that if it accepts .tbi.

ADD REPLY
0
Entering edit mode

I would probably want to ask the leafcutter developer(s) about how to use CSI https://github.com/davidaknowles/leafcutter/

ADD REPLY
0
Entering edit mode
4 months ago
zjardyn • 0

I have read the tabix manual page here: tabix(1) and now see that I can only index up to 512 Mbp (2^29 bases) in length. This is unfortunate because I am dealing with wheat genome data which is large in general. Is there no other way to generate .tbi indexs? Should I just remove anything that is larger than 512 Mbp?

Interestingly, some of the output files have .tbi indexs were generated:

test_perind.counts.gz.qqnorm_chr1D.gz.tbi test_perind.counts.gz.qqnorm_chr4D.gz.tbi test_perind.counts.gz.qqnorm_chr6D.gz.tbi test_perind.counts.gz.qqnorm_chrUn.gz.tbi

I looked into this by checking the header for my accompanying .vcf file and my suspicions have been confirmed:

contig=<ID=1D,length=495453186> contig=<ID=4D,length=509857067> contig=<ID=6D,length=473592718> contig=<ID=Un,length=480980714>

These are the only chromosome IDs that are smaller than 512 Mbp.

Side note: I am new to sites like biostars, stackoverflow, etc and was wondering how I could get my above code chunks to just print one line at a team instead of being squished together.

Thanks!

ADD COMMENT
0
Entering edit mode
4 months ago
bcftools index your.file.vcf.gz
ADD COMMENT

Login before adding your answer.

Traffic: 2224 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6