vg autoindex - write_gcsa_kmers() size limit exceeded
23 months ago
Andrew ▴ 30

I'm trying to run a dataset through vg autoindex to test the tool and map some reads to a reference. Each time I try to build the indices, the autoindexer gets stuck in a loop of exceeding the disk use limit while building the GCSA index. It threw this error 13 times before being killed by the SLURM out-of-memory handler:

error: [write_gcsa_kmers()] size limit exceeded
[IndexRegistry]: Exceeded disk use limit while generating k-mers. Rewinding to pruning step with more aggressive pruning to simplify the graph.
[IndexRegistry]: Pruning complex regions of VG to prepare for GCSA indexing.
/home/asherrar/scripts/vg_create_manual.sh: line 18: 17672 Killed                  vg autoindex -w map -r $ref "${vars[@]}" -p $base_dir/t2t_hg002var -T $tmpdir -M 160G -t 8 -V 1

I'm trying to index the T2T-CHM13 human genome reference with a lifted-over version of the HG002 variant call set (83 MB, gzipped). I don't think the data should be an issue, since I saw examples of the 1000G dataset being processed with vg on the GitHub page, so I'm assuming something is wrong with my setup. I'm running the most recent version of vg (1.44.0) through Singularity as a SLURM job. The job has 32 CPUs and 256 GB of RAM (with autoindex limited to 160 GB), plus a temporary storage directory with up to 10 TB of space available.
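
For context, the `"${vars[@]}"` argument in the command above could be built roughly as follows. This is only a sketch under assumptions: the script itself is not shown in the post, so the helper name `build_vcf_args` and the one-VCF-per-chromosome directory layout are hypothetical. It collects one `-v <file>` pair per bgzipped VCF, which is how vg autoindex accepts multiple VCFs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array `vars` with one "-v <file>" pair
# for every *.vcf.gz file in the directory given as the first argument.
build_vcf_args() {
    local vcf_dir="$1" vcf
    vars=()
    for vcf in "$vcf_dir"/*.vcf.gz; do
        # If nothing matches, the glob stays unexpanded; skip that case.
        [ -e "$vcf" ] || continue
        vars+=(-v "$vcf")
    done
}

# Example (not run here; paths are placeholders):
#   build_vcf_args /path/to/per_chromosome_vcfs
#   vg autoindex -w map -r "$ref" "${vars[@]}" -p "$base_dir/t2t_hg002var" \
#       -T "$tmpdir" -M 160G -t 8 -V 1
```

The `*.vcf.gz` glob does not match the `.tbi` tabix indices, so only the VCFs themselves are passed with `-v`.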

I'm not sure what the issue is or what I should be tweaking. I'm not very experienced with high-performance computing, and the support staff for the cluster I'm on aren't familiar with vg. Any advice would be deeply appreciated.

vg-autoindex vg vg-toolkit vgteam • 1.4k views

This error typically indicates a very complex graph (at least in some localized regions). I agree that your data does not sound like it should be leading to this kind of trouble. Can you share the data? Or at least post the full stderr output?
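
One way to capture the full stderr and see how many times the indexer rewound is sketched below. The log file name `autoindex.stderr.log` and the helper `count_size_limit_errors` are hypothetical; the matched string is the error message quoted in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how often the GCSA size-limit error appears in a log file.
count_size_limit_errors() {
    # -F treats the pattern as literal text; -c prints the number of matching lines.
    grep -c -F '[write_gcsa_kmers()] size limit exceeded' "$1"
}

# Example (not run here): send stderr to a file, then count the errors.
#   vg autoindex <your arguments> 2> autoindex.stderr.log
#   count_size_limit_errors autoindex.stderr.log
```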


Here's a zip of all the relevant files:

  • the singularity image I'm using of vg 1.44.0
  • the full error log from the last run
  • the script I ran
  • all relevant data files (the reference and the VCFs - split by chromosome and including their tabix indices)

Let me know if I've missed anything or if there's more I can provide.


I've tracked it down to a bug, which should be pretty easy to fix. I'll update this thread when it's merged into the main branch. Thanks for pointing this out!


Okay, this bug has been fixed in vg's main branch, and the fix will be included in the next release. If you'd like an interim Docker image before then, you can use quay.io/vgteam/vg:ci-590-8570100f0945d6c3d0851c5da2feaf19d492f14d
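
Since the question mentions running vg through Singularity, here is a rough sketch of how that Docker image might be pulled there. The output file name `vg_interim.sif` is an assumption, and the script only prints the commands so they can be checked before running them on the cluster.

```shell
#!/usr/bin/env bash
# Interim image tag exactly as posted in this thread.
vg_image="quay.io/vgteam/vg:ci-590-8570100f0945d6c3d0851c5da2feaf19d492f14d"

# Singularity addresses Docker registries through the docker:// scheme.
vg_uri="docker://${vg_image}"

# Hypothetical local file name for the converted image.
vg_sif="vg_interim.sif"

# Print the commands rather than running them (a dry run).
echo "singularity pull ${vg_sif} ${vg_uri}"
echo "singularity exec ${vg_sif} vg version"
```

Running the two printed commands should download the image and report the vg version it contains, assuming the compute node can reach quay.io.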


Awesome, I'll grab that and use it until the next main version comes out. Thanks so much for your help!
