I have two GFA files representing the same dataset, and I am trying to map reads to each of them with vg Giraffe. One file is much smaller than the other (17 MB vs. 52 MB). However, the distance index computation (and hence autoindex as a whole) is much faster on the larger file: it finished in a couple of seconds there, whereas it has been running for several hours on the smaller file.
I want to know why this is the case and how it can be fixed. I have included a Google Drive link to the two files in this post.
The command I am using is:
vg autoindex --workflow giraffe -g <file_name.gfa> -p <output_file_name>
Here are the two GFA files: https://drive.google.com/drive/folders/1mCmgIuVTDthDS7h5iW0PY86-dkFYGCDG?usp=sharing
Indexing the larger file is very fast, while indexing the smaller one is extremely slow.
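Since file size alone may not reflect how complex the graph is, it might help to compare the structure of the two files. Below is a minimal sketch, assuming standard tab-separated GFA 1 records (S for segments, L for links, P or W for paths). The function name gfa_summary is a hypothetical helper of my own, not part of vg, and I am not claiming that these counts explain the slowdown; they are only a starting point for comparison.

```shell
# Hypothetical helper (not a vg command): summarize one GFA file.
# Counts segment, link, and path records, and reports the largest
# number of link endpoints touching any single segment.
gfa_summary() {
    awk -F'\t' '
        $1 == "S" { segments++ }
        # L records: column 2 is the source segment, column 4 the target.
        $1 == "L" { links++; degree[$2]++; degree[$4]++ }
        $1 == "P" || $1 == "W" { paths++ }
        END {
            max = 0
            for (node in degree) if (degree[node] > max) max = degree[node]
            printf "segments=%d links=%d paths=%d max_degree=%d\n", segments, links, paths, max
        }
    ' "$1"
}
```

It would be called once per file, for example gfa_summary <file_name.gfa>, and the two output lines compared side by side.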