2.3 years ago
Yihang
When I use vg convert (v1.39) to convert a gam file to a gaf file (or vice versa), I find that the order of the reads is not preserved. For example, when I sort the reads by read ID in a gaf file and then convert it to a gam file, the reads are no longer ordered by read ID.
I am very curious about how vg convert works. Is there any way to do the conversion while preserving the read order?
Thanks!
Current vg is 1.48, I believe. Why use such an old version? There might also be no reason to preserve read order, and vg convert might be more efficient this way.
I tried vg 1.49 (the current version), but it still does not work. The reason I want the read order preserved is that when we map some special kinds of paired reads to VG, having the order preserved would significantly simplify the downstream analysis.
The reason that the read order is not preserved is that `vg convert` runs in parallel over the alignments, which introduces nondeterminism from the scheduler. I think you should be able to preserve read order by running in one thread (`-t 1`), but it will run much slower.

Thanks! Yeah, I also thought about this point, so I did some extra experiments. Here are some observations.

When I use `vg convert -t 1 -F` on a `.gaf` file which is already sorted by read ID (1, 2, 3, 4, ...), I find that the output `.gam` file still does not preserve the order (the output order is 33281, 33282, 33283, ...). If I run this command multiple times, the order of the `.gam` is the same each time (33281, 33282, 33283, ...).

When I use `vg convert -F` without `-t 1`, the order is not preserved, as I observed before. However, if I run this command multiple times, the order of the `.gam` is different each time.

Therefore, I think multithreading does impact the order of the output `.gam` file. However, it cannot explain why, when I use one thread, the order is not the same as in the input `.gaf` file. Do you know why this happens?

If you just want to have the read IDs ordered, you can use the unix sort command if your input is a `GAF` file. If you have a `gam`, it's more problematic.

This toolset might be useful for GAF sorting: https://github.com/marschall-lab/gaftools
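For the GAF case, re-sorting by read ID after conversion can also be done in a few lines of Python. This is only a minimal sketch, not part of vg or gaftools: the function names (`natural_key`, `sort_gaf_lines`, `sort_gaf_file`) are made up for illustration, and it assumes the read name is the first tab-separated column, that any header lines start with `@`, and that the file fits in memory.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs numerically, so "10" sorts after "2"."""
    parts = [p for p in re.split(r"(\d+)", name) if p]
    # Tag each part so numbers and text are never compared with each other.
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]


def sort_gaf_lines(lines):
    """Return GAF lines with headers first, then records ordered by read name.

    Assumption: header lines (if any) start with "@" and column 1 is the
    read name. Python's sort is stable, so records that share a read name
    (for example the two mates of a pair) keep their relative order.
    """
    headers = [ln for ln in lines if ln.startswith("@")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("@")]
    records.sort(key=lambda ln: natural_key(ln.split("\t", 1)[0]))
    return headers + records


def sort_gaf_file(in_path, out_path):
    """Read a GAF file, sort it by read name, and write the result."""
    with open(in_path) as src:
        lines = src.readlines()
    with open(out_path, "w") as dst:
        dst.writelines(sort_gaf_lines(lines))
```

Calling `sort_gaf_file("converted.gaf", "converted.sorted.gaf")` (both file names hypothetical) would restore a 1, 2, 3, ... ordering for numeric read IDs. It does not help with a `.gam` directly, since that is a binary format.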