Question: importing a GFA2 scaffold graph
13 months ago by
United States
egoltsman0 wrote:

Hi all, I'm exploring GATB with the idea of potentially replacing our custom assembly graph implementation with this wonderful library. As is often the case, instead of creating an overlap graph from scratch (i.e. from the reads), I'd like to jump in midway and import a set of third-party unitigs, and possibly paired-end mapping information, to create a contig graph that includes both inter-node overlaps (as edges) and long-range scaffolding links (as gaps). I could create a GFA2 file with all that information and convert it to HDF5, but I couldn't tell from the documentation whether that would be enough. The documentation describes the expected input as a '.h5' file "created using dbgh5 program provided with GATB-Core". I can certainly ensure that my contigs are unique in their k-mer content, but what other restrictions does the Graph::load API have?
If anyone has tried something similar, any tips would be greatly appreciated!

gatb • 456 views
9 months ago by
Rayan Chikhi1.4k
France, Lille, CNRS
Rayan Chikhi1.4k wrote:


I hope the answer is still relevant now. Converting GFA2 to HDF5 doesn't make much sense in GATB: what we store inside the HDF5 file is essentially k-mer counts and a Bloom filter (plus some other data). So the graph stored in a .h5 file is a regular de Bruijn graph and cannot be of any other type.
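
To make the point concrete, here is a minimal Python sketch (not GATB code) of a graph that exists only as a set of k-mers. The names `kmers` and `successors` are made up for illustration, a plain Python set stands in for the k-mer counts and Bloom filter, and reverse complements are ignored for brevity. Neighbors are discovered by membership queries, so there is no slot in which a scaffolding gap between two unrelated contigs could be recorded.

```python
# Illustrative only: a Python set stands in for the k-mer counts + Bloom
# filter kept in the .h5 file. Reverse complements are ignored here.

def kmers(seq, k):
    """Return the set of all k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def successors(kmer, kmer_set):
    """Out-neighbors of a node, found by querying the k-mer set.

    No edge list is stored anywhere: an edge exists exactly when the
    (k-1)-suffix of `kmer`, extended by one base, is also in the set.
    """
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in kmer_set]


K = 5
# Two hypothetical contigs that share no (k-1)-mer overlap.
contig_a = "ACGTACGGA"
contig_b = "TTGCATTCA"
nodes = kmers(contig_a, K) | kmers(contig_b, K)

inside_a = successors("ACGTA", nodes)   # neighbor within contig_a
end_of_a = successors("ACGGA", nodes)   # last k-mer of contig_a

print(inside_a)
print(end_of_a)
```

The last k-mer of `contig_a` has no successor, and nothing in the k-mer set can say "contig_b follows after roughly 150 bp". That kind of link has to live in a different data structure.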

Minia actually has early support for loading a GFA1 graph. It is designed for loading compacted de Bruijn graphs created with BCALM. The behavior for any other type of graph has _not_ been tested. You're welcome to give it a shot, but I assume it will most likely require modifying GATB-Core code. If you're serious about going down this road, please shoot me an email.

The road I would actually recommend is to create "conservative" contigs using Minia (by tweaking the tip removal steps) or any other assembler, and then load the resulting graph in e.g. Python, since it should be much smaller (in terms of nodes) and more manageable.
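
As a rough sketch of what loading the graph in Python could look like, the snippet below parses a contig-level GFA into plain dictionaries. It is an assumption-laden illustration, not part of Minia or GATB: `load_gfa` and `split_ref` are hypothetical helpers, the example records are invented, and only the S/L records of GFA1 and the S/E/G records of GFA2 are handled (tags, paths, fragments and containments are ignored). Overlaps and scaffolding gaps are kept in separate maps, which matches the edges-versus-gaps distinction in the question.

```python
from collections import defaultdict


def split_ref(ref):
    """Split a GFA2 reference such as 'u1+' into ('u1', '+')."""
    return ref[:-1], ref[-1]


def load_gfa(lines):
    """Parse S/L (GFA1) and S/E/G (GFA2) records into plain dicts.

    Returns (segments, links, gaps):
      segments: name -> sequence
      links:    (name, orient) -> [(name, orient), ...]        overlaps
      gaps:     (name, orient) -> [(name, orient, dist), ...]  scaffolding
    """
    segments = {}
    links = defaultdict(list)
    gaps = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        tag = fields[0]
        if tag == "S":
            # GFA2 puts the segment length before the sequence; GFA1 does not.
            is_gfa2 = len(fields) > 3 and fields[2].isdigit()
            segments[fields[1]] = fields[3] if is_gfa2 else fields[2]
        elif tag == "L":
            # GFA1: L <from> <from_orient> <to> <to_orient> <overlap>
            links[(fields[1], fields[2])].append((fields[3], fields[4]))
        elif tag == "E":
            # GFA2: E <eid> <sid1> <sid2> <beg1> <end1> <beg2> <end2> <aln>
            links[split_ref(fields[2])].append(split_ref(fields[3]))
        elif tag == "G":
            # GFA2: G <gid> <sid1> <sid2> <dist> <var>
            name, orient = split_ref(fields[3])
            gaps[split_ref(fields[2])].append((name, orient, int(fields[4])))
    return segments, dict(links), dict(gaps)


# Invented example: u1 overlaps u2 by 4 bp, and u3 is placed about
# 150 bp after u2 by (hypothetical) paired-end evidence.
example = [
    "H\tVN:Z:2.0",
    "S\tu1\t8\tACGTACGG",
    "S\tu2\t8\tACGGTTCA",
    "S\tu3\t6\tGGCATT",
    "E\t*\tu1+\tu2+\t4\t8$\t0\t4\t4M",
    "G\t*\tu2+\tu3+\t150\t20",
]

segments, links, gaps = load_gfa(example)
print(links)
print(gaps)
```

For a real file, `load_gfa(open(path))` would work the same way since the function only iterates over lines. From there the dictionaries can be handed to any graph library or traversed directly.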

