Utility of ABySS .dot files
10.2 years ago

I just successfully completed a eukaryotic genome assembly with ABySS, and I can see that it generated several .dot files, presumably for generating graphics with Graphviz. My ${name}-scaffolds.dot file contains more than 10 million lines, which, as you might expect, leads to some issues in practice. First, the bigmem machine I performed the assembly on doesn't have enough memory to plot the graph with the dot program. Second, even if it did, it's highly unlikely that any useful visualization could be derived from the resulting graphic.

My question: what is the utility of the .dot files generated by ABySS? Are these primarily for working with small toy data sets, or is there some other utility that I've missed?

Thanks!

assembly • 3.5k views
10.2 years ago
Shaun Jackman ▴ 420

Hi, Daniel.

The .dot files represent the sequence overlap graph of the contigs/scaffolds. The .dist.dot files represent the distance estimates between contigs/scaffolds based on paired-end information. These dot files are used to pass information from one stage of the assembly to the next. ABySS uses a fairly compact, memory-efficient format to store these graphs in memory. Graphviz uses a more versatile and general representation, but requires much more memory. As you noted, you can use Graphviz to visualize the graphs of small genomes. For larger genomes, you can pull out subsets of the graph for visualization (using grep for a start, or potentially fancier tools), or you can use graph visualization tools that are intended for large graphs, such as Gephi.
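
To make the "pull out a subset" suggestion concrete, here is a minimal Python sketch that streams a dot file line by line and keeps only the edges touching a few contigs of interest, so the whole graph never has to fit in memory. It assumes each edge sits on its own line in the form `"u" -> "v" [attributes]`; I haven't checked that against every ABySS output, so adjust the pattern to whatever your files actually contain. The function name and the seed IDs are made up for illustration.

```python
import re
from typing import Iterable, Iterator, Set

# Assumed edge syntax (one edge per line): "u" -> "v" [optional attributes]
EDGE_RE = re.compile(r'^\s*"([^"]+)"\s*->\s*"([^"]+)"')


def extract_neighbourhood(lines: Iterable[str], seeds: Set[str]) -> Iterator[str]:
    """Yield a small DOT graph holding only the edges that touch a seed vertex."""
    yield "digraph neighbourhood {"
    for line in lines:
        match = EDGE_RE.match(line)
        # Keep the edge if either endpoint is one of the requested vertices.
        if match and (match.group(1) in seeds or match.group(2) in seeds):
            yield line.strip()
    yield "}"
```

Something like `extract_neighbourhood(open("my-scaffolds.dot"), {"1+", "1-"})` (hypothetical file name and IDs) yields lines you can write to a new, much smaller file and hand to dot. Vertex attribute lines are dropped in this sketch, so lengths and coverage won't appear in the plot; Graphviz creates the nodes implicitly from the edges.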

Here's a brand new wiki page to describe the various file formats of ABySS. It's terse for now, but it will be expanding. https://github.com/bcgsc/abyss/wiki/ABySS-File-Formats

Cheers, Shaun

10.2 years ago
jts ▴ 240

ABySS uses dot as its working file format: it reads and writes dot files as it produces the assembly. These large dot files are not (primarily) for visualisation.
