Open Source Java De Novo Assembly Software
11.2 years ago
lyz10302012 ▴ 450

Is there any open-source Java software for de novo assembly, or a Java implementation of a de Bruijn graph algorithm?

Thanks

differential-expression assembly • 4.3k views

Do you mean stuff that actually works on real data, or just for information/investigation?

I mean stuff that actually works on real data.

Ah. Well, in that case I'm afraid I have never heard of one. Genome assembly is quite memory-hungry, so I think people tend to want very tight control over how they allocate and release memory. I could be wrong: a lot of assemblers have been written, and you might find one that works on bacterial data, but if so I expect someone will reply here to tell us. Best, Zam

I agree. De novo assembly is among the most memory-demanding applications in computational biology, while Java has a reputation for high memory consumption. It is hard to see how the two fit together...
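To make the memory point a bit more concrete, here is a rough sketch of the kind of control people usually want: packing a k-mer into a single primitive `long` at 2 bits per base (so k up to 32) instead of keeping one `String` object per k-mer. The class and method names (`PackedKmer`, `encode`, `decode`) are made up for illustration and are not taken from any assembler mentioned in this thread.

```java
class PackedKmer {
    /** Packs a k-mer over the alphabet A/C/G/T into a long, 2 bits per base (k <= 32). */
    static long encode(String kmer) {
        if (kmer.length() > 32) {
            throw new IllegalArgumentException("k must be <= 32 to fit in 64 bits");
        }
        long packed = 0L;
        for (int i = 0; i < kmer.length(); i++) {
            // Shift the bases seen so far left by 2 bits and append the new base code.
            packed = (packed << 2) | code(kmer.charAt(i));
        }
        return packed;
    }

    /** Maps a base to its 2-bit code; anything other than A/C/G/T is rejected. */
    static int code(char base) {
        switch (base) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("unexpected base: " + base);
        }
    }

    /** Unpacks a k-mer of length k from its 2-bit encoding. */
    static String decode(long packed, int k) {
        char[] out = new char[k];
        for (int i = k - 1; i >= 0; i--) {
            out[i] = "ACGT".charAt((int) (packed & 3L));
            packed >>>= 2; // unsigned shift, so k = 32 also works
        }
        return new String(out);
    }
}
```

Stored in a `long[]`, such k-mers avoid per-object overhead, but you then have to manage the arrays yourself, which is much of what a C or C++ implementation would be doing anyway.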

11.2 years ago
Rayan Chikhi ★ 1.5k

There is one open-source Java program that constructs the de Bruijn graph (but it does not assemble): http://grafia.cs.ucsb.edu/msp/download.html
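For anyone who only needs to see what "constructing the de Bruijn graph" means, here is a minimal in-memory sketch in Java. It is not the method used by the program linked above; the class name `DeBruijnSketch` and the `build` method are hypothetical, and it ignores reverse complements, sequencing errors, and memory limits, so it will not scale to real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DeBruijnSketch {
    /**
     * Builds adjacency lists for a de Bruijn graph: every k-mer in a read
     * becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix.
     */
    static Map<String, List<String>> build(List<String> reads, int k) {
        Map<String, List<String>> graph = new TreeMap<>();
        for (String read : reads) {
            for (int i = 0; i + k <= read.length(); i++) {
                String kmer = read.substring(i, i + k);
                String prefix = kmer.substring(0, k - 1);
                String suffix = kmer.substring(1);
                List<String> successors =
                        graph.computeIfAbsent(prefix, p -> new ArrayList<>());
                // Keep each edge once; real tools usually also track k-mer counts.
                if (!successors.contains(suffix)) {
                    successors.add(suffix);
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = build(Arrays.asList("ACGTAC", "GTACG"), 3);
        graph.forEach((node, next) -> System.out.println(node + " -> " + next));
    }
}
```

An assembler would then still have to walk this graph to produce contigs, which is the part the linked program does not do.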

There does not appear to be any open-source de novo assembler written in Java.

Source: http://seqanswers.com/wiki/Special:BrowseData/Bioinformatics_application?Language=Java&Bioinformatics_method=Assembly&Biological_domain=De-novo_assembly

EDIT:

Indeed, Contrail can be considered a de novo assembler written in Java.

11.2 years ago
Chris Whelan ▴ 570

This is probably not exactly what you are looking for, but Michael Schatz's group is working on a Hadoop-based de Bruijn graph assembler called Contrail:

http://sourceforge.net/apps/mediawiki/contrail-bio/index.php?title=Contrail

Since it is a native Hadoop application, it is mostly written in Java, although it appears that they also invoke non-Java programs (FLASH and Quake) as part of their workflow.

By using Hadoop's mechanisms for streaming data to and from disk across a cluster, it gets around the Java memory-management features that make a traditional single-machine algorithm hard to implement.

I am not sure how ready it is, or whether it can currently run on real data.
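To give a feel for the map/shuffle/reduce style mentioned above, here is a small plain-Java sketch that counts k-mers in that shape. It does not use the Hadoop API and is not Contrail's code; the class `KmerCountMapReduceSketch` and its methods are hypothetical, and the "shuffle" here is just an in-memory map, whereas Hadoop performs that grouping itself and can spill to disk across the cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KmerCountMapReduceSketch {
    /** "Map" step: one read in, a list of (k-mer, 1) pairs out. */
    static List<Map.Entry<String, Integer>> map(String read, int k) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (int i = 0; i + k <= read.length(); i++) {
            pairs.add(new AbstractMap.SimpleEntry<>(read.substring(i, i + k), 1));
        }
        return pairs;
    }

    /** "Reduce" step: sum all the counts that were grouped under one k-mer. */
    static int reduce(List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    /** Runs map, an in-memory stand-in for the shuffle, then reduce. */
    static Map<String, Integer> run(List<String> reads, int k) {
        // Stand-in for the shuffle: group emitted values by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String read : reads) {
            for (Map.Entry<String, Integer> pair : map(read, k)) {
                grouped.computeIfAbsent(pair.getKey(), key -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }
}
```

The point of the pattern is that neither `map` nor `reduce` ever needs the whole data set in memory at once, so no single JVM has to hold the full graph.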
