Tool: Mapsembler2 targeted micro assembly and visualization of the local assembly graph
2.9 years ago by
France
pierre.peterlongo600 wrote:

Dear all,

I'm pleased to introduce the Mapsembler2 tool.

Mapsembler2 is a targeted assembly software. It takes as input any number of NGS raw read set(s) (fasta or fastq, gzipped or not) and a set of input sequences (starters). For each starter, Mapsembler2 outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice. Mapsembler2 may be used to (among other things):

· Validate an assembled sequence (input as starter), e.g. from a de Bruijn graph assembly where read-coherence was not enforced.
· Check whether a known enzyme is present in a metagenomic NGS read set.
· Enrich unmappable reads by extending them, possibly making them mappable.
· Check what happens at the extremities of a contig.
· Check the presence/absence of RNA-seq splicing events and quantify them.
· Check the presence/absence of SNPs or structural variants, …

Based on the Minia data structure, it has a tiny memory footprint (human read sets can be analyzed with no more than 6 GB of memory) while being faster than other mentioned tools.

Finally, we put effort into making it simple. The micro-assembly step is run from the command line, and we made it as simple as possible, as shown in our dedicated video. Another video presents the usage of the graphical interface.

Home web page (download, GNU Affero General Public License, manual, galaxy install, videos) is here: http://colibread.inria.fr/mapsembler2/

Any comment/feedback is warmly welcome.

Best,

Pierre

ADD COMMENTlink modified 2.5 years ago by shelkmike20 • written 2.9 years ago by pierre.peterlongo600

I couldn't find a git repository for mapsembler2. I had a minor pull request I wanted to submit (i.e. finish renaming run_mapsembler_and_phaser.sh to run_mapsembler2_pipeline.sh in the documentation and usage info). Thanks so much for developing mapsembler2!

ADD REPLYlink written 4 weeks ago by maizemu0

Hey, Thanks for your suggestion. I may do the modification. However, I'm afraid that mapsembler would deserve a much deeper code update and review. Many users complain that compilation fails, depending on their OS and GCC version.

Anyone interested in the maintenance and update of the code is warmly welcome to take the helm on the project.

Best, Pierre

ADD REPLYlink written 4 weeks ago by pierre.peterlongo600
2.8 years ago by
France
pierre.peterlongo600 wrote:

Hi all,

We propose a new version of mapsembler2 that should hopefully fix compilation problems.

This new version is available from the web page: http://colibread.inria.fr/mapsembler2/ or directly here: http://www.irisa.fr/symbiose/people/ppeterlongo/mapsembler2_2.2.3.zip

Don't hesitate to keep raising new issues and commenting on the tool.

Pierre

ADD COMMENTlink written 2.8 years ago by pierre.peterlongo600
2.9 years ago by
cts1.5k
Pasadena
cts1.5k wrote:

It also fails to install on a Linux server running RHEL, which by default has a pretty old version of gcc (4.4.7). I've tried installing a local, updated version of gcc (4.6.3); however, something in your makefiles doesn't recognize a non-standard location for g++. For example:

$ which g++
/opt/gcc/4.6.3/bin/g++

But then at the beginning of the cmake output I get the following, which shows that it is still using the system g++, which appears to be too old for the software:

#######################################################################
################## COMPILE MAPSEMBLER2 EXTREMITIES ####################
#######################################################################
-- The C compiler identification is GNU 4.4.7
-- The CXX compiler identification is GNU 4.4.7
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
ADD COMMENTlink written 2.9 years ago by cts1.5k

Hello

Thanks for your interest in mapsembler. I'm sorry for these compilation problems.

I don't know why cmake is picking up the 4.4.7 compiler. A temporary workaround could be to set the CXX environment variable before running the script "./compile_all_tools.sh".

Thus in your case:

export CXX=/opt/gcc/4.6.3/bin/g++
./compile_all_tools.sh

Pierre

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by pierre.peterlongo600
2.5 years ago by
shelkmike20
Russian Federation
shelkmike20 wrote:

Hi Pierre!

Mapsembler is a useful program; I have used it since its first version (my paper citing it will be published soon). However, I think Mapsembler needs a more detailed manual that describes its parameters and algorithm more thoroughly.

In particular, I would like to ask (about Mapsembler2 version 2.2.4):

1) How does the choice of graph traversal algorithm (breadth-first or depth-first) affect program performance and results?

2) What are the 'maximal length of nodes' and the 'maximal depth of nodes'? If the length is a length in base pairs, why is the default only 40 bp?

Also, it would be great if you could add the ability to use paired-end information. Not all paired-end reads can be merged by their overlapping ends; many have a long insert size, and using such pairs (especially mate-pair reads) would greatly increase the capabilities of Mapsembler. An algorithm for paired-end extension of extremities could be taken, for example, from Gapfiller (http://www.baseclear.com/genomics/bioinformatics/basetools/gapfiller) or PRICE (http://derisilab.ucsf.edu/software/price/).

Thank you!

ADD COMMENTlink written 2.5 years ago by shelkmike20

Hi,

You're absolutely right, we need to work on the documentation. More generally, mapsembler2 is quite different from mapsembler and we should prepare a new publication.

1) How does the choice of graph traversal algorithm (breadth-first or depth-first) affect program performance and results?

During the graph traversal we have to stop the computation after a while; otherwise the whole graph may be explored and output, which is too time-consuming and useless from a user's point of view. The criterion we apply is to stop the exploration after n nodes have been explored. This is done in either depth-first or breadth-first fashion. We think that breadth-first is best suited to the Mapsembler usage: it makes it possible to visualize the whole neighborhood of each starter. In contrast, depth-first could lead to a deep exploration of a single branch leaving a starter; if this branch uses up the n nodes, the computation stops for this starter and the other neighbors are not output.
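
To make the difference concrete, here is a minimal sketch of a traversal that stops after n visited nodes, in both breadth-first and depth-first fashion. It is only an illustration of the stopping rule described above: the toy graph, the names Graph, explore and example_graph, and the node labels are all hypothetical and are not taken from the Mapsembler2 source.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy graph: node name -> successor names. This is NOT the
// Mapsembler2 data structure; it only illustrates the stopping rule.
typedef std::map<std::string, std::vector<std::string> > Graph;

// Visit at most max_nodes nodes, starting from `starter`.
// breadth_first = true  -> the oldest pending node is expanded first (queue).
// breadth_first = false -> the newest pending node is expanded first (stack).
std::vector<std::string> explore(const Graph& g, const std::string& starter,
                                 std::size_t max_nodes, bool breadth_first) {
    std::vector<std::string> visited_order;
    std::set<std::string> seen;
    std::deque<std::string> pending;
    pending.push_back(starter);
    seen.insert(starter);
    while (!pending.empty() && visited_order.size() < max_nodes) {
        std::string node;
        if (breadth_first) {
            node = pending.front();
            pending.pop_front();
        } else {
            node = pending.back();
            pending.pop_back();
        }
        visited_order.push_back(node);
        Graph::const_iterator it = g.find(node);
        if (it == g.end()) continue;  // node without successors
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            // Queue each successor once, the first time it is seen.
            if (seen.insert(it->second[i]).second) {
                pending.push_back(it->second[i]);
            }
        }
    }
    return visited_order;
}

// Starter S with one long branch (A1 -> A2 -> A3) and two short ones (B, C).
Graph example_graph() {
    Graph g;
    g["S"].push_back("B");
    g["S"].push_back("C");
    g["S"].push_back("A1");
    g["A1"].push_back("A2");
    g["A2"].push_back("A3");
    return g;
}
```

With a budget of 4 nodes on this toy graph, explore(example_graph(), "S", 4, true) visits S, B, C, A1, so every direct neighbor of the starter is reported. The depth-first call explore(example_graph(), "S", 4, false) visits S, A1, A2, A3 and never reports B or C.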

2) What are the 'maximal length of nodes' and the 'maximal depth of nodes'? If the length is a length in base pairs, why is the default only 40 bp?

I realize that the names were very badly chosen (in addition to the lack of documentation).

"Maximal length of nodes" should be "maximal number of nodes": in an extension, we limit the number of visited nodes (in order to keep the graph readable).

"Maximal depth of nodes" is the maximal length (in nucleotides) of a path from the starter to the end of the path. This limits the depth of the recursion. It is more of a computational criterion.
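
As an illustration of the second limit, the sketch below cuts a recursive extension once the path length in nucleotides would exceed a maximum. Everything in it is an assumption made for the example: the Node and SeqGraph types, the function count_reachable, the choice to count the starter's own sequence in the path length, and the simplification that consecutive nodes do not overlap (in a de Bruijn graph they normally share k-1 nucleotides).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical node: a sequence plus the names of its successors.
struct Node {
    std::string sequence;
    std::vector<std::string> next;
};
typedef std::map<std::string, Node> SeqGraph;

// Recursively follow successors and count the node visits along all paths.
// A path is cut as soon as its accumulated length in nucleotides would
// exceed max_depth_nt. Assumes an acyclic toy graph and ignores the
// overlap between consecutive nodes.
std::size_t count_reachable(const SeqGraph& g, const std::string& name,
                            std::size_t path_len_nt,
                            std::size_t max_depth_nt) {
    SeqGraph::const_iterator it = g.find(name);
    if (it == g.end()) return 0;
    std::size_t len = path_len_nt + it->second.sequence.size();
    if (len > max_depth_nt) return 0;  // path too long: stop the recursion
    std::size_t count = 1;
    for (std::size_t i = 0; i < it->second.next.size(); ++i) {
        count += count_reachable(g, it->second.next[i], len, max_depth_nt);
    }
    return count;
}

// Linear toy example: S (4 nt) -> A (6 nt) -> B (2 nt).
SeqGraph example_seq_graph() {
    SeqGraph g;
    g["S"].sequence = "ACGT";
    g["S"].next.push_back("A");
    g["A"].sequence = "ACGTAC";
    g["A"].next.push_back("B");
    g["B"].sequence = "GG";
    return g;
}
```

On this toy graph, count_reachable(example_seq_graph(), "S", 0, 10) keeps S and A (10 nucleotides in total) and drops B, which would bring the path to 12 nucleotides.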

---

Paired reads: yes, this is a nice idea. We need time to look into those possibilities.

ADD REPLYlink written 2.5 years ago by pierre.peterlongo600
2.9 years ago by
cts1.5k
Pasadena
cts1.5k wrote:

I tried to install just now on my iMac 10.9 using clang 5 and I get a number of errors related to the tr1 namespace. It would appear that on my system this namespace and the tr1 header files don't exist. Perhaps there is a robust solution that you can implement in the cmake configuration to deal with systems that still use the tr1 namespace vs compilers that no longer use it.

ADD COMMENTlink written 2.9 years ago by cts1.5k

Thanks a lot for this remark.

I've tried to fix the problem that is indeed related to clang. I've prepared a new version fixing this issue (hopefully). You can find it here:

http://colibread.inria.fr/files/2014/08/mapsembler2_2.2.1_cts.zip

Would you please try this version and come back to me?

Thanks, Pierre

ADD REPLYlink written 2.9 years ago by pierre.peterlongo600

It got further this time but failed for different reasons:

#######################################################################
###################### COMPILE KISSREADSGRAPH #########################
#######################################################################
g++ -lz -o block_allocator.o -c block_allocator.cpp -O3 -lz -DMINIA_IS_IN_PARENT_FOLDER
clang: warning: -lz: 'linker' input unused
clang: warning: -lz: 'linker' input unused
block_allocator.cpp:14:5: error: no type named 'free' in the global namespace
                ::free(m_head);
                ~~^
block_allocator.cpp:33:26: error: no member named 'malloc' in the global namespace; did you mean simply 'malloc'?
                char *buffer = (char *)::malloc(alloc_size);
                                       ^~~~~~~~
                                       malloc
block_allocator.cpp:25:24: note: 'malloc' declared here
void *block_allocator::malloc(size_t size)
                       ^
2 errors generated.
ADD REPLYlink written 2.9 years ago by cts1.5k

Let's continue :)

I think this new version fixes your problem: http://colibread.inria.fr/files/2014/08/mapsembler2_2.2.1_cts_2.zip

I hope this is the last bug fix!

Pierre

ADD REPLYlink written 2.9 years ago by pierre.peterlongo600

Still some bugs but compiled with the following modifications:

1. added the <cstdlib> header include to kissreads_graph/block_allocator.cpp so that free and malloc were found

2. removed reference to OpenMP in the kissreads Makefile - the code itself accounted for a lack of OpenMP but the cflags were always set in the Makefile
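
For readers hitting the same errors, the two modifications can be sketched roughly as follows. This is not the kissreads_graph source: tiny_allocator and available_threads are hypothetical names used only to show the pattern, namely including <cstdlib> before calling ::malloc and ::free, and guarding OpenMP code with the _OPENMP macro, which compilers define when OpenMP is enabled.

```cpp
#include <cstddef>
#include <cstdlib>  // declares malloc and free, needed for ::malloc / ::free
#ifdef _OPENMP
#include <omp.h>    // only included when the compiler enables OpenMP
#endif

// Hypothetical stand-in for a block allocator (not the original code).
class tiny_allocator {
public:
    tiny_allocator() : m_head(0) {}
    ~tiny_allocator() { ::free(m_head); }

    // A member named malloc hides the global function inside the class,
    // so the global one has to be reached with the :: prefix.
    void* malloc(std::size_t size) {
        ::free(m_head);  // free(NULL) is a no-op, so the first call is safe
        m_head = ::malloc(size);
        return m_head;
    }

private:
    void* m_head;
    tiny_allocator(const tiny_allocator&);             // non-copyable
    tiny_allocator& operator=(const tiny_allocator&);  // non-assignable
};

// Number of threads available: 1 when compiled without OpenMP support.
int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Strictly speaking, <cstdlib> only guarantees std::malloc and std::free; common implementations also expose the global names, and <stdlib.h> is the header that formally guarantees them. On the build side, the matching change is to pass the OpenMP flag (for example -fopenmp with GCC) only when the compiler supports it, so the guarded code falls back to the single-threaded branch otherwise.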

ADD REPLYlink written 2.9 years ago by cts1.5k