Tool: Mapsembler2, targeted micro-assembly and visualization of the local assembly graph
8.3 years ago

Dear all,

I'm pleased to introduce the Mapsembler2 tool.

Mapsembler2 is targeted assembly software. It takes as input any number of raw NGS read sets (fasta or fastq, gzipped or not) and a set of input sequences (starters). For each starter, Mapsembler2 outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice.

Mapsembler2 may be used for (not limited to):

• Validate an assembled sequence (input as starter), e.g. one from a de Bruijn graph assembly where read coherence was not enforced, or check whether a known enzyme is present in a metagenomic NGS read set.
• Enrich unmappable reads by extending them, possibly making them mappable
• Check what happens at the extremities of a contig
• Check the presence/absence of RNA-seq splicing events and quantify them; check the presence/absence of SNPs or structural variants, ...

Based on the Minia data structure, it has a tiny memory footprint (human read sets can be analyzed with no more than 6 GB of memory) while being faster than other mentioned tools.

Finally, we put effort into making it simple. The micro-assembly step is run from the command line, and we made it as simple as possible, as shown in our dedicated video. Another video presents the graphical interface.

Any comments or feedback are warmly welcome.

Best,
Pierre

minia mapsembler next-gen Assembly • 4.1k views

I couldn't find a git repository for mapsembler2. I had a minor pull request I wanted to submit (i.e. finish renaming run_mapsembler_and_phaser.sh to run_mapsembler2_pipeline.sh in the documentation and usage info). Thanks so much for developing mapsembler2!


Hey, thanks for your suggestion. I may make the modification. However, I'm afraid that mapsembler deserves a much deeper code update and review. Many users complain that compilation fails, depending on their OS and GCC version.

Anyone interested in the maintenance and update of the code is warmly welcome to take the helm on the project.

Best, Pierre

8.2 years ago

Hi all,

We propose a new version of mapsembler2 that should hopefully fix compilation problems.

This new version is available from the web page: http://colibread.inria.fr/mapsembler2/ or directly here: http://www.irisa.fr/symbiose/people/ppeterlongo/mapsembler2_2.2.3.zip

Don't hesitate to keep raising new issues and commenting on the tool.

Pierre

8.3 years ago
cts ★ 1.7k

It also fails to install on a Linux server running RHEL, which by default has a pretty old version of gcc (4.4.7). I've tried to install a local, updated version of gcc (4.6.3); however, there is something about your makefiles that doesn't recognize a non-standard location for g++. For example:

$ which g++
/opt/gcc/4.6.3/bin/g++

But then at the beginning of the cmake output I get the following, which shows that it is still using the system g++, which appears to be too old for the software:

#######################################################################
################## COMPILE MAPSEMBLER2 EXTREMITIES ####################
#######################################################################
-- The C compiler identification is GNU 4.4.7
-- The CXX compiler identification is GNU 4.4.7
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works

Hello

Thanks for your interest in mapsembler. I'm sorry for these compilation problems.

I don't know why cmake is picking up the 4.4.7 compiler. A temporary workaround could be to set and export the CXX environment variable before running the script ./compile_all_tools.sh:

# export, so that cmake (started by the script) actually sees the variable
export CXX=/opt/gcc/4.6.3/bin/g++
./compile_all_tools.sh


Pierre

7.9 years ago
shelkmike ▴ 830

Hi Pierre!

Mapsembler is a useful program; I have used it since its first version (my paper citing it will be published soon). However, I think Mapsembler needs a more detailed manual that describes its parameters and algorithm more thoroughly.

1. How does the choice of graph traversal algorithm (breadth-first or depth-first) affect program performance and results?
2. What are the 'maximal length of nodes' and the 'maximal depth of nodes'? If the length is a length in base pairs, why is the default only 40 bp?

Also, it would be great if you could add the ability to use paired-end information. Not all paired-end reads can be merged by their overlapping ends; many have a long insert size, and using such pairs (especially mate-pair reads) would greatly increase the capabilities of Mapsembler. An algorithm for extending extremities with paired-end reads could be taken, for example, from Gapfiller or PRICE.

Thank you!


Hi,

You're absolutely right, we need to work on the documentation. More generally, mapsembler2 is quite different from mapsembler and we should prepare a new publication.

1. How does the choice of graph traversal algorithm (breadth-first or depth-first) affect program performance and results?

During the graph traversal we have to stop the computation at some point; otherwise the whole graph may be explored and output, which is too time-consuming and useless from a user's point of view. The criterion we apply is to stop the exploration after exploring n nodes, in either depth-first or breadth-first fashion. We think that breadth-first is best suited to Mapsembler usage: it makes it possible to visualize the whole neighborhood of each starter. In contrast, depth-first could lead to a deep exploration of a single branch leaving the starter; if that branch alone contains n or more nodes, the computation stops for this starter and the other neighbors are not output.
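To make the difference concrete, here is a minimal sketch of a node-budgeted traversal on a toy graph. It is only an illustration under stated assumptions, not Mapsembler2's actual code: the `Graph` type, `explore`, `toy_graph` and the `max_nodes` parameter are all hypothetical names. The starter (node 0) has a long branch 1, 2, 3, 4 and a short branch 5, 6.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <vector>

// Toy directed graph: node id -> successor ids (hypothetical structure).
using Graph = std::map<int, std::vector<int>>;

// Explore from `starter` and stop once `max_nodes` nodes have been visited.
// breadth_first = true treats the frontier as a queue, false as a stack.
std::vector<int> explore(const Graph& g, int starter,
                         std::size_t max_nodes, bool breadth_first) {
    std::vector<int> visited_order;
    std::set<int> seen;
    std::deque<int> frontier{starter};
    while (!frontier.empty() && visited_order.size() < max_nodes) {
        int node;
        if (breadth_first) { node = frontier.front(); frontier.pop_front(); }
        else               { node = frontier.back();  frontier.pop_back();  }
        if (!seen.insert(node).second) continue;  // already visited
        visited_order.push_back(node);
        auto it = g.find(node);
        if (it == g.end()) continue;              // no successors recorded
        const std::vector<int>& succ = it->second;
        if (breadth_first) {
            for (int s : succ) frontier.push_back(s);
        } else {
            // Push in reverse so that the first successor is explored first.
            for (auto r = succ.rbegin(); r != succ.rend(); ++r) frontier.push_back(*r);
        }
    }
    assert(visited_order.size() <= max_nodes);    // the budget is never exceeded
    return visited_order;
}

// Starter 0 with a long branch (1 -> 2 -> 3 -> 4) and a short one (5 -> 6).
Graph toy_graph() {
    return Graph{{0, {1, 5}}, {1, {2}}, {2, {3}}, {3, {4}}, {5, {6}}};
}
```

With a budget of 4 nodes, the breadth-first order is 0, 1, 5, 2, so both neighbors of the starter are reported. The depth-first order is 0, 1, 2, 3: the budget is spent on the long branch and neighbor 5 is never output.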

2. What are the 'maximal length of nodes' and the 'maximal depth of nodes'? If the length is a length in base pairs, why is the default only 40 bp?

I realize that the names were terribly badly chosen (in addition to the lack of documentation).

Maximal length of nodes should be "maximal number of nodes": in an extension, we limit the number of visited nodes (in order to keep the graph readable).

Maximal depth of nodes is the maximal length (in nucleotides) of a path from the starter to the end of the path. This limits the depth of the recursion. It is more of a computational criterion.
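The two limits can be sketched together as follows. This is a toy illustration, not the real implementation: `Node`, `Limits`, `max_nodes`, `max_depth` and `bounded_extension` are hypothetical names, and the path length is assumed to be the plain sum of the node sequence lengths, starter included (k-mer overlaps between nodes are ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string sequence;         // nucleotides carried by this node
    std::vector<int> successors;  // ids of the nodes that extend it
};
using Graph = std::map<int, Node>;

struct Limits {
    std::size_t max_nodes;  // cap on the number of nodes kept in the output
    std::size_t max_depth;  // cap on the path length from the starter, in nucleotides
};

// Breadth-first extension that applies both limits.
std::vector<int> bounded_extension(const Graph& g, int starter, const Limits& lim) {
    std::vector<int> kept;
    std::set<int> seen;
    // Each entry: (node id, nucleotides accumulated before reaching the node).
    std::deque<std::pair<int, std::size_t> > queue;
    queue.push_back(std::make_pair(starter, std::size_t(0)));
    while (!queue.empty() && kept.size() < lim.max_nodes) {
        const int id = queue.front().first;
        const std::size_t depth_before = queue.front().second;
        queue.pop_front();
        Graph::const_iterator it = g.find(id);
        if (it == g.end() || seen.count(id) != 0) continue;
        const std::size_t depth_after = depth_before + it->second.sequence.size();
        if (depth_after > lim.max_depth) continue;  // this path would be too long
        seen.insert(id);
        kept.push_back(id);
        for (int next : it->second.successors)
            queue.push_back(std::make_pair(next, depth_after));
    }
    assert(kept.size() <= lim.max_nodes);
    return kept;
}

// Starter 0 ("ACGT") followed by a long node 1 and a short node 2.
Graph toy_extension_graph() {
    Graph g;
    g[0] = Node{"ACGT", {1, 2}};
    g[1] = Node{"AAAAAAAAAA", {3}};
    g[2] = Node{"CC", {4}};
    g[3] = Node{"G", {}};
    g[4] = Node{"TTT", {}};
    return g;
}
```

On this toy graph, a depth limit of 8 nucleotides keeps only nodes 0 and 2 (node 1 would bring the path to 14 nucleotides), whereas a node limit of 2 with a generous depth limit keeps nodes 0 and 1.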

Paired reads: yes this is a nice idea. We need time to check those possibilities.

8.3 years ago
cts ★ 1.7k

I tried to install just now on my iMac 10.9 using clang 5 and I get a number of errors related to the tr1 namespace. It would appear that on my system this namespace and the tr1 header files don't exist. Perhaps there is a robust solution that you can implement in the cmake configuration to deal with systems that still use the tr1 namespace vs compilers that no longer use it.
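One possible shape for such a fix, done in the preprocessor rather than in the cmake configuration, is a small compatibility layer that selects the header at compile time. This is only a sketch: the thread does not say which tr1 headers Mapsembler2 relies on, so `unordered_map` is an assumed example, and the `compat` namespace and `count_kmers` function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Use the standard header when the compiler provides it (any C++11 mode, or
// clang's libc++), otherwise fall back to the older TR1 location.
#if __cplusplus >= 201103L || defined(_LIBCPP_VERSION)
  #include <unordered_map>
  namespace compat { using std::unordered_map; }
#else
  #include <tr1/unordered_map>
  namespace compat { using std::tr1::unordered_map; }
#endif

// The rest of the code only ever names compat::unordered_map.
// Count the occurrences of every k-mer of `seq`; k must be positive.
compat::unordered_map<std::string, int> count_kmers(const std::string& seq,
                                                    std::size_t k) {
    assert(k > 0);
    compat::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i + k <= seq.size(); ++i)
        ++counts[seq.substr(i, k)];
    return counts;
}
```

For example, `count_kmers("ACGACG", 3)` yields three distinct 3-mers, with ACG counted twice, whichever header was selected.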


Thanks a lot for this remark.

I've tried to fix the problem that is indeed related to clang. I've prepared a new version fixing this issue (hopefully). You can find it here:

Would you please try this version and get back to me?

Thanks, Pierre


It got further this time but failed for different reasons:

#######################################################################
#######################################################################
g++ -lz -o block_allocator.o -c block_allocator.cpp -O3 -lz -DMINIA_IS_IN_PARENT_FOLDER
clang: warning: -lz: 'linker' input unused
clang: warning: -lz: 'linker' input unused
block_allocator.cpp:14:5: error: no type named 'free' in the global namespace
~~^
block_allocator.cpp:33:26: error: no member named 'malloc' in the global namespace; did you mean simply 'malloc'?
char *buffer = (char *)::malloc(alloc_size);
^~~~~~~~
malloc
block_allocator.cpp:25:24: note: 'malloc' declared here
void *block_allocator::malloc(size_t size)
^
2 errors generated.

Let's continue :)

I hope this is the last bug fix!

Pierre


Still some bugs but compiled with the following modifications:

1. added the <cstdlib> header include to kissreads_graph/block_allocator.cpp so that free and malloc were found
2. removed reference to OpenMP in the kissreads Makefile - the code itself accounted for a lack of OpenMP but the cflags were always set in the Makefile
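For readers hitting the same errors, the two modifications can be sketched as below. This is a reduced illustration, not the kissreads source: `toy_block_allocator` and `worker_threads` are hypothetical names. The point of the first part is that a class with its own `malloc` member must name the C allocator with a qualifier, and that qualified name only resolves once the right header is included. `<cstdlib>` guarantees `std::malloc` and `std::free`; the global `::malloc` form used in the original file is only guaranteed by `<stdlib.h>`, although common implementations declare it through `<cstdlib>` as well. The second part shows the usual `_OPENMP` guard, which compiles with or without an OpenMP flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>  // declares std::malloc and std::free
#include <vector>
#ifdef _OPENMP
#include <omp.h>    // only pulled in when the compiler was given an OpenMP flag
#endif

// Toy pool allocator whose member function is itself called malloc.
class toy_block_allocator {
public:
    explicit toy_block_allocator(std::size_t block_size)
        : block_size_(block_size), capacity_(0), used_(0) {
        assert(block_size > 0);
    }
    toy_block_allocator(const toy_block_allocator&) = delete;
    toy_block_allocator& operator=(const toy_block_allocator&) = delete;
    ~toy_block_allocator() {
        for (char* block : blocks_) std::free(block);
    }

    // Hand out `size` bytes from the current block, opening a new block when
    // the current one is full. Alignment is ignored to keep the sketch short.
    void* malloc(std::size_t size) {
        if (blocks_.empty() || used_ + size > capacity_) {
            const std::size_t wanted = size > block_size_ ? size : block_size_;
            char* block = static_cast<char*>(std::malloc(wanted));
            if (block == nullptr) return nullptr;
            blocks_.push_back(block);
            capacity_ = wanted;
            used_ = 0;
        }
        char* result = blocks_.back() + used_;
        used_ += size;
        return result;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_;     // default size of a block, in bytes
    std::size_t capacity_;       // size of the current block
    std::size_t used_;           // bytes already handed out from the current block
    std::vector<char*> blocks_;  // every block obtained from std::malloc
};

// Number of worker threads: OpenMP's answer when available, 1 otherwise.
int worker_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

With the guard in place, the Makefile can add or drop the OpenMP flag freely, since the source builds either way.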