Tool: Segemehl: A Fast One-Stop-Shop Mapping Tool
6.7 years ago, David Langenberger (Deutschland) wrote:

segemehl is a program for mapping sequencing reads to reference genomes. Unlike other methods, segemehl detects not only mismatches but also insertions and deletions. Furthermore, it is not limited to a specific read length and can correctly map reads contaminated with primer or polyadenylation sequences. segemehl implements a matching strategy based on enhanced suffix arrays (ESA). It now also supports the SAM format, reads gzipped query files to save both disk and memory space, and allows bisulfite sequencing mapping and split-read mapping.

  • adapter prediction and/or clipping
  • mapping of single-end or paired-end data
  • mapping with mismatches, insertions and deletions
  • returning of all multiple mapping loci of one read (report only best scoring hits or all mappings with a set accuracy)
  • multiple split read mapping (and downstream splice site detection)
  • bisulfite mapping
  • multithreading
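
To give a rough feel for the suffix-array idea mentioned above, here is a minimal Python sketch. It is not segemehl's implementation: it builds a plain (not enhanced) suffix array with a naive sort and only finds exact occurrences, whereas segemehl also tolerates mismatches, insertions and deletions. The function names `build_suffix_array` and `find_occurrences` are made up for this illustration.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of text, sorted lexicographically."""
    # Naive construction by sorting the suffixes themselves.
    # Fine for a toy reference, far too slow for a real genome.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of exact occurrences of pattern in text."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    reference = "ACGTACGTGA"          # toy reference sequence
    sa = build_suffix_array(reference)
    print(find_occurrences(reference, sa, "ACGT"))  # [0, 4]
    print(find_occurrences(reference, sa, "TTT"))   # []
```

Because all suffixes sharing a prefix sit next to each other in the sorted array, two binary searches are enough to locate every occurrence of a seed. Index-based mappers build on this kind of lookup and extend it to inexact matches.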

For more information see: http://hoffmann.bioinf.uni-leipzig.de/LIFE/segemehl.html

Publication: Hoffmann S, Otto C, Kurtz S, Sharma CM, Khaitovich P, Vogel J, Stadler PF, Hackermueller J: "Fast mapping of short sequences with mismatches, insertions and deletions using index structures", PLoS Comput Biol (2009) vol. 5 (9) pp. e1000502


Interesting - I'm planning to do a "shootout" with aligners as well - I'll add this one to the list

written 6.7 years ago by Istvan Albert

I would be highly interested in the outcome! Let me know, once you have results!

written 6.7 years ago by David Langenberger

What is the license of SEGEMEHL? GPL?

written 4.7 years ago by enxxx23210

As far as I know, there is no license for segemehl yet. They just write: "...free software for non-commercial use..."

written 4.7 years ago by David Langenberger

I've benchmarked Segemehl against BWA-MEM, Bowtie2 and MOSAIK and found that, for datasets with a lot of variation, it maps more reads and with significantly greater accuracy. However, this is with default parameters, and I found that BWA responds better than Segemehl to optimising mapping sensitivity for reads with high variation. In fact, BWA generally outperforms Segemehl under looser definitions of accuracy while running faster and using less memory. Yet such parameter optimisation is a pain, and people generally run these tools with defaults. Segemehl is better out of the box at calling indels exactly, although it is quite a lot slower. See figure below (using the CuReSimEval strict mapping definition).

[Figure: benchmark results, CuReSimEval strict mapping definition]

modified 4.2 years ago • written 4.2 years ago by bedeabc110

Did you also try to optimize segemehl for your dataset, or just BWA? ;)

written 4.2 years ago by David Langenberger

Hi David, I've only just seen your reply – apologies! It was a while ago, but I did struggle with optimising Segemehl for our data through parameter sweeps. It just didn't seem to improve our results. It would be arrogant of me to suggest that our benchmark criteria were definitely not to blame for this result.

What parameters would you suggest for sensitively mapping high-diversity sequences (many indels and mismatches)?

written 3.9 years ago by bedeabc110

6.7 years ago, David Langenberger (Deutschland) wrote:

A performance plot from the paper:

[Figure: performance plot from the paper]


that's impressive

written 6.7 years ago by Istvan Albert

5.2 years ago, David Langenberger (Deutschland) wrote:

segemehl 2.0:

Christian Otto, Peter F. Stadler, and Steve Hoffmann: 'Lacking alignments? The next generation sequencing mapper segemehl revisited', Bioinformatics (2014), doi:10.1093/bioinformatics/btu146


5.3 years ago, David Langenberger (Deutschland) wrote:

Some news about the segemehl algorithm:

Steve Hoffmann, Christian Otto, Gero Doose, Andrea Tanzer, David Langenberger, Sabina Christ, Manfred Kunz, Lesca Holdt, Daniel Teupser, Jörg Hackermüller and Peter F Stadler: 'A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and fusion detection', Genome Biology, 15:R34, doi:10.1186/gb-2014-15-2-r34 (2014)
