Question: Semi-Global Alignment Tool?
5
3.6 years ago by
Ryan Thompson2.4k
TSRI, La Jolla, CA
Ryan Thompson2.4k wrote:

Does anyone know of any general-purpose semiglobal alignment tools? Something like BLAST for semiglobal alignments instead of local alignments.

A semiglobal alignment is like a global alignment, but penalty-free gaps are allowed at the beginning and end of the alignment. See Wikipedia for a bit more information on semiglobal alignments.

Edit: It has come to my attention that the term "semiglobal alignment" is ambiguous; it is used to describe several different types of alignment. What I am looking for is a global alignment with no penalty for gaps at the sequence ends.
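To make the definition concrete, here is a minimal sketch of scoring a global alignment with free end gaps by dynamic programming. The function name `ends_free_score` and the scoring values are assumptions chosen for illustration, not taken from any tool mentioned in this thread. The only differences from plain global alignment are that the first row and column are zero and that the best score is read from anywhere in the last row or last column.

```python
def ends_free_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score of a global alignment of a and b with free end gaps.

    Hypothetical helper for illustration; linear gap penalty only.
    """
    n, m = len(a), len(b)
    # Row 0 is all zeros: leading gaps in either sequence cost nothing.
    prev = [0] * (m + 1)
    best = prev[m]
    for i in range(1, n + 1):
        curr = [0] * (m + 1)  # column 0 stays zero for the same reason
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
        # Trailing gaps are free, so the alignment may stop in the last column...
        best = max(best, curr[m])
        prev = curr
    # ...or anywhere in the last row.
    return max(best, max(prev))


print(ends_free_score("WW", "WEWWEW"))
print(ends_free_score("AAACGT", "CGTTTT"))
```

The second call is a suffix/prefix overlap, which is the case a plain global alignment would penalize heavily.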

I want to use ends-free alignment to find all occurrences of a particular sequence in a full lane of Illumina reads. Specifically, I want to mask a short (42 bp) sequence from a lane of paired-end 100 nt Illumina reads. The sequence is expected to occur anywhere within any read with equal probability, including as a partial overlap on either end; if it appears in the middle of a read, it must be the whole sequence. So I need to do an ends-free alignment of the short sequence against each read independently.

ADD COMMENTlink modified 3.5 years ago by Manuel160 • written 3.6 years ago by Ryan Thompson2.4k

Would it be reasonable to use something like BLAST or any other heuristic local alignment/mapping program, and then post-process the results to filter out hits that are not end-free alignments?

ADD REPLYlink written 3.5 years ago by Ryan Thompson2.4k
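One way to sketch the post-processing idea: a local hit is consistent with an ends-free alignment only if it starts at the first position of at least one sequence and stops at the last position of at least one sequence. The function below is a hypothetical filter, assuming 1-based inclusive coordinates on the plus strand (as in BLAST tabular output) and sequence lengths supplied by the caller. The `slack` parameter is an assumption added to tolerate a few bases clipped by a local aligner.

```python
def is_ends_free_hit(qstart, qend, qlen, sstart, send, slen, slack=0):
    """Return True if a local hit reaches a sequence end on both sides.

    Hypothetical filter; coordinates are 1-based and inclusive, and the
    hit is assumed to be on the plus strand (sstart <= send).
    """
    # Left side: the hit must begin at the start of the query or the subject.
    starts_at_an_end = (qstart - 1 <= slack) or (sstart - 1 <= slack)
    # Right side: the hit must extend to the end of the query or the subject.
    stops_at_an_end = (qlen - qend <= slack) or (slen - send <= slack)
    return starts_at_an_end and stops_at_an_end


# Whole 42 bp query found inside a 100 nt read.
print(is_ends_free_hit(1, 42, 42, 30, 71, 100))
# Query overhangs the left end of the read.
print(is_ends_free_hit(10, 42, 42, 1, 33, 100))
# Partial query match in the middle of the read: not ends-free.
print(is_ends_free_hit(5, 40, 42, 30, 65, 100))
```

Note that a heuristic aligner may miss short partial overlaps at read ends entirely, so a filter like this can only reject hits, not recover ones that were never reported.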
8
3.6 years ago by
brentp17k
Denver, Colorado
brentp17k wrote:

Check here, where I modified Marcin Cieślik's (modification of my) code to do various alignments, including "glocal" (a combination of global and local), which does what you want. It's a Python/Cython module.

you should be able to install with:

git clone git://github.com/brentp/align.git
cd align
sudo python setup.py install

and then use as:

>>> from align import aligner
>>> aligner('WW','WEWWEW', method='glocal')
('WW', 'WW')

hope that helps.

ADD COMMENTlink written 3.6 years ago by brentp17k

The setup.py install step is crashing. How do I debug that?

ADD REPLYlink written 3.5 years ago by Ryan Thompson2.4k

Ok, I figured out that setup.py produces a cryptic error if Cython is not installed. You should probably fix that.

ADD REPLYlink written 3.5 years ago by Ryan Thompson2.4k

Actually, it turns out that I am looking for the alignment mode that you call "global_cfe". Are there standard definitions for any type of alignment other than local and global?

ADD REPLYlink written 3.5 years ago by Ryan Thompson2.4k

There are several combinations it seems, like global-local or local-global.

ADD REPLYlink written 3.5 years ago by Michael Dondrup27k

Ryan Thompson, thanks for reporting install problems. Fixed as of: http://github.com/brentp/align/commit/c7fd7c16ec0cd10fc44df633dcb272ffc7dd690f

ADD REPLYlink written 3.5 years ago by brentp17k

Is there a way to return the alignment score and the start/end indices of the alignment in the original input sequences?

ADD REPLYlink written 3.0 years ago by Ryan Thompson2.4k
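For reference, the score and coordinates can be recovered without a full traceback by carrying the starting border cell along with each score in the dynamic programming matrix. This is a generic sketch, not the API of the `align` module; the name `ends_free_align` and the scoring values are assumptions. It returns the score plus 0-based, half-open index ranges into each input sequence.

```python
def ends_free_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, (a_start, a_end), (b_start, b_end)) for the best
    global alignment of a and b with free end gaps.

    Hypothetical helper; indices are 0-based and half-open, so the
    aligned regions are a[a_start:a_end] and b[b_start:b_end].
    """
    n, m = len(a), len(b)

    def score_of(cell):
        return cell[0]

    # Each cell is (score, a_start, b_start): the score plus the border
    # cell where the path reaching this cell began.
    prev = [(0, 0, j) for j in range(m + 1)]
    best = (prev[m], 0, m)  # (cell, a_end, b_end)
    for i in range(1, n + 1):
        curr = [(0, i, 0)]  # free leading gap: path starts here
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag, up, left = prev[j - 1], prev[j], curr[j - 1]
            candidates = [
                (diag[0] + sub, diag[1], diag[2]),
                (up[0] + gap, up[1], up[2]),
                (left[0] + gap, left[1], left[2]),
            ]
            # Ties are broken in favor of the earlier candidate.
            curr.append(max(candidates, key=score_of))
        # The alignment may end anywhere in the last column...
        if curr[m][0] > best[0][0]:
            best = (curr[m], i, m)
        prev = curr
    # ...or anywhere in the last row.
    for j in range(m + 1):
        if prev[j][0] > best[0][0]:
            best = (prev[j], n, j)
    (score, a_start, b_start), a_end, b_end = best
    return score, (a_start, a_end), (b_start, b_end)


print(ends_free_align("WW", "WEWWEW"))
print(ends_free_align("AAACGT", "CGTTTT"))
```

When several alignments share the best score, this sketch reports only one of them.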
4
3.5 years ago by
Bergen
Michael Dondrup27k wrote:

The method pairwiseAlignment in the Bioconductor package Biostrings does this out of the box:

From the manual:

type - type of alignment. One of "global", "local", "overlap", "global-local", and "local-global", where "global" = align whole strings with end gap penalties, "local" = align string fragments, "overlap" = align whole strings without end gap penalties, "global-local" = align whole strings with end gap penalties on pattern and without end gap penalties on subject, "local-global" = align whole strings without end gap penalties on pattern and with end gap penalties on subject.

The document Pairwise Sequence Alignments is a tutorial about how to do alignments with R.

ADD COMMENTlink written 3.5 years ago by Michael Dondrup27k

After noticing what you really want to do, I have my doubts that this method is fast enough for it.

ADD REPLYlink written 3.5 years ago by Michael Dondrup27k

Actually, the pairwiseAlignment is quite performant. Based on my benchmarks, I should be able to process a whole lane of Illumina data in under an hour on a 48-core server (which I have).

ADD REPLYlink written 3.1 years ago by Ryan Thompson2.4k

Also, the PDF that you link to has an example that's almost what I want to do. So thanks for that as well.

ADD REPLYlink written 3.1 years ago by Ryan Thompson2.4k
1
3.5 years ago by
Manuel160
Manuel160 wrote:

What exactly do you want to do?

  • Do you want to solve the read mapping problem (i.e. NGS reads against a reference genome)? Look at read mappers such as bowtie, bwa, RazerS, etc.
  • Do you want to do this on a smaller scale? The SeqAn library provides DP alignment algorithms that let you initialize the matrix borders with zeros, which gives you semiglobal alignments.
ADD COMMENTlink written 3.5 years ago by Manuel160

Actually, I want to mask a short (42 bp) sequence from a lane of paired-end 100 nt Illumina reads. The sequence is expected to occur anywhere within any read with equal probability, including a partial overlap on either end. So I need to do an ends-free alignment of the short sequence against each read independently.

ADD REPLYlink written 3.5 years ago by Ryan Thompson2.4k
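The masking task described above can be sketched in a simplified form. The code below is an ungapped approximation, not a true ends-free alignment: it slides the short sequence across the read, allows it to hang off either end, and tolerates mismatches but no indels. The function name `mask_sequence` and the parameters `min_overlap` and `max_mismatch_rate` are assumptions chosen for illustration. An occurrence in the middle of the read is only accepted when the whole sequence fits, since partial overlaps are only possible at the read ends.

```python
def mask_sequence(read, target, min_overlap=10, max_mismatch_rate=0.1,
                  mask_char="N"):
    """Mask the best ungapped, ends-free occurrence of target in read.

    Hypothetical helper; handles substitutions only, and masks at most
    one occurrence per read.
    """
    best = None  # (matches, read_start, read_end)
    # offset is the read position where target[0] would sit; a negative
    # offset means the target hangs off the left end of the read.
    first = min_overlap - len(target)
    last = len(read) - min_overlap
    for offset in range(first, last + 1):
        r_start = max(offset, 0)
        r_end = min(offset + len(target), len(read))
        overlap = r_end - r_start
        if overlap < min_overlap:
            continue
        mismatches = sum(read[k] != target[k - offset]
                         for k in range(r_start, r_end))
        if mismatches <= max_mismatch_rate * overlap:
            matches = overlap - mismatches
            if best is None or matches > best[0]:
                best = (matches, r_start, r_end)
    if best is None:
        return read  # no acceptable occurrence: leave the read untouched
    _, start, end = best
    return read[:start] + mask_char * (end - start) + read[end:]


target = "GATTACAGGC"
print(mask_sequence("TTTT" + target + "CCCC", target, min_overlap=5))
print(mask_sequence(target[4:] + "TTTTTTTT", target, min_overlap=5))
```

If indels within the 42 bp sequence are expected, a gapped ends-free alignment would be needed instead of this sliding comparison.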