Question: Looking for a local alignment library with certain features
0
gravatar for juan.crescente
2.3 years ago by
juan.crescente40 wrote:

I'm looking for a library that performs local alignment with the next features:

1- Returns the start and end position of query and subject

2- Returns more than one alignment

3- Do not require indexes or many I/O operations to run (time efficiency)

At the time I've found that:

scikit-bio -> StripedSmithWaterman and local_pairwise_align_ssw Only returns one alignment (the optimal)

biopython -> pairwise2 Does not return start and end position

https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library Does not return target begin (even they're claiming so in the docs)

https://github.com/mbreese/swalign Only returns one alignment (the optimal)

Bioconductor returns the optimal and no positions

library(Biostrings)
mat <- nucleotideSubstitutionMatrix(match = 5, mismatch = -4, baseOnly = TRUE)
res1 <- pairwiseAlignment(pattern = "ACGT",subject ="ACT",gapOpening = 10.5, gapExtension = .5,substitutionMatrix = mat)

What I need is to find local alignments between two equal-length short sequences (~1500 bp) I'm using python at the time for programming the wrapper.

I'm using BLAST with word_size = 7, since this operation needs to be perfomed several times, it is not the best tool. One thing is the anount of I/O operations required to create query and subject files for BLAST communication.

BLAT with parameters -out=blast8 -oneOff=1 minScore=0 minIdentity=0 tileSize=6, reports much less results than BLAST

localalignment alignment • 627 views
ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by juan.crescente40

How about bowtie (or any of the popular NGS aligners). They return all what you asked for.

ADD REPLYlink written 2.3 years ago by ATpoint36k

What I need is to find local alignments between two equal-length short sequences (~1500 bp). I read that bowtie is meant for aligning short sequences in large genomes, but worth the try

ADD REPLYlink written 2.3 years ago by juan.crescente40

What is your goal? What do you want to achieve by this?

ADD REPLYlink written 2.3 years ago by a.zielezinski9.2k

I want to find imperfect inverted repeats in sequences of about (~1500 bp) by local aligning with its reverse complementary

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by juan.crescente40
1

There are tools for that (e.g. einverted from EMBOSS package). Btw, as of BLAST, you can reduce the I/O operations regarding creating/deleting input FASTA files by providing your query and subject sequences directly from command line: blastn -query <(echo ">seq1"; echo "ATGC";) -subject <(echo ">seq1"; echo "ATGC";) -evalue 10000 -word_size 4.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by a.zielezinski9.2k

I've tried to use this approach with Popen

'-query','<(echo ">q"; echo "' + seq + '";)',
'-subject','<(echo ">s"; echo "' + seq_rc + '";)',

but all I get is, BLASTN error: Command line argument error: Argument "subject". File is not accessible: `<(echo ">s"; echo...

ADD REPLYlink written 2.3 years ago by juan.crescente40

I'm guessing you are using subprocess module. I think it would only go with os module. Btw, did you check this einverted software? If you on Ubuntu/Debian, the software is in repository, which is called emboss (apt-get install emboss). I think the program is what you are looking for.

ADD REPLYlink written 2.3 years ago by a.zielezinski9.2k

What is the name of doing this: <(echo ">q"; echo "' + seq + '";)???

ADD REPLYlink written 2.3 years ago by juan.crescente40

I think it is called command substition. (cmd) just runs the command, so it prints two echo that display your sequence as string and the sign < makes the sequence as if it came from the file.

ADD REPLYlink written 2.3 years ago by a.zielezinski9.2k

einverted output is very hard to parse for me yet. Also std in/out is not very clear.

Still having some issues using os.popen(cmd).read() sh: -c: line 0: syntax error near unexpected token `('

If I run the same command (cmd variable) in the shell, I got a valid response.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by juan.crescente40

Use os.system(cmd). But you will have to save results to a file (blastn ... -out output.txt) and then open file in Python and read it. So still you may have some I/O issues. If you decided to change your mind to try einverted I would help you with the parsing part.

ADD REPLYlink written 2.3 years ago by a.zielezinski9.2k
0
gravatar for mrals89
2.3 years ago by
mrals8950
United States
mrals8950 wrote:

From your comment to @ATpoint, it seems like you're looking for 'long' read alignment algorithms. I'd suggest BWA-mem for that (it seems to be the conventional wisdom).

But OP, it's not clear why you 'cant have indexes or many I/O operations.' Because BLAST uses indices!! I can't tell if you're trying to build a Lambda alignment function, or if you have time constraints and need to run a massive amount of 1k-5k paired alignments in parallel? Or if you're building a web application? Or if you need output in a certain format?

If you want multiple alignments to a single sequence, BLAST is not optimal, BLAT is reasonable, bowtie/bwa/hisat/etc will all give multiple alignments and query coordinates in SAM format, but they all require indices and and default parameters aren't optimal for long read alignments.

ADD COMMENTlink written 2.3 years ago by mrals8950
0
gravatar for h.mon
2.3 years ago by
h.mon30k
Brazil
h.mon30k wrote:

I think lastz fits your needs.

ADD COMMENTlink written 2.3 years ago by h.mon30k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1709 users visited in the last hour