Question

Recursive Pairwise Alignments

0

Entering edit mode

11.4 years ago

Whetting ★ 1.6k

Hi guys,
I have 241 nucleotide sequences (~1500bp) that I would like to calculate all pairwise sequence identities for.
I wrote a tool for this, but it keeps crashing for some reason.
Does anyone know of a (online) tool that will allow me to get this information?
thanks

pairwise sequence alignment • 3.2k views

ADD COMMENT • link updated 11.4 years ago by Andrzej Zielezinski 11k • written 11.4 years ago by Whetting ★ 1.6k

0

Entering edit mode

Why not use ssearch from FASTA3 or swat from phrap?

ADD REPLY • link 11.4 years ago by lh3 33k

score 9 · Answer 1 · 2012-12-10

9

Entering edit mode

11.4 years ago

Andrzej Zielezinski 11k

In can be easily done in Python.

from itertools import combinations
from Bio import SeqIO
from Bio import pairwise2

seqs = SeqIO.to_dict(SeqIO.parse(open('file.fasta'),'fasta'))

for sr1, sr2 in combinations(seqs, 2):
    aln = pairwise2.align.globalxx(str(seqs[sr1].seq), str(seqs[sr2].seq))[0]
    print sr1, sr2, aln[2]/float(aln[4])*100

Say file.fasta contains 3 fasta records.

>seq1
ATGCTGATGATG
>seq2
AGTCGCTGATGATAGAATAGATAGGA
>seq3
ATGCTGATGATG

Then, the output is:

seq3 seq2 46.1538461538
seq3 seq1 100.0
seq2 seq1 46.1538461538

ADD COMMENT • link 11.4 years ago by Andrzej Zielezinski 11k

0

Entering edit mode

thanks, it seems to be working well for the example you provided. however, Running two 1500bp alignments causes my entire machine to freeze. i did not expect pairwise2 do be this memory intensive...

ADD REPLY • link 11.4 years ago by Whetting ★ 1.6k

0

Entering edit mode

For two sequences of length 1500bp, typical implementation requires 1500x1500=2.25MB memory. Even a careless implementation will not use more than 3x4x2.25MB~30MB. This is the memory if you use the programs I recommended. The core of pairwise2 is implemented in C. I would not expect RAM to be a problem. Nonetheless, I do not use pairwise2. What I said above may not be applicable to it.

ADD REPLY • link 11.4 years ago by lh3 33k

0

Entering edit mode

thanks. I took a.zielinski's framework and edited it to use muscle alignments. It works pretty well. I will post my code when I get home tonight! I agree, I am surprised at the memory use of pairwise2. However, both on my PC at work and my mac at home it eats up all memory and takes forever...

ADD REPLY • link 11.4 years ago by Whetting ★ 1.6k