Recursive Pairwise Alignments
1
0
Entering edit mode
8.9 years ago
Whetting ★ 1.5k

Hi guys,
I have 241 nucleotide sequences (~1500bp) that I would like to calculate all pairwise sequence identities for.
I wrote a tool for this, but it keeps crashing for some reason.
Does anyone know of a (online) tool that will allow me to get this information?
thanks

pairwise sequence alignment • 2.7k views
0
Entering edit mode

Why not use ssearch from FASTA3 or swat from phrap?

9
Entering edit mode
8.9 years ago

In can be easily done in Python.

from itertools import combinations
from Bio import SeqIO
from Bio import pairwise2

seqs = SeqIO.to_dict(SeqIO.parse(open('file.fasta'),'fasta'))

for sr1, sr2 in combinations(seqs, 2):
aln = pairwise2.align.globalxx(str(seqs[sr1].seq), str(seqs[sr2].seq))[0]
print sr1, sr2, aln[2]/float(aln[4])*100


Say file.fasta contains 3 fasta records.

>seq1
ATGCTGATGATG
>seq2
AGTCGCTGATGATAGAATAGATAGGA
>seq3
ATGCTGATGATG


Then, the output is:

seq3 seq2 46.1538461538
seq3 seq1 100.0
seq2 seq1 46.1538461538

0
Entering edit mode

thanks, it seems to be working well for the example you provided. however, Running two 1500bp alignments causes my entire machine to freeze. i did not expect pairwise2 do be this memory intensive...

0
Entering edit mode

For two sequences of length 1500bp, typical implementation requires 1500x1500=2.25MB memory. Even a careless implementation will not use more than 3x4x2.25MB~30MB. This is the memory if you use the programs I recommended. The core of pairwise2 is implemented in C. I would not expect RAM to be a problem. Nonetheless, I do not use pairwise2. What I said above may not be applicable to it.

0
Entering edit mode

thanks. I took a.zielinski's framework and edited it to use muscle alignments. It works pretty well. I will post my code when I get home tonight! I agree, I am surprised at the memory use of pairwise2. However, both on my PC at work and my mac at home it eats up all memory and takes forever...