Question: Recursive Pairwise Alignments
0
8.1 years ago by
Whetting1.5k
Bethesda, MD
Whetting1.5k wrote:

Hi guys,
I have 241 nucleotide sequences (~1500bp) that I would like to calculate all pairwise sequence identities for.
I wrote a tool for this, but it keeps crashing for some reason.
Does anyone know of a (online) tool that will allow me to get this information?
thanks

pairwise sequence alignment • 2.5k views
modified 8.1 years ago by a.zielezinski9.4k • written 8.1 years ago by Whetting1.5k

Why not use ssearch from FASTA3 or swat from phrap?

9
8.1 years ago by
a.zielezinski9.4k
a.zielezinski9.4k wrote:

In can be easily done in Python.

``````from itertools import combinations
from Bio import SeqIO
from Bio import pairwise2

seqs = SeqIO.to_dict(SeqIO.parse(open('file.fasta'),'fasta'))

for sr1, sr2 in combinations(seqs, 2):
aln = pairwise2.align.globalxx(str(seqs[sr1].seq), str(seqs[sr2].seq))[0]
print sr1, sr2, aln[2]/float(aln[4])*100
``````

Say `file.fasta` contains 3 fasta records.

``````>seq1
ATGCTGATGATG
>seq2
AGTCGCTGATGATAGAATAGATAGGA
>seq3
ATGCTGATGATG
``````

Then, the output is:

``````seq3 seq2 46.1538461538
seq3 seq1 100.0
seq2 seq1 46.1538461538
``````

thanks, it seems to be working well for the example you provided. however, Running two 1500bp alignments causes my entire machine to freeze. i did not expect pairwise2 do be this memory intensive...

For two sequences of length 1500bp, typical implementation requires 1500x1500=2.25MB memory. Even a careless implementation will not use more than 3x4x2.25MB~30MB. This is the memory if you use the programs I recommended. The core of pairwise2 is implemented in C. I would not expect RAM to be a problem. Nonetheless, I do not use pairwise2. What I said above may not be applicable to it.

thanks. I took a.zielinski's framework and edited it to use muscle alignments. It works pretty well. I will post my code when I get home tonight! I agree, I am surprised at the memory use of pairwise2. However, both on my PC at work and my mac at home it eats up all memory and takes forever...