Question: Recursive Pairwise Alignments
0
gravatar for Whetting
5.1 years ago by
Whetting1.5k
Bethesda, MD
Whetting1.5k wrote:

Hi guys,
I have 241 nucleotide sequences (~1500bp) that I would like to calculate all pairwise sequence identities for.
I wrote a tool for this, but it keeps crashing for some reason.
Does anyone know of a (online) tool that will allow me to get this information?
thanks

pairwise sequence alignment • 1.6k views
ADD COMMENTlink modified 5.1 years ago by a.zielezinski7.5k • written 5.1 years ago by Whetting1.5k

Why not use ssearch from FASTA3 or swat from phrap?

ADD REPLYlink written 5.1 years ago by lh330k
9
gravatar for a.zielezinski
5.1 years ago by
a.zielezinski7.5k
a.zielezinski7.5k wrote:

In can be easily done in Python.

from itertools import combinations
from Bio import SeqIO
from Bio import pairwise2

seqs = SeqIO.to_dict(SeqIO.parse(open('file.fasta'),'fasta'))

for sr1, sr2 in combinations(seqs, 2):
    aln = pairwise2.align.globalxx(str(seqs[sr1].seq), str(seqs[sr2].seq))[0]
    print sr1, sr2, aln[2]/float(aln[4])*100

Say file.fasta contains 3 fasta records.

>seq1
ATGCTGATGATG
>seq2
AGTCGCTGATGATAGAATAGATAGGA
>seq3
ATGCTGATGATG

Then, the output is:

seq3 seq2 46.1538461538
seq3 seq1 100.0
seq2 seq1 46.1538461538
ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by a.zielezinski7.5k

thanks, it seems to be working well for the example you provided. however, Running two 1500bp alignments causes my entire machine to freeze. i did not expect pairwise2 do be this memory intensive...

ADD REPLYlink written 5.1 years ago by Whetting1.5k

For two sequences of length 1500bp, typical implementation requires 1500x1500=2.25MB memory. Even a careless implementation will not use more than 3x4x2.25MB~30MB. This is the memory if you use the programs I recommended. The core of pairwise2 is implemented in C. I would not expect RAM to be a problem. Nonetheless, I do not use pairwise2. What I said above may not be applicable to it.

ADD REPLYlink modified 5.1 years ago • written 5.1 years ago by lh330k

thanks. I took a.zielinski's framework and edited it to use muscle alignments. It works pretty well. I will post my code when I get home tonight! I agree, I am surprised at the memory use of pairwise2. However, both on my PC at work and my mac at home it eats up all memory and takes forever...

ADD REPLYlink written 5.1 years ago by Whetting1.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 877 users visited in the last hour