Hi guys,
I have 241 nucleotide sequences (~1500bp) that I would like to calculate all pairwise sequence identities for.
I wrote a tool for this, but it keeps crashing for some reason.
Does anyone know of an (online) tool that will give me this information?
thanks
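To put the workload in perspective, here is a quick sketch of the arithmetic. The sequence count and the ~1500 bp length come from the question; treating every sequence as exactly 1500 bp is a simplifying assumption. All-vs-all means 241 x 240 / 2 unordered pairs, and a full dynamic-programming alignment of two sequences fills about len_a x len_b cells.

```python
from math import comb

n_seqs = 241      # number of sequences in the question
seq_len = 1500    # assumed length of every sequence (bp); real lengths vary

n_pairs = comb(n_seqs, 2)                # unordered pairs, no self-comparisons
cells_per_pair = seq_len * seq_len       # DP matrix cells for one global alignment
total_cells = n_pairs * cells_per_pair   # cells filled for the whole all-vs-all run

print(f"{n_pairs} pairs, {cells_per_pair:,} cells per pair, {total_cells:,} cells total")
```

Tens of billions of cell updates is manageable for compiled aligners but would likely take hours in a pure-Python loop.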
Question: Recursive Pairwise Alignments
Asked by Whetting (1.5k) 8.1 years ago; last modified 8.1 years ago by a.zielezinski (9.4k).
a.zielezinski ♦ 9.4k wrote:
It can easily be done in Python.
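from itertools import combinations
from Bio import SeqIO
from Bio import pairwise2  # deprecated in recent Biopython; Bio.Align.PairwiseAligner replaces it

# Read all records into a dict keyed by record ID
seqs = SeqIO.to_dict(SeqIO.parse('file.fasta', 'fasta'))
for sr1, sr2 in combinations(seqs, 2):
    # globalxx: global alignment, match = 1, no mismatch or gap penalties,
    # so the score is the number of identical positions.
    # one_alignment_only=True stops pairwise2 from building every tied alignment.
    aln = pairwise2.align.globalxx(str(seqs[sr1].seq), str(seqs[sr2].seq),
                                   one_alignment_only=True)[0]
    # aln[2] is the score, aln[4] the alignment end (its length for a global alignment)
    print(sr1, sr2, '%.2f' % (aln[2] / aln[4] * 100))
Say file.fasta contains three FASTA records:
>seq1
ATGCTGATGATG
>seq2
AGTCGCTGATGATAGAATAGATAGGA
>seq3
ATGCTGATGATG
Then the output is:
seq1 seq2 46.15
seq1 seq3 100.00
seq2 seq3 46.15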
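For readers without Biopython, or who want to see what the calculation does, here is a dependency-free sketch. It relies on the fact that the best globalxx score (match = 1, no mismatch or gap penalties) equals the length of the longest common subsequence (LCS), which can be computed while keeping only two rows of the DP matrix. The identity definition used here, matches / (len_a + len_b - matches), is an assumption: it gives the same percentages (to two decimals) as the example above, but pairwise2's first returned alignment can be shorter when mismatches tie with gaps. The helper names are hypothetical, and pure Python is slow, so treat this as a reference implementation rather than a practical tool for 241 x ~1500 bp.

```python
from itertools import combinations


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Equals the best globalxx score. Only two DP rows are kept,
    so memory is O(len(b)) instead of O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1          # extend a match diagonally
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a char in a or in b
        prev = curr
    return prev[-1]


def percent_identity(a: str, b: str) -> float:
    # Assumed definition: matches divided by the length of an alignment made
    # only of matched columns and gaps, i.e. len(a) + len(b) - matches.
    matches = lcs_length(a, b)
    aln_len = len(a) + len(b) - matches
    return 100.0 * matches / aln_len if aln_len else 100.0


def all_pairwise_identities(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    # One entry per unordered pair of sequence IDs
    return {(i, j): percent_identity(seqs[i], seqs[j])
            for i, j in combinations(seqs, 2)}


seqs = {
    "seq1": "ATGCTGATGATG",
    "seq2": "AGTCGCTGATGATAGAATAGATAGGA",
    "seq3": "ATGCTGATGATG",
}
for (id1, id2), pid in all_pairwise_identities(seqs).items():
    print(id1, id2, f"{pid:.2f}")
```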
Thanks, it seems to work well for the example you provided. However, aligning two 1500 bp sequences causes my entire machine to freeze. I did not expect pairwise2 to be this memory intensive...
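A likely contributor to the freeze (a hypothesis, not checked against the poster's data): by default pairwise2 returns every co-optimal alignment, and with globalxx scoring, where mismatches and gaps cost nothing, the number of tied alignments can grow combinatorially with sequence length. Passing `one_alignment_only=True` to `pairwise2.align.globalxx` asks for a single alignment. The pure-Python sketch below counts co-optimal alignments under the same scoring to show how quickly ties accumulate; `count_optimal_alignments` is a hypothetical helper, not part of Biopython, and it counts distinct paths through the DP grid, which may not match exactly what pairwise2 enumerates.

```python
def count_optimal_alignments(a: str, b: str) -> tuple[int, int]:
    """Return (best score, number of co-optimal global alignments).

    Scoring mimics globalxx: match = 1, mismatch = 0, gap = 0.
    Each path through the DP grid is a distinct alignment, so the number
    of optimal paths is tracked alongside the usual score recurrence.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    count = [[0] * (m + 1) for _ in range(n + 1)]
    count[0][0] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            # Candidate moves: (score reaching this cell, paths to the predecessor)
            moves = []
            if i > 0:
                moves.append((score[i - 1][j], count[i - 1][j]))   # gap in b
            if j > 0:
                moves.append((score[i][j - 1], count[i][j - 1]))   # gap in a
            if i > 0 and j > 0:
                step = 1 if a[i - 1] == b[j - 1] else 0           # match or mismatch
                moves.append((score[i - 1][j - 1] + step, count[i - 1][j - 1]))
            best = max(s for s, _ in moves)
            score[i][j] = best
            # Sum the path counts of every move that ties for the best score
            count[i][j] = sum(c for s, c in moves if s == best)
    return score[n][m], count[n][m]


for k in (2, 4, 8, 16):
    s1, s2 = "AC" * k, "CA" * k
    print(len(s1), count_optimal_alignments(s1, s2))
```

Building each of those tied alignments as Python strings is what can exhaust memory, whereas the DP matrix itself stays small.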
For two sequences of 1500 bp, a typical implementation needs a 1500 x 1500 matrix, i.e. 2.25 million cells, or about 2.25 MB at one byte per cell. Even a careless implementation (three matrices, 4 bytes per cell) will not use more than 3 x 4 x 2.25 MB = 27 MB, roughly 30 MB. That is the memory to expect from the programs I recommended (ssearch, swat). The core of pairwise2 is implemented in C, so I would not expect RAM to be a problem. However, I do not use pairwise2, so what I said above may not apply to it.
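The estimate above can be written out as a small calculation. The defaults (three score matrices, 4 bytes per cell) follow the "careless implementation" figure in the reply; the two-row option, for score-only computations that skip the traceback, is an additional assumption. `dp_memory_bytes` is a hypothetical helper for illustration.

```python
def dp_memory_bytes(len_a: int, len_b: int, n_matrices: int = 3,
                    bytes_per_cell: int = 4, full: bool = True) -> int:
    """Rough memory needed by a DP alignment of two sequences.

    full=True keeps the whole len_a x len_b matrices (needed for a traceback).
    full=False keeps only two rows of length len_b (enough for the score alone).
    """
    rows = len_a if full else 2
    return n_matrices * bytes_per_cell * rows * len_b


print(dp_memory_bytes(1500, 1500))              # 27000000 bytes, about 27 MB
print(dp_memory_bytes(1500, 1500, full=False))  # 36000 bytes, score only
```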
Why not use ssearch from FASTA3 or swat from phrap?