I have 241 nucleotide sequences (~1500bp) that I would like to calculate all pairwise sequence identities for.
I wrote a tool for this, but it keeps crashing for some reason.
Does anyone know of an (online) tool that will give me this information?
It can easily be done in Python with Biopython.
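    from itertools import combinations

    from Bio import SeqIO
    from Bio import pairwise2  # deprecated in recent Biopython releases

    # Load all records into a dict keyed by sequence ID
    seqs = SeqIO.to_dict(SeqIO.parse('file.fasta', 'fasta'))

    for sr1, sr2 in combinations(seqs, 2):
        aln = pairwise2.align.globalxx(str(seqs[sr1].seq), str(seqs[sr2].seq))
        # aln[0] is (seqA, seqB, score, begin, end); identity = score / end
        print(sr1, sr2, aln[0][2] / float(aln[0][4]) * 100)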
For example, suppose file.fasta contains three FASTA records:
    >seq1
    ATGCTGATGATG
    >seq2
    AGTCGCTGATGATAGAATAGATAGGA
    >seq3
    ATGCTGATGATG
Then the output is (pair order and float precision may vary with the Python version):
    seq3 seq2 46.1538461538
    seq3 seq1 100.0
    seq2 seq1 46.1538461538
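To see what that percentage measures: globalxx scores 1 per match and has no mismatch or gap penalties, so the alignment score is the length of the longest common subsequence (LCS) of the two sequences. Below is a rough, dependency-free sketch of the same idea. Note that `lcs_length` and `percent_identity` are names I made up for illustration, and dividing by the length of the longer sequence is my assumption. It matches the three example records, but pairwise2's alignment length can differ in general.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Row-by-row dynamic programming: O(len(a) * len(b)) time, O(len(b)) memory.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_identity(a, b):
    """Percent identity, ASSUMING normalisation by the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty sequences: nothing to compare
    return 100.0 * lcs_length(a, b) / longest


# The three example records from file.fasta, hard-coded for the sketch
records = {
    "seq1": "ATGCTGATGATG",
    "seq2": "AGTCGCTGATGATAGAATAGATAGGA",
    "seq3": "ATGCTGATGATG",
}

for id1, id2 in combinations(records, 2):
    print(id1, id2, round(percent_identity(records[id1], records[id2]), 2))
```

This is only meant to show the arithmetic. For 241 sequences of ~1500 bp there are 28,920 pairs, and a pure-Python loop like this will be very slow, so a compiled aligner is the better choice for the real dataset.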