Pairwise alignment of multi-FASTA file sequences
2.2 years ago
aurora • 0

I have a multi-FASTA file containing more than 10,000 sequences. I want to align every sequence against every other sequence in the file and store all the results in a single new file, so that I can run a clustering analysis afterwards. My Python code for a single pairwise alignment is below; how can I modify it to loop over the whole multi-FASTA file and store the results as needed?

from Bio import pairwise2
from Bio.pairwise2 import format_alignment

X = "ACGGGT"
Y = "ACG"

# Match score = 2, mismatch score = -1, gap opening = -5, gap extension = -2
alignments = pairwise2.align.globalms(X, Y, 2, -1, -5, -2)

for a in alignments:
    print(format_alignment(*a))

alignment next-gen sequence fasta pairwise • 1.2k views

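Have you had a look at itertools? https://docs.python.org/3.6/library/itertools.html In particular, itertools.combinations yields every unordered pair exactly once, which should get you on the right track. Be aware that this could be quite slow: 10,000 sequences give roughly 50 million pairs.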
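To make the itertools suggestion concrete, here is a minimal sketch of the all-against-all loop. It uses only the standard library so it runs as-is. The names read_fasta, all_pairs_scores and identity_count are hypothetical helpers invented for this illustration, and identity_count is only a stand-in scorer, not a real alignment.

```python
import io
import itertools


def read_fasta(handle):
    """Yield (identifier, sequence) tuples from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def all_pairs_scores(records, score_fn, out_handle):
    """Write one tab-separated line (id1, id2, score) per unordered pair."""
    n_pairs = 0
    # combinations(records, 2) visits each unordered pair exactly once.
    for (id_a, seq_a), (id_b, seq_b) in itertools.combinations(records, 2):
        out_handle.write(f"{id_a}\t{id_b}\t{score_fn(seq_a, seq_b)}\n")
        n_pairs += 1
    return n_pairs


def identity_count(a, b):
    """Hypothetical stand-in scorer: counts matching positions, NOT an alignment."""
    return sum(x == y for x, y in zip(a, b))


# Tiny in-memory demo; a real run would open the multi-FASTA file and an
# output file instead of the StringIO objects used here.
demo_fasta = ">s1\nACGGGT\n>s2\nACG\n>s3\nACGGTT\n"
records = list(read_fasta(io.StringIO(demo_fasta)))
out = io.StringIO()
n_pairs = all_pairs_scores(records, identity_count, out)
print(out.getvalue(), end="")
```

To get real alignment scores with the parameters from the question, one option is to pass a function such as lambda a, b: pairwise2.align.globalms(a, b, 2, -1, -5, -2, score_only=True) as score_fn; score_only=True returns just the best score instead of building full alignments, which is usually what a clustering step needs. Bio.SeqIO.parse(path, "fasta") could likewise replace the hand-written reader. Writing one line per pair as you go keeps memory use flat, since the full result set is never held in memory.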