Question

Filtering alignment results (Biopython-pairwise2)

21 months ago

M. ▴ 30

I have a reference sequence and a multi-FASTA file, and I'm aligning each sequence in the file against the reference. My goal is to find the sequences that carry deletion mutations. All of the sequences are the same length as the reference (I believe some contain insertions as well), so I can't filter them by length.

Instead, I want to filter the mutated sequences by their alignment score. For example, I want to discard every sequence whose score is below 1000.

Here's my alignment code:

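from Bio import SeqIO
from Bio import pairwise2

# Read the reference sequence (if the file held several records, the last one would be kept)
ref_seq = SeqIO.parse("ref_seq.fasta", "fasta")
for i in ref_seq:
    refseq = str(i.seq)

sequences = SeqIO.parse("deneme.fasta", "fasta")

# Globally align every sequence against the reference, keeping one alignment each
alignments = []
for i in sequences:
    seq = str(i.seq)
    alignment = pairwise2.align.globalxx(refseq, seq, one_alignment_only=True)
    alignments.append(alignment)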

Each alignment in the output looks like this:

Alignment(seqA='...', seqB='...', score=1269.0, start=0, end=1277)

I read the tutorial for the pairwise2 module but couldn't find how to do this. How can I filter the sequences by their alignment score?

biopython pairwise2 alignment • 707 views

score 1 · Accepted Answer · 2022-07-29

21 months ago

Istvan Albert 100k

The alignment function returns a list of named tuples, and each named tuple has the attributes score, start, end, seqA and seqB. Access them like so:

from Bio import pairwise2

alns = pairwise2.align.globalxx("ACCGT", "ACG")

for aln in alns:
    print(aln.score, aln.start, aln.end, aln.seqA, aln.seqB)

prints:

3.0 0 5 ACCGT A-CG-
3.0 0 5 ACCGT AC-G-
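
To apply this to the filtering in the question, here is a minimal sketch that keeps only the sequences whose score reaches a threshold. The helper name keep_high_scoring, the sequence names and the toy threshold of 4 are made up for illustration; with real data the threshold would be the 1000 from the question and the sequences would come from SeqIO.parse. Note that recent Biopython releases mark pairwise2 as deprecated in favor of Bio.Align.PairwiseAligner, so importing it may print a warning.

```python
from Bio import pairwise2


def keep_high_scoring(ref, seqs, min_score):
    """Return (name, score) pairs for sequences scoring at least min_score.

    `seqs` maps a sequence name to its sequence string (hypothetical layout).
    """
    kept = []
    for name, seq in seqs.items():
        # one_alignment_only=True still returns a list, holding a single named tuple
        alns = pairwise2.align.globalxx(ref, seq, one_alignment_only=True)
        score = alns[0].score
        if score >= min_score:
            kept.append((name, score))
    return kept


# Toy data: one sequence identical to the reference, one with two bases deleted
ref = "ACCGT"
seqs = {"full": "ACCGT", "deletion": "ACG"}

kept = keep_high_scoring(ref, seqs, min_score=4)
print(kept)
```

If only the score is needed, pairwise2 also accepts score_only=True, in which case the call returns the best score directly instead of a list of alignments.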