Filtering alignment results (Biopython-pairwise2)
9 weeks ago
Mustafa ▴ 10

I have a reference sequence and a multi-FASTA file, and I'm trying to align each sequence in the file against the reference. My goal is to find the sequences with deletion mutations. All sequences are the same length as the reference (I believe some contain insertions as well), so I can't filter them by length.

I want to filter out the sequences with mutations by their alignment score. For example, I want to discard any sequence whose score is below 1000.

Here's my alignment code:

from Bio import SeqIO
from Bio import pairwise2

# Assumes ref_seq.fasta holds exactly one record; SeqIO.read raises otherwise
refseq = str(SeqIO.read("ref_seq.fasta", "fasta").seq)

sequences = SeqIO.parse("deneme.fasta",'fasta')

alignments = []
for i in sequences:
    seq =  str(i.seq)
    alignment = pairwise2.align.globalxx(refseq, seq, one_alignment_only=True)
    alignments.append(alignment)

The output of the alignment is like this:

Alignment(seqA='...', seqB='...', score=1269.0, start=0, end=1277)

I read the tutorial for the pairwise2 module but I couldn't find anything. How can I filter the sequences by their alignment score?

biopython pairwise2 alignment • 339 views
9 weeks ago

The alignment function returns a list of named tuples, each with the attributes score, start, end, seqA and seqB. Use them like so:

from Bio import pairwise2

alns = pairwise2.align.globalxx("ACCGT", "ACG")

for aln in alns:
    print(aln.score, aln.start, aln.end, aln.seqA, aln.seqB)

prints:

3.0 0 5 ACCGT A-CG-
3.0 0 5 ACCGT AC-G-
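
To apply this to the filtering in the question, one option is to keep the record id next to each result and compare the score attribute against the cutoff. The sketch below is only an illustration: filter_by_score is a hypothetical helper, the namedtuple is a stand-in with the same fields as the pairwise2 result (so the example runs without Biopython), and the scores are made up. The cutoff of 1000 is the one mentioned in the question.

```python
from collections import namedtuple

# Stand-in with the same fields as the pairwise2 result shown above,
# used here so the sketch runs without Biopython installed.
Alignment = namedtuple("Alignment", "seqA seqB score start end")


def filter_by_score(results, min_score):
    """Return (record_id, score) pairs whose first alignment scores >= min_score.

    `results` maps a record id to the list returned by an alignment call.
    (Hypothetical helper, not part of Biopython.)
    """
    kept = []
    for record_id, alns in results.items():
        if not alns:      # no alignment was returned for this record
            continue
        best = alns[0]    # with one_alignment_only=True the list has one entry
        if best.score >= min_score:
            kept.append((record_id, best.score))
    return kept


# Made-up ids and scores, for illustration only
results = {
    "seq1": [Alignment("...", "...", 1269.0, 0, 1277)],
    "seq2": [Alignment("...", "...", 980.0, 0, 1290)],
}
kept = filter_by_score(results, min_score=1000)
print(kept)
```

With real data, the dictionary would be filled inside the loop from the question, e.g. results[record.id] = pairwise2.align.globalxx(refseq, str(record.seq), one_alignment_only=True). If only the score is needed, passing score_only=True to the pairwise2 function returns the number directly instead of a list of alignments, which avoids building the aligned strings.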

Oh, how did I not think of that? Thank you very much!
