Question about pairwise2 in Biopython
4.2 years ago
KHP • 0

Hello,

I have some Sanger sequence data and a database of Illumina sequences. I want to match each Sanger sequence to its corresponding Illumina sequence. This is the code I used:

from Bio import SeqIO
from Bio import pairwise2
from Bio import Seq

# Database of Illumina sequences
fasta_sequences = SeqIO.parse(open("taxa_all.fasta"), "fasta")

# The Sanger sequence
with open("isolation-round1/778291/High_Intensity/18_F.ab1.seq") as myfile:
    isolate = myfile.readline()

score = []
name = []

for fasta in fasta_sequences:
    n, sequence = fasta.id, str(fasta.seq)
    for a in pairwise2.align.localxx(isolate, sequence):
        al1, al2, s, begin, end = a
        score.append(s)
        name.append(n)

When I ran this code, which took about 5 min, the scores I obtained were very close to each other, and I was therefore unable to say with certainty what the correct mapping was.

So then I changed it to

for a in pairwise2.align.localms(isolate,sequence,2, -1, -.5, -.1):

keeping everything else the same. This code took about 3 h to run! On examining the results, I noticed that each name and score was repeated 1000 times, and I don't understand why. Am I doing something wrong? Is there a way to speed up the process?

Biopython pairwise2 • 659 views
4.2 years ago
KHP • 0

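It looks like the position of the two append calls was the problem! They sat inside the inner for loop, so a name and a score were appended for every alignment that pairwise2 returned instead of once per database sequence. The two append calls must be outside the inner for loop.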
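For anyone hitting the same thing, here is a minimal sketch of one way to restructure the loop so that exactly one score is recorded per database sequence. The helper names (`pairwise2_score`, `load_records`, `rank_matches`) are my own and not part of Biopython. It assumes the `score_only=True` keyword of `pairwise2.align.localms`, which should return only the best score instead of building a list of alignments and may therefore also help with the run time; I have not timed it on this data. Note that `Bio.pairwise2` is deprecated in recent Biopython releases, so treat this as a sketch for versions that still ship it.

```python
def pairwise2_score(query, target):
    """Best local-alignment score for one pair of sequences.

    Biopython is imported lazily so the ranking logic below can be
    exercised with any other scoring function.
    """
    from Bio import pairwise2  # requires a Biopython version that still has pairwise2

    # Same scoring as in the question: match 2, mismatch -1,
    # gap open -0.5, gap extend -0.1. score_only=True asks for the
    # score alone rather than a list of alignments.
    return pairwise2.align.localms(query, target, 2, -1, -0.5, -0.1,
                                   score_only=True)


def load_records(fasta_path):
    """Yield (id, sequence string) pairs from a FASTA file."""
    from Bio import SeqIO

    for record in SeqIO.parse(fasta_path, "fasta"):
        yield record.id, str(record.seq)


def rank_matches(isolate, records, score_fn=pairwise2_score):
    """Return one (name, score) pair per record, best score first."""
    results = []
    for name, sequence in records:
        # Appending here, once per database sequence, avoids the
        # repeated entries described in the question.
        results.append((name, score_fn(isolate, sequence)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

With the files from the question, usage would look like `rank_matches(isolate, load_records("taxa_all.fasta"))`. It is probably also worth reading the Sanger sequence with `myfile.readline().strip()`, since `readline()` keeps the trailing newline.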
