Biopython script to align multiple sequences
3.5 years ago
anasjamshed ▴ 120

This script is supposed to loop through all the sequences in a CSV file and make a pairwise alignment of each one against a query sequence:

from Bio.SubsMat import MatrixInfo as matlist
from Bio import pairwise2
from Bio.pairwise2 import format_alignment
from Bio.Seq import Seq
import pandas as pd


main_df = pd.read_csv('seq_test2.csv')
seq_db = main_df['Sequence'].values


seq1 = 'CTCTACATTCACTTA'
matrix = matlist.blosum62


sequences = []
for seq in seq_db:
    sequences.append(seq)

for seq in sequences:
    seq = Seq(seq)
    align_query = pairwise2.align.globalds(seq1, seq,
                                           matrix, -1, -0.5,
                                           gap_char='-',
                                           one_alignment_only=True)
    print(format_alignment(*align_query[0]))

But it gives the following errors:

KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\Bio\pairwise2.py in __call__(self, charA, charB)
   1287             charB, charA = charA, charB
-> 1288         return self.score_dict[(charA, charB)]
   1289 

KeyError: ('J', 'A')

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
SystemError: PyEval_EvalFrameEx returned a result with an error set

Please help me fix this error.

3.5 years ago

Verify the sequences that you are loading from the file.

It appears the sequence you are loading has a J in it.

J is not a DNA nucleotide code. As an amino acid code it is only an IUPAC ambiguity symbol (leucine or isoleucine), and BLOSUM62 has no entry for it, so the score lookup fails with the KeyError shown above.
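One way to verify the sequences before aligning is to scan them for characters the scoring matrix cannot score. The following is a minimal sketch, not part of the original script: the helper name `find_unsupported` and the example sequences are hypothetical, and the alphabet is an assumption (the 20 standard amino acids plus B, Z and X). In practice the alphabet could be derived from the keys of the matrix actually used, for example `{c for pair in matrix for c in pair}`.

```python
def find_unsupported(sequences, alphabet):
    """Return {index: sorted list of characters not found in alphabet}."""
    allowed = set(alphabet)
    problems = {}
    for i, seq in enumerate(sequences):
        # Compare the set of characters in the sequence with the allowed set
        bad = sorted(set(str(seq).upper()) - allowed)
        if bad:
            problems[i] = bad
    return problems


# Assumed alphabet: 20 standard amino acids plus the B, Z and X codes
ALPHABET = "ARNDCQEGHILKMFPSTWYVBZX"

# Hypothetical example data standing in for main_df['Sequence'].values
example = ["CTCTACATTCACTTA", "ACJGT", "MKV"]
report = find_unsupported(example, ALPHABET)
print(report)
```

Any index that shows up in the report points to a row of the CSV that would trigger the KeyError during alignment.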


Yes, my sequence contains J.


What does J mean?

I believe the Biopython aligner can accept a J as long as your scoring matrix contains match and mismatch scores for it.

So if you really want to align a J to an A, generate a custom scoring matrix and load that into the aligner.
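A custom matrix could be built by copying an existing pair-keyed score dictionary and adding entries for J. The sketch below is only one possible approach: the helper names `lookup` and `add_j_scores` are hypothetical, the toy matrix holds illustrative values for A, I and L only, and scoring J as the mean of the I and L scores is an assumption, not an established convention.

```python
def lookup(matrix, a, b):
    """Fetch a score from a half-matrix dict keyed by (a, b) or (b, a)."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def add_j_scores(matrix):
    """Return a copy of matrix with J entries added.

    Assumption: J (Leu or Ile) is scored as the mean of the I and L scores.
    """
    letters = sorted({c for pair in matrix for c in pair})
    extended = dict(matrix)
    for c in letters:
        extended[("J", c)] = (lookup(matrix, "I", c) + lookup(matrix, "L", c)) / 2
    # J against itself: mean of the I-I and L-L self-scores
    extended[("J", "J")] = (lookup(matrix, "I", "I") + lookup(matrix, "L", "L")) / 2
    return extended


# Toy half-matrix with illustrative scores for A, I and L only
toy = {
    ("A", "A"): 4,
    ("I", "A"): -1, ("I", "I"): 4,
    ("L", "A"): -1, ("L", "I"): 2, ("L", "L"): 4,
}
custom = add_j_scores(toy)
print(custom[("J", "A")], custom[("J", "I")], custom[("J", "J")])
```

With the script from the question, the same helper would presumably be applied to `matlist.blosum62` and the result passed to `pairwise2.align.globalds` in place of `matrix`.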
