Biopython read alignment error
1
0
Entering edit mode
2.1 years ago
Austin • 0

Hi, I'm trying to do an alignment on 2 DNA sequences (as SeqRecord objects) of different lengths and when I print them, they are in fact valid SeqRecord objects, but I get an error: ValueError: alphabet is None; cannot interpret sequence Why am I getting this error?

Here is the code (reference_genome is defined earlier in the code)

from Bio import SeqIO
from Bio.Seq import Seq
from Bio import Align
from Bio.SeqRecord import SeqRecord

d122_fragment1 = SeqRecord(Seq('CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTGATAATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCACCCTCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAATACACCTGGGAAATGTAACAGAAGATTTTAACATGTG'), id='p122_1', annotations={"molecule_type": "DNA"})
Seq_ref_genome = SeqRecord(Seq(reference_genome), id = 'ref', annotations={"molecule_type": "DNA"})

aligner = Align.PairwiseAligner()
alignments = aligner.align(Seq_ref_genome, d122_fragment1)
alignment = alignments[0]
print(alignment)
BioPython biopython • 607 views
ADD COMMENT
1
Entering edit mode
2.1 years ago
Wayne ★ 2.0k

Change your alignments line to the following:

alignments = aligner.align(str(Seq_ref_genome.seq), str(d122_fragment1.seq))

Admittedly that error you see isn't very informative, especially since the alphabet attribute of seq objects was removed in 2020. Looking at the documentation here, it looks like aligner.align() wants strings and not sequence records.

Example adapted from your code, the slice [:5] limits it to the top 5:

from Bio import SeqIO
from Bio.Seq import Seq
from Bio import Align
from Bio.SeqRecord import SeqRecord

d122_fragment1 = SeqRecord(Seq('CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTGATAATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCACCCTCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAATACACCTGGGAAATGTAACAGAAGATTTTAACATGTG'), id='p122_1', annotations={"molecule_type": "DNA"})
Seq_ref_genome = SeqRecord(Seq('CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTGTAAATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCAGGGTCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCAAAAGAAATACACCTGGGAAATGTAACAGAAGATTTATACATGTG'), id='fakep122_1', annotations={"molecule_type": "DNA"})
aligner = Align.PairwiseAligner()
alignments = aligner.align(str(Seq_ref_genome.seq), str(d122_fragment1.seq))
for a in list(sorted(alignments))[:5]:
    print("Score = %.1f:" % a.score)
    print(a)

Output:

Score = 286.0:
CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTG-TAAATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCA-G-G-GTCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCAA-AAGAAATACACCTGGGAAATGTAACAGAAGA-TTTATACATGTG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||-|-||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||------||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||-|-|||||||||||||||||||||||||||||||-||||-|||||||
CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTGAT-AATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCAC-C-C-TCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCC-ACAAGAAATACACCTGGGAAATGTAACAGAAGATTTTA-ACATGTG

Score = 286.0:
CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTG-TAAATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCA-G-G-GTCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCAA-AAGAAATACACCTGGGAAATGTAACAGAAGAT-TTATACATGTG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||-|-||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||------||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||-|-||||||||||||||||||||||||||||||||-|||-|||||||
CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTGAT-AATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCAC-C-C-TCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCC-ACAAGAAATACACCTGGGAAATGTAACAGAAGATTTTA-ACATGTG

Score = 286.0:
CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTG-TAAATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCA-G-G-GTCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCAA-AAGAAATACACCTGGGAAATGTAACAGAAGATT-TATACATGTG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||-|-||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||------||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||-|-|||||||||||||||||||||||||||||||||-||-|||||||
CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTGAT-AATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCAC-C-C-TCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCC-ACAAGAAATACACCTGGGAAATGTAACAGAAGATTTTA-ACATGTG

Score = 286.0:
CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTG-TAAATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCA-G-G-GTCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCAA-AAGAAATACACCTGGGAAATGTAACAGAAGATTT-ATACATGTG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||-|-||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||------||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||-|-||||||||||||||||||||||||||||||||||-|-|||||||
CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTGAT-AATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCAC-C-C-TCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCC-ACAAGAAATACACCTGGGAAATGTAACAGAAGATTTTA-ACATGTG

Score = 286.0:
CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTG-TAAATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCA-G-G-GTCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCAA-AAGAAATACACCTGGGAAATGTAACAGAAGATTTAT-ACATGTG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||-|-||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||------||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||-|-||||||||||||||||||||||||||||||||||-|-|||||||
CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTGAT-AATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCAC-C-C-TCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCC-ACAAGAAATACACCTGGGAAATGTAACAGAAGATTT-TAACATGTG

You also may want to check out the newer pairwise aligner introduced in Biopython, pairwise2, see here and here. It is a little more pythonic and adds a nice format_alignment method.

Version related to your code:

from Bio import SeqIO
from Bio.Seq import Seq
from Bio import pairwise2
from Bio.SeqRecord import SeqRecord
from Bio.pairwise2 import format_alignment

d122_fragment1 = SeqRecord(Seq('CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTGATAATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCACCCTCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAATACACCTGGGAAATGTAACAGAAGATTTTAACATGTG'), id='p122_1', annotations={"molecule_type": "DNA"})
Seq_ref_genome = SeqRecord(Seq('CAGTGGAAATGAAAGTGATGGGGACACAAATGAATTGCTCGCACTTATGGAAATGGGGAACTTTGATCCTTGGATTGGTGTAAATTTGTAGTGCCTCAAACAACTTGTGGGTTACAGTTTATTATGGGGTTCCTGTGTGGAGAGATGCAGATACCAGGGTCTTTTGTGCATCAGATGCTAAAGCACATAAGACAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCAAAAGAAATACACCTGGGAAATGTAACAGAAGATTTATACATGTG'), id='fakep122_1', annotations={"molecule_type": "DNA"})
alignments = pairwise2.align.globalxx(str(Seq_ref_genome.seq), str(d122_fragment1.seq))
for a in alignments[:10]:
    print(format_alignment(*a))
ADD COMMENT

Login before adding your answer.

Traffic: 2207 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6