A Pwm With Gapped Alignments In Biopython
2
3
Entering edit mode
12.7 years ago
RossCampbell ▴ 140

I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments?

from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
    m.add_instance(a.seq)
m.pwm()
biopython multiple clustalw • 5.0k views
ADD COMMENT
0
Entering edit mode

Here's an example how to do it with with unambiguous_dna:

m = motifs.create(list, alphabet=Gapped(IUPAC.unambiguous_dna))

ADD REPLY
5
Entering edit mode
12.7 years ago

Paul-Michael's answer is exactly right and you were nearly there with Gapped. Just pass the same alphabet to AlignIO and Motif; here is a working example with a protein alignment:

from Bio import AlignIO, Motif
from Bio.Alphabet import IUPAC, Gapped
alphabet = Gapped(IUPAC.protein)
alignment = AlignIO.read("cw02.aln", "clustal", alphabet=alphabet)
m = Motif.Motif(alphabet)
for a in alignment:
    m.add_instance(a.seq)
print m.pwm()[0]

With a DNA alignment you'd want IUPAC.unambiguous_dna.

ADD COMMENT
2
Entering edit mode
12.7 years ago
Agapow ▴ 270

Just quickly: the proximate problem here is that the alphabets of the Motif and and aligned sequence being added have to be the same (see line 44 of _Motif.py). So you could create the motif with the same alphabet, or perhaps try to set it to None, which with avoid the alphabet comparison.

Biopython Alphabets are a tricky subject - I regularly have to pore over the source code to understand how they work, only to forget it all by the next time something odd happens.

ADD COMMENT
1
Entering edit mode

Which lead me to finally writing down all my notes on Alphabets: http://www.agapow.net/science/bioinformatics/alphabets-in-biopython

ADD REPLY

Login before adding your answer.

Traffic: 2946 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6