Question: A Pwm With Gapped Alignments In Biopython
3
gravatar for RossCampbell
7.6 years ago by
RossCampbell140
USA/Frederick
RossCampbell140 wrote:

I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments?

from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
    m.add_instance(a.seq)
m.pwm()
clustalw biopython multiple • 2.9k views
ADD COMMENTlink written 7.6 years ago by RossCampbell140

Here's an example how to do it with with unambiguous_dna:

m = motifs.create(list, alphabet=Gapped(IUPAC.unambiguous_dna))

ADD REPLYlink written 4.3 years ago by jykzel0
5
gravatar for Brad Chapman
7.6 years ago by
Brad Chapman9.3k
Boston, MA
Brad Chapman9.3k wrote:

Paul-Michael's answer is exactly right and you were nearly there with Gapped. Just pass the same alphabet to AlignIO and Motif; here is a working example with a protein alignment:

from Bio import AlignIO, Motif
from Bio.Alphabet import IUPAC, Gapped
alphabet = Gapped(IUPAC.protein)
alignment = AlignIO.read("cw02.aln", "clustal", alphabet=alphabet)
m = Motif.Motif(alphabet)
for a in alignment:
    m.add_instance(a.seq)
print m.pwm()[0]

With a DNA alignment you'd want IUPAC.unambiguous_dna.

ADD COMMENTlink written 7.6 years ago by Brad Chapman9.3k
2
gravatar for Agapow
7.6 years ago by
Agapow270
London, UK
Agapow270 wrote:

Just quickly: the proximate problem here is that the alphabets of the Motif and and aligned sequence being added have to be the same (see line 44 of _Motif.py). So you could create the motif with the same alphabet, or perhaps try to set it to None, which with avoid the alphabet comparison.

Biopython Alphabets are a tricky subject - I regularly have to pore over the source code to understand how they work, only to forget it all by the next time something odd happens.

ADD COMMENTlink written 7.6 years ago by Agapow270
1

Which lead me to finally writing down all my notes on Alphabets: http://www.agapow.net/science/bioinformatics/alphabets-in-biopython

ADD REPLYlink written 7.6 years ago by Agapow270
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 760 users visited in the last hour