Question

Biopython Bio.motifs: How to create a motif object with aligned sequences

Asked 4.6 years ago by kinetic

I'm following this Biopython tutorial. Where the tutorial creates a motif from DNA "instances", I need to create one from an aligned FASTA file of protein sequences.

I tried

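from Bio import AlignIO, motifs
from Bio.Alphabet import IUPAC, Gapped  # Bio.Alphabet exists only in Biopython < 1.78

alphabet = Gapped(IUPAC.protein)
alignment = AlignIO.read("my_seqs.afa", "fasta", alphabet=alphabet)
m = motifs.create(alignment)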

but that results in

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/m/.local/lib/python2.7/site-packages/Bio/motifs/__init__.py", line 24, in create
    return Motif(instances=instances, alphabet=alphabet)
  File "/home/m/.local/lib/python2.7/site-packages/Bio/motifs/__init__.py", line 273, in __init__
    counts = self.instances.count()
  File "/home/m/.local/lib/python2.7/site-packages/Bio/motifs/__init__.py", line 220, in count
    for letter in self.alphabet:
TypeError: 'NoneType' object is not iterable

I looked through the documentation, but I can't find anything that specifies the expected input for motifs.create. Does it not work with aligned sequences, or am I just reading them in incorrectly?

Tags: python, motif, alignment

Last updated 4.6 years ago by Eric Lim

Answer 1 · 2019-09-13

The tutorial you linked shows that Bio.motifs.create takes a list of Seq instances. AlignIO.read returns a MultipleSeqAlignment; iterating over it yields SeqRecord objects, and each record stores its sequence as a Seq object in its .seq attribute. So [x.seq for x in alignment] is what you need to pass to motifs.create.
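For readers who want to see the conversion without Biopython, here is a minimal plain-Python sketch. The helper read_aligned_fasta is hypothetical (it is not part of Biopython): it stands in for [x.seq for x in AlignIO.read(path, "fasta")], reads from a string instead of a file, returns plain strings instead of Seq objects, and checks that every row has the same length, since a motif is built column by column.

```python
# Hypothetical helper, not a Biopython API: a plain-Python stand-in for
# [rec.seq for rec in AlignIO.read(path, "fasta")], reading from a string.
def read_aligned_fasta(text):
    """Return the sequences of an aligned FASTA string, in file order."""
    seqs = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header starts a new record; flush the previous sequence.
            if current:
                seqs.append("".join(current))
            current = []
        else:
            # Sequence lines may be wrapped, so collect them and join later.
            current.append(line)
    if current:
        seqs.append("".join(current))
    # One motif column per alignment position: all rows must be equally long.
    lengths = {len(s) for s in seqs}
    if len(lengths) > 1:
        raise ValueError("sequences differ in length: %s" % sorted(lengths))
    return seqs


example = """>1
AGCTAGCG
>2
GTCGAGCC
>3
GTAGCGCG
"""
print(read_aligned_fasta(example))  # ['AGCTAGCG', 'GTCGAGCC', 'GTAGCGCG']
```

With Biopython itself you would keep the list comprehension over the AlignIO result; the sketch only shows what that list contains.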

See below for a working example.

[~/Downloads/tmp]$ cat test.fa
>1
AGCTAGCG
>2
GTCGAGCC
>3
GTAGCGCG

[~/Downloads/tmp]$ ipython
In [1]: from Bio import AlignIO

In [2]: from Bio import motifs

In [3]: alignment = AlignIO.read("test.fa", "fasta")

In [4]: m = motifs.create([x.seq for x in alignment])

In [5]: m.consensus
Out[5]: Seq('GTCGAGCG', IUPACUnambiguousDNA())

In [6]: m.counts
Out[6]:
{'G': [2, 1, 0, 2, 0, 3, 0, 2],
 'A': [1, 0, 1, 0, 2, 0, 0, 0],
 'T': [0, 2, 0, 1, 0, 0, 0, 0],
 'C': [0, 0, 2, 0, 1, 0, 3, 1]}
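To see where these numbers come from, here is a minimal plain-Python sketch (not Biopython's implementation) that tallies each letter per alignment column and takes the most frequent letter as the consensus. The function names count_matrix and consensus, and the tie-breaking rule (the earlier letter in the alphabet wins), are assumptions for illustration. Passing a different alphabet string, such as the 20 amino acids plus "-", covers a gapped protein alignment like the one in the question.

```python
# Hypothetical helpers illustrating what m.counts and m.consensus report;
# this is a plain-Python sketch, not Biopython's own code.

def count_matrix(seqs, alphabet="ACGT"):
    """Count how often each alphabet letter occurs in each column."""
    length = len(seqs[0])
    counts = {letter: [0] * length for letter in alphabet}
    for seq in seqs:
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        for pos, letter in enumerate(seq):
            # Raises KeyError for letters outside the alphabet (e.g. "-"
            # when the alphabet has no gap character).
            counts[letter][pos] += 1
    return counts


def consensus(counts):
    """Most frequent letter per column; ties go to the earlier alphabet letter."""
    letters = list(counts)  # dicts keep insertion order, i.e. alphabet order
    length = len(counts[letters[0]])
    # max() returns the first of several equally large items.
    return "".join(
        max(letters, key=lambda letter: counts[letter][pos]) for pos in range(length)
    )


seqs = ["AGCTAGCG", "GTCGAGCC", "GTAGCGCG"]
counts = count_matrix(seqs)
print(counts["G"])        # [2, 1, 0, 2, 0, 3, 0, 2]
print(consensus(counts))  # GTCGAGCG

# Gapped protein alignment: include "-" in the alphabet so gaps are counted.
protein = count_matrix(["MK-L", "MKAL"], alphabet="ACDEFGHIKLMNPQRSTVWY-")
print(consensus(protein))  # MKAL (the A/- tie in the third column goes to A)
```

The DNA result matches the counts and consensus in the session above; whether Biopython breaks ties the same way in every version is not guaranteed by this sketch.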