Question

Biopython - Calculation Used In Motif Building

0

Entering edit mode

11.7 years ago

Olivier ▴ 440

I was trying to build a motif using Biopython 1.58 and I tried to cross-check the amino acid frequency from one column out of curiosity. Why do the values not correspond to the 'exact' value? e.g. given for 1st column there are amino acids: N,S,P,S,V,P,S,N,I,I,I,H,V,V,L. The output of the motif object is: {... 'N': 0.1279761904761905, 'P': 0.1279761904761905, 'S': 0.19047619047619052, ...}. Shoudn't 'N' be 2/15 = 0.06667, 'P' be 2/15 = 0.1333, and 'S' be 3/15 = 0.2? May be I misunderstood, or is there some other transformation done to the amino acid frequency used to build the sequence motif? Grateful if someone could clarify.

Thanks

biopython motif • 3.8k views

ADD COMMENT • link updated 11.7 years ago by Neilfws 49k • written 11.7 years ago by Olivier ▴ 440

0

Entering edit mode

According to your example, N should also be 2/15 and indeed has the same value as P. We also need more details such as: which Biopython module was used?

ADD REPLY • link 11.7 years ago by Neilfws 49k

0

Entering edit mode

You're right. My mistake. Thanks. I used the Bio.Motif module.

ADD REPLY • link 11.7 years ago by Olivier ▴ 440

0

Entering edit mode

Further to Neil's helpful comment, could you actually show us the code you are using with Biopython to get these numbers?

ADD REPLY • link 11.7 years ago by Peter 6.0k

1

Entering edit mode

Here's the script. Still learning..

from Bio.Motif import Motif
from Bio.Alphabet import IUPAC, Gapped
from Bio.Align.Applications import MuscleCommandline
from Bio import SeqIO, AlignIO, Entrez
import os

#preparing script for communicating with NCBI's Entrez 
os.environ['http_proxy'] = 'http://130.14.29.110:80'
Entrez.email = 'myemailaddress'

#searching for wrky sequence
search_handle = Entrez.esearch(term='wrky AND (tomato[orgn] OR potato[orgn] OR pepper[orgn] OR arabidopsis[orgn] OR)',
                        db='protein')

id_list = Entrez.read(search_handle)['IdList'][:15]

#Retrieving sequences
gb_handle = Entrez.efetch(db='protein', id=id_list, rettype='gb')

fasta_records = SeqIO.convert(gb_handle, 'gb', '15_wrky.fas', 'fasta')

#Aligning sequences
muscle_cline = MuscleCommandline(input='/home/olivier/15_wrky.fas',
                             out='/home/olivier/15_wrky.aln',
                             clwstrict=True)
stdout, stderr = muscle_cline()

#Loading the msa and building a sequence motif
alphabet = Gapped(IUPAC.protein)
alignment_file = AlignIO.read('/home/olivier/15_wrky.aln','clustal',
                              alphabet=alphabet)
m = Motif(alphabet=alphabet)

for sequence in alignment_file:
    m.add_instance(sequence.seq[367:430])

print(m.pwm())

ADD REPLY • link updated 11.7 years ago by Neilfws 49k • written 11.7 years ago by Olivier ▴ 440

0

Entering edit mode

I edited this by indenting lines of code with 4 spaces (to display code properly).

ADD REPLY • link 11.7 years ago by Neilfws 49k

Neilfws · Answer 1 · 2012-07-30

2

Entering edit mode

11.7 years ago

Neilfws 49k

is there some other transformation done to the amino acid frequency used to build the sequence motif

I believe that the answer is "yes" and a clue as to the transformation is found in the source code for the pwm method:

if laplace=True (default), pseudocounts equal to self.background multiplied by self.beta are added to all positions.

ADD COMMENT • link 11.7 years ago by Neilfws 49k

1

Entering edit mode

That text is in the docstring, so doing help(m.pwm) or help(m) would have told you this (where m is a Motif object as in the example).

ADD REPLY • link 11.7 years ago by Peter 6.0k

0

Entering edit mode

Thanks Peter. I had not thought of it.

ADD REPLY • link 11.7 years ago by Olivier ▴ 440

0

Entering edit mode

If you find docstrings missing important information please let us know or suggest improvements. Thanks!

ADD REPLY • link 11.7 years ago by Peter 6.0k

0

Entering edit mode

i.e. If you want raw counts, use:

m.pwm(laplace=False)

ADD REPLY • link updated 11.7 years ago by Neilfws 49k • written 11.7 years ago by Peter 6.0k

0

Entering edit mode

Thanks a lot Peter for clarifying. I'm still going through the Biopython tutorial as a preparation for my research project. Hope that my questions are not too basic, as I'm from a molecular biology background.

ADD REPLY • link 11.7 years ago by Olivier ▴ 440