Question: Why is it so important to use Seq objects instead of plain ol' strings in Biopython?
 

This may seem like a superfluous question, and perhaps it is, but I think it's important to get straight the basic raison d'être behind the programming habits the tutorial encourages. (Wow, that's a strange and awkward sentence, but it's good to write in English and show off my literal non-skills.)

In short: Why use:

    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC
    >>> messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG",
    ...                     IUPAC.unambiguous_rna)
    >>> messenger_rna
    Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
    >>> messenger_rna.translate()
    Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))

When you can simply use:

    >>> from Bio.Seq import translate
    >>> my_string_messenger_rna = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"
    >>> translate(my_string_messenger_rna)
    'MAIVMGR*KGAR*'
 
 

4 answers

 

The difference is in the type of programming one pursues.

One approach, as shown in your first example, is object-oriented (OO) programming, where concepts such as a sequence are represented as classes with attributes, and each class instance has methods that operate on those attributes.

A second paradigm is functional programming (FP), where data, such as the string in your second example, is transformed by functions to produce different data.
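A minimal plain-Python sketch of the two styles, using reverse-complementing an RNA string as the operation. `RnaSequence` and `reverse_complement` are hypothetical names (not Biopython), and only the uppercase bases A, C, G and U are handled.

```python
from dataclasses import dataclass

# Complement table for RNA bases (assumption: only uppercase A, C, G, U)
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


# Function style: data goes in, new data comes out; nothing is attached to it
def reverse_complement(rna: str) -> str:
    return rna.translate(RNA_COMPLEMENT)[::-1]


# OO style: the sequence is an object and the operation is one of its methods
@dataclass(frozen=True)
class RnaSequence:  # hypothetical class, not part of Biopython
    bases: str

    def reverse_complement(self) -> "RnaSequence":
        # Returns a new object; the frozen dataclass itself never changes
        return RnaSequence(self.bases.translate(RNA_COMPLEMENT)[::-1])


plain = reverse_complement("AUGGCC")
wrapped = RnaSequence("AUGGCC").reverse_complement()
```

Both produce the same bases; the difference is only whether the operation lives on the data (`wrapped`) or next to it (`plain`).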

Most programming books and formal courses emphasize the OO approach. At the risk of sounding a bit cynical, I think that is most likely because OO comes with many rules and regulations that lend themselves to an episodic format: "Hello everyone, today we will learn about virtual private methods."

Over time, my personal preference has shifted strongly towards functional-style programming. I think its simplicity and transparency lead to simpler solutions with fewer errors.

 
 
 

This makes sense. When it's already possible to use comments to make your statements clearer, it seems kind of redundant to make objects out of frequently used concepts. Nonetheless, I will stick to the advice given in the tutorial.

written 2.0 years ago by Vlinxify
 

Good answer, but I would use the term "procedural programming" to describe non-OO programming in which functions are not associated with data. The term "functional programming" describes a stateless, non-imperative style of programming such as that used in R or Haskell. http://en.wikipedia.org/wiki/Functional_programming

written 15 months ago by Jeremy Leipzig
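To illustrate the distinction drawn in the comment above, here is a rough sketch computing GC fraction two ways. Python is not a purely functional language, so the second version only approximates the stateless style; the function names are made up, and both assume a non-empty, uppercase sequence.

```python
# Procedural: a series of statements that update mutable local state
def gc_fraction_procedural(seq: str) -> float:
    gc_count = 0
    for base in seq:
        if base in "GC":
            gc_count += 1
    return gc_count / len(seq)


# Functional flavour: a single expression, no variable is ever reassigned
def gc_fraction_functional(seq: str) -> float:
    return sum(1 for base in seq if base in "GC") / len(seq)
```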
 
 

I guess it's mainly for historical reasons. Since the early days of Biopython, sequences came with an associated alphabet (the IUPAC.unambiguous_rna in the example above). The advantage is that it protects against applying inappropriate operations to sequences, such as trying to translate or reverse-complement a protein sequence. In hindsight, maybe a simple string would have been better, since probably few people actually make use of the associated alphabet. While changing such a fundamental object in Biopython can be risky, I wouldn't be surprised if alphabets are removed in a future version of Biopython, or at least play a less prominent role.
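For what it's worth, alphabets were indeed removed later: Biopython 1.78 dropped the Bio.Alphabet module, so the IUPAC snippet in the question only runs on older releases. The protective idea itself can still be sketched in plain Python. `TaggedSeq`, its alphabet tags and `reverse_complement_str` are hypothetical and not Biopython's API; the point is only that a sequence carrying a type tag can refuse an operation that a bare string would silently perform.

```python
from dataclasses import dataclass

# Hypothetical complement tables keyed by alphabet tag (not Biopython's API)
_COMPLEMENT = {
    "dna": str.maketrans("ACGT", "TGCA"),
    "rna": str.maketrans("ACGU", "UGCA"),
}


@dataclass(frozen=True)
class TaggedSeq:
    data: str
    alphabet: str  # one of "dna", "rna", "protein"

    def __post_init__(self) -> None:
        if self.alphabet not in ("dna", "rna", "protein"):
            raise ValueError(f"unknown alphabet: {self.alphabet!r}")

    def reverse_complement(self) -> "TaggedSeq":
        # The alphabet tag lets the object reject a nonsensical operation
        if self.alphabet == "protein":
            raise TypeError("cannot reverse-complement a protein sequence")
        table = _COMPLEMENT[self.alphabet]
        return TaggedSeq(self.data.translate(table)[::-1], self.alphabet)


# A bare string has no such guard: protein input silently yields nonsense
def reverse_complement_str(seq: str) -> str:
    return seq.translate(_COMPLEMENT["dna"])[::-1]
```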

 
 
 

Bioinformatics and sequence analysis seem to be one of those fields that are pre-eminently suited to strings. It's as if the whole concept of 'strings' was made for sequence analysis.

written 2.0 years ago by Vlinxify
 
 

Here are some more reasons why the tutorial, in particular, uses Seq objects instead of strings:

  • This chapter introduces you to the Seq and Alphabet objects, so you can explore the rest of the available methods on your own. It's not strictly meant to show the fastest way to translate a sequence; it's showing you what's available.
  • The MutableSeq and UnknownSeq objects follow the same API as Seq, but have special features that Python strings don't, so introducing Seq also indirectly helps explain those other two types.
  • The SeqRecord object uses a Seq object, and SeqIO uses SeqRecords. Showing how to create a Seq object prepares you for using SeqIO later. (Maybe this could be simplified in the future by letting SeqRecords be built with a raw string.)
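As a rough illustration of the last two points, here is how a Seq ends up inside a SeqRecord, and how MutableSeq shares the Seq interface while allowing edits. This assumes a recent Biopython (1.78 or later, where Seq takes no alphabet); the record id and description are made up.

```python
from Bio.Seq import MutableSeq, Seq
from Bio.SeqRecord import SeqRecord

# A SeqRecord wraps a Seq together with annotations such as id and description
record = SeqRecord(
    Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"),
    id="example_mrna",  # hypothetical identifier
    description="toy mRNA from the question",
)

# The wrapped Seq keeps its sequence methods (standard codon table by default)
protein = record.seq.translate()

# SeqRecord.format() renders the record with Bio.SeqIO's writers
fasta_text = record.format("fasta")

# MutableSeq shares much of the Seq API but can be edited in place
mutable = MutableSeq("AUGGCC")
mutable[0] = "C"
```

Because SeqIO hands you SeqRecords whose `.seq` is a Seq, getting comfortable with Seq first pays off once you start reading and writing files.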
 
 
 
 
  1. For short scripts like your example, procedural programming is often preferable, but procedural programs can become conceptually unmanageable beyond a few hundred lines. A string cannot hold other data such as the defline/accession, so when you need to write a procedural program that uses those things, you end up with functions taking a dozen arguments instead of one, or you write a data structure to hold it all, thereby reinventing the wheel. I tell younger programmers they should be using objects once their functions take more than 3 arguments.
  2. Data validation, as mentioned above.
  3. An IDE cannot suggest which Bio.Seq methods can be called on your string.
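Points 1 and 2 can be sketched with a standard-library dataclass. `SequenceEntry`, `format_fasta`, the accepted base set and the accession `X00001` are all hypothetical; the sketch only shows how bundling the data lets validation happen once and keeps call signatures short.

```python
from dataclasses import dataclass

VALID_BASES = set("ACGTN")  # assumption: plain DNA plus N


# Procedural: every piece of data is passed separately, and the list only grows
def format_fasta(accession: str, description: str, sequence: str, width: int = 60) -> str:
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{accession} {description}", *chunks]) + "\n"


@dataclass(frozen=True)
class SequenceEntry:  # hypothetical class for illustration
    accession: str
    description: str
    sequence: str

    def __post_init__(self) -> None:
        # Validation happens once, when the object is built
        invalid = set(self.sequence.upper()) - VALID_BASES
        if invalid:
            raise ValueError(f"invalid bases in {self.accession}: {sorted(invalid)}")

    def to_fasta(self, width: int = 60) -> str:
        # The object carries its own accession and description
        return format_fasta(self.accession, self.description, self.sequence, width)
```

A side benefit related to point 3: since `to_fasta` is a method on a declared class, an IDE can offer it as a completion on `entry.`, which it cannot do for a bare string.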
 
 
 