Why is s == s.reverse_complement().reverse_complement() False? (Biopython)
9.4 years ago
hydrofilie ▴ 10
>>> from Bio.Seq import Seq
>>> s = "ATTATATATA"
>>> s = Seq(s)
>>> s
Seq('ATTATATATA', Alphabet())
>>> s == s
True
>>> s == s.reverse_complement().reverse_complement()
False
>>> str(s) == str(s.reverse_complement().reverse_complement())
True

I do not really understand the above behaviour. Can anybody please explain it in detail? Why does comparing the two Seq objects return False, while comparing them after converting back to str returns True?

Python Seq • 3.0k views

What version of Biopython do you have and did it give you any warnings about this which you've not shown?

9.4 years ago

This applies, in general, to other aspects of Python programming as well. Comparison operators don't necessarily work on the entire content of objects; they may instead use some attribute of the object. For the current Biopython release, that is the Seq object's id. That is to say:

>>> from Bio.Seq import Seq
>>> x = Seq('ATTATATATA') 
>>> y = Seq('ATTATATATA') 
>>> x == y 
False

This is totally expected because:

>>> id(x)
4366014448
>>> id(y)
4365153448

The object IDs are different. This is the current behavior in the most recent Biopython release. However, the behavior will change with the next release, whenever that may be. From the equality method of Bio.Seq:

Historically comparing Seq objects has done Python object comparison. After considerable discussion (keeping in mind constraints of the Python language, hashes and dictionary support), Biopython now uses simple string comparison (with a warning about the change).

So, you should be doing string comparison now as that will work in the near future as well.
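
To make the two comparison modes concrete, here is a minimal sketch in plain Python. The classes `IdentitySeq` and `StringSeq` are hypothetical stand-ins invented for illustration; they are not Biopython's `Seq` and do not reproduce its implementation. The first inherits Python's default `==`, which falls back to object identity. The second compares the underlying strings and keeps `__hash__` consistent with `__eq__`, which is the kind of constraint on hashes and dictionaries that the quoted note alludes to.

```python
class IdentitySeq:
    """Hypothetical stand-in: no __eq__ defined, so == falls back to identity."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data


class StringSeq(IdentitySeq):
    """Hypothetical stand-in: == compares the underlying strings."""

    def __eq__(self, other):
        if isinstance(other, (StringSeq, str)):
            return str(self) == str(other)
        return NotImplemented

    def __hash__(self):
        # Keep the hash consistent with __eq__ so equal objects share a dict key.
        return hash(self.data)


x, y = IdentitySeq("ATTATATATA"), IdentitySeq("ATTATATATA")
identity_equal = (x == y)               # False: two distinct objects
explicit_equal = (str(x) == str(y))     # True: same sequence text

p, q = StringSeq("ATTATATATA"), StringSeq("ATTATATATA")
string_equal = (p == q)                 # True: content comparison
same_dict_key = len({p: 1, q: 2}) == 1  # True: equal objects collapse to one key
```

Writing `str(x) == str(y)` spells out the intent, so it gives the same answer under either comparison rule.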

I am not sure that the current behaviour is expected: '==' is the equality operator, 'is' is the identity operator. So comparing two different objects holding the same value with '==' should give True, and comparing them with 'is' should give False. At least this is my understanding of Python comparisons. I am not sure what '==' should return when it compares two Seq objects holding the same string but with different alphabets. Probably also False because, technically, the sequences are different? This is, however, different behaviour from comparing str(s) == str(s).

You're right, so I've updated the beginning of my answer.


Coming in the next release, Biopython 1.65, after warnings for several releases.

9.4 years ago

My version of Biopython shows this warning when running s == s.reverse_complement().reverse_complement():

FutureWarning: In future comparing Seq objects will use string comparison (not object comparison). Incompatible alphabets will trigger a warning (not an exception). In the interim please use id(seq1)==id(seq2) or str(seq1)==str(seq2) to make your code explicit and to avoid this warning.

and indeed,

>>> from Bio import Seq
>>> str(Seq.Seq("atgc")) == str(Seq.Seq("atgc").reverse_complement().reverse_complement())
True

gives True, so what happens (as implied above) is that the object IDs are compared, and these differ because reverse_complement().reverse_complement() creates a new object.
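
The "new object" point can be sketched without Biopython. `ToySeq` below is a hypothetical class made up for this illustration, and its complement table only covers the unambiguous letters A, C, G and T (an assumption; real sequence libraries handle more). Each call to `reverse_complement()` builds a fresh object, so the round trip restores the text but not the identity.

```python
# Complement table for unambiguous DNA letters only (assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class ToySeq:
    """Hypothetical stand-in for a sequence object with default (identity) ==."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data

    def reverse_complement(self):
        # Complement every letter, reverse the result, and wrap it in a NEW object.
        return ToySeq(self.data.translate(_COMPLEMENT)[::-1])


s = ToySeq("ATTATATATA")
round_trip = s.reverse_complement().reverse_complement()

same_object = round_trip is s            # False: a different object in memory
same_text = str(round_trip) == str(s)    # True: the sequence text is restored
default_eq = (round_trip == s)           # False: default == compares identity
```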


It has changed in the development repository and should be in the forthcoming Biopython 1.65 release.


This is one place where I really like Java's use of a==b to answer the question of "Are these the same object in memory?" and a.equals(b) to answer the question of "Are these objects currently identical?"

Which question you want to answer depends on a lot of things, like whether the objects are mutable, whether identical immutable objects can exist, and speed. And it's very important for these fundamental operations to be fully defined and constant, so that changes in the language definition or implementation will not break older code.

Python uses '==' for equality, and 'is' for identity.
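
A short illustration with built-in lists, which define `==` as content comparison (the variable names are arbitrary):

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with the same contents
c = a           # a second name bound to the same object

equal_values = (a == b)   # True: same contents
same_object = (a is b)    # False: two distinct objects
alias_same = (a is c)     # True: both names refer to one object
```

Whether `==` means "same contents" for a user-defined class depends on whether that class defines `__eq__`; without one, Python falls back to identity.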
9.4 years ago
Dan D 7.4k

EDIT: [Removed inaccurate answer]


Uh, not really...

>>> type(s)
<class 'Bio.Seq.Seq'>

>>> type(s.reverse_complement())
<class 'Bio.Seq.Seq'>

>>> str(s) == s.reverse_complement().reverse_complement()
False


Yes, you are correct. My knowledge of Biopython is outdated. I will update my answer accordingly.
