Question: Can Biopython Test For Equality Sequences With Ambiguity Codes
0
gravatar for tflutre
4.6 years ago by
tflutre450
tflutre450 wrote:

I have multiplexed data from a RADseq experiment (digestion with ApeKI). Each sample has a tag. For each read, I want to check if the tag occurs next to what remains of the ApeKI motif after the cut. For instance, individual 1 has tag ATAT and ApeKI cuts at G/CWGC (where W is A or T in the IUPAC code). I therefore should see something like "ATATCAGC..." at the beginning of my read. What would be the best way to do this with Biopython?

My attempt 1: From the cmd-line, I ask the user to provide the following information "ApeKI=G/CWGC" and then I parse it internally. However, as testing for sequence equality is a difficult matter (it's contexte dependent), Biopython doesn't provide much support for it:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> x = Seq("CWGC", IUPAC.ambiguous_dna)
>>> y = Seq("CAGC", IUPAC.ambiguous_dna)
>>> x == y
...<warning>...
False
>>> str(x) == str(y)
False

My attempt 2: another way is to create a motif:

>>> from Bio import motifs
>>> m = motifs.create([Seq("CAGC"), Seq("CTGC")])
>>> m.degenerate_consensus
Seq('CWGC', IUPACAmbiguousDNA())
>>> "CAGC" in [str(i) for i in m.instances]
True

However, if the user provides me with "ApeKI=G/CWGC", I need to know that W is A or T in the IUPAC code. But it seems that this information is not in Biopython. I can only get a list of letters:

>>> IUPAC.ambiguous_dna.letters
'GATCRYWSMKHBVDN'

Would it be possible to add a dictionary as attribute, so that IUPAC.ambiguous_dna.dict["W"] would return ["A", "T"] for instance?

(I edited the first code block to reflect Peter's comment.)

sequence biopython • 2.3k views
ADD COMMENTlink modified 4.6 years ago • written 4.6 years ago by tflutre450

Could you paste the actual code and output? Your imports are inconsistent with the code. Also didn't you get a warning about comparison? Perhaps your Biopython is very old?

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> x = Seq("CWGC", IUPAC.ambiguous_dna)
>>> y = Seq("CAGC", IUPAC.ambiguous_dna)
>>> x == y
/Library/Python/2.7/site-packages/Bio/Seq.py:179: FutureWarning: In future comparing Seq objects will use string comparison (not object comparison). Incompatible alphabets will trigger a warning (not an exception). In the interim please use id(seq1)==id(seq2) or str(seq1)==str(seq2) to make your code explicit and to avoid this warning.
  "and to avoid this warning.", FutureWarning)
False
ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by Peter5.6k

@Peter sorry, you're right, I updated my code block

ADD REPLYlink written 4.6 years ago by tflutre450
1
gravatar for Peter
4.6 years ago by
Peter5.6k
Scotland, UK
Peter5.6k wrote:

It isn't clear what you are actually asking, but the final question, how to find out that W is A or T in the IUPAC code, can be done with one of the dictionaries included in Biopython:

>>> from Bio.Data.IUPACData import ambiguous_dna_values
>>> ambiguous_dna_values["W"]
'AT'

P.S. Biopython has an entire module for restriction digest enzymes.

ADD COMMENTlink modified 4.6 years ago • written 4.6 years ago by Peter5.6k

@Peter thanks, that's what I was looking for but couldn't find it anywhere in the doc. Thanks also for mentionning Bio.Restriction (http://biopython.org/DIST/docs/cookbook/Restriction.html), I'll definitely look at it.

ADD REPLYlink written 4.6 years ago by tflutre450
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 630 users visited in the last hour