Can Biopython Test For Equality Sequences With Ambiguity Codes
1
0
Entering edit mode
10.2 years ago
tflutre ▴ 580

I have multiplexed data from a RADseq experiment (digestion with ApeKI). Each sample has a tag. For each read, I want to check if the tag occurs next to what remains of the ApeKI motif after the cut. For instance, individual 1 has tag ATAT and ApeKI cuts at G/CWGC (where W is A or T in the IUPAC code). I therefore should see something like "ATATCAGC..." at the beginning of my read. What would be the best way to do this with Biopython?

My attempt 1: From the cmd-line, I ask the user to provide the following information "ApeKI=G/CWGC" and then I parse it internally. However, as testing for sequence equality is a difficult matter (it's contexte dependent), Biopython doesn't provide much support for it:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> x = Seq("CWGC", IUPAC.ambiguous_dna)
>>> y = Seq("CAGC", IUPAC.ambiguous_dna)
>>> x == y
...<warning>...
False
>>> str(x) == str(y)
False

My attempt 2: another way is to create a motif:

>>> from Bio import motifs
>>> m = motifs.create([Seq("CAGC"), Seq("CTGC")])
>>> m.degenerate_consensus
Seq('CWGC', IUPACAmbiguousDNA())
>>> "CAGC" in [str(i) for i in m.instances]
True

However, if the user provides me with "ApeKI=G/CWGC", I need to know that W is A or T in the IUPAC code. But it seems that this information is not in Biopython. I can only get a list of letters:

>>> IUPAC.ambiguous_dna.letters
'GATCRYWSMKHBVDN'

Would it be possible to add a dictionary as attribute, so that IUPAC.ambiguous_dna.dict["W"] would return ["A", "T"] for instance?

(I edited the first code block to reflect Peter's comment.)

sequence biopython • 4.4k views
ADD COMMENT
0
Entering edit mode

Could you paste the actual code and output? Your imports are inconsistent with the code. Also didn't you get a warning about comparison? Perhaps your Biopython is very old?

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> x = Seq("CWGC", IUPAC.ambiguous_dna)
>>> y = Seq("CAGC", IUPAC.ambiguous_dna)
>>> x == y
/Library/Python/2.7/site-packages/Bio/Seq.py:179: FutureWarning: In future comparing Seq objects will use string comparison (not object comparison). Incompatible alphabets will trigger a warning (not an exception). In the interim please use id(seq1)==id(seq2) or str(seq1)==str(seq2) to make your code explicit and to avoid this warning.
  "and to avoid this warning.", FutureWarning)
False
ADD REPLY
0
Entering edit mode

@Peter sorry, you're right, I updated my code block

ADD REPLY
1
Entering edit mode
10.2 years ago
Peter 6.0k

It isn't clear what you are actually asking, but the final question, how to find out that W is A or T in the IUPAC code, can be done with one of the dictionaries included in Biopython:

>>> from Bio.Data.IUPACData import ambiguous_dna_values
>>> ambiguous_dna_values["W"]
'AT'

P.S. Biopython has an entire module for restriction digest enzymes.

ADD COMMENT
0
Entering edit mode

@Peter thanks, that's what I was looking for but couldn't find it anywhere in the doc. Thanks also for mentionning Bio.Restriction (http://biopython.org/DIST/docs/cookbook/Restriction.html), I'll definitely look at it.

ADD REPLY

Login before adding your answer.

Traffic: 2618 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6