Question: Double Digest through Regular Expression in python
6 weeks ago, anasjamshed1994 wrote:

I have the following DNA sequence in a file named dna.txt:

ATGGCAATAACCCCCCGTTTCTACTTCTAGAGGAGAAAAGTATTGACATGAGCGCTCCCGGCACAAGGGCCAAAGAAGTCTCCAATTTCTTATTTCCGAATGACATGCGTCTCCTTGCGGGTAAATCACCGACCGCAATTCATAGAAGCCTGGGGGAACAGATAGGTCTAATTAGCTTAAGAGAGTAAATCCTGGGATCATTCAGTAGTAACCATAAACTTACGCTGGGGCTTCTTCGGCGGATTTTTACAGTTACCAACCAGGAGATTTGAAGTAAATCAGTTGAGGATTTAGCCGCGCTATCCGGTAATCTCCAAATTAAAACATACCGTTCCATGAAGGCTAGAATTACTTACCGGCCTTTTCCATGCCTGCGCTATACCCCCCCACTCTCCCGCTTATCCGTCCGAGCGGAGGCAGTGCGATCCTCCGTTAAGATATTCTTACGTGTGACGTAGCTATGTATTTTGCAGAGCTGGCGAACGCGTTGAACACTTCACAGATGGTAGGGATTCGGGTAAAGGGCGTATAATTGGGGACTAACATAGGCGTAGACTACGATGGCGCCAACTCAATCGCAGCTCGAGCGCCCTGAATAACGTACTCATCTCAACTCATTCTCGGCAATCTACCGAGCGACTCGATTATCAACGGCTGTCTAGCAGTTCTAATCTTTTGCCAGCATCGTAATAGCCTCCAAGAGATTGATGATAGCTATCGGCACAGAACTGAGACGGCGCCGATGGATAGCGGACTTTCGGTCAACCACAATTCCCCACGGGACAGGTCCTGCGGTGCGCATCACTCTGAATGTACAAGCAACCCAAGTGGGCCGAGCCTGGACTCAGCTGGTTCCTGCGTGAGCTCGAGACTCGGGATGACAGCTCTTTAAACATAGAGCGGGGGCGTCGAACGGTCGAGAAAGTCATAGTACCTCGGGTACCAACTTACTCAGGTTATTGCTTGAAGCTGTACTATTTTAGGGGGGGAGCGCTGAAGGTCTCTTCTTCTCATGACTGAACTCGCGAGGGTCGTGAAGTCGGTTCCTTCAATGGTTAAAAAACAAAGGCTTACTGTGCGCAGAGGAACGCCCATCTAGCGGCTGGCGTCTTGAATGCTCGGTCCCCTTTGTCATTCCGGATTAATCCATTTCCCTCATTCACGAGCTTGCGAAGTCTACATTGGTATATGAATGCGACCTAGAAGAGGGCGCTTAAAATTGGCAGTGGTTGATGCTCTAAACTCCATTTGGTTTACTCGTGCATCACCGCGATAGGCTGACAAAGGTTTAACATTGAATAGCAAGGCACTTCCGGTCTCAATGAACGGCCGGGAAAGGTACGCGCGCGGTATGGGAGGATCAAGGGGCCAATAGAGAGGCTCCTCTCTCACTCGCTAGGAGGCAAATGTAAAACAATGGTTACTGCATCGATACATAAAACATGTCCATCGGTTGCCCAAAGTGTTAAGTGTCTATCACCCCTAGGGCCGTTTCCCGCATATAAACGCCAGGTTGTATCCGCATTTGATGCTACCGTGGATGAGTCTGCGTCGAGCGCGCCGCACGAATGTTGCAATGTATTGCATGAGTAGGGTTGACTAAGAGCCGTTAGATGCGTCGCTGTACTAATAGTTGTCGACAGACCGTCGAGATTAGAAAATGGTACCAGCATTTTCGGAGGTTCTCTAACTAGTATGGATTGCGGTGTCTTCACTGTGCTGCGGCTACCCATCGCCTGAAATCCAGCTGGTGTCAAGCCATCCCCTCTCCGGGACGCCGCATGTAGTGAAACATATACGTTGCACGGGTTCACCGCGGTCCGTTCTGAGTCGACCAAGGACACAATCGAGCTCCGATCCGTACCCTCGACAAACTTGTACCCGACCCCCGGAGCTTGCCAGCTCCTCGGGTATCATGGAGCCTGTGGTTCATCGCGTCCGATATCAAACTTCGTCATGATAAAGTCCCCCCCTCGGGAGTACCAGAGAAGATGACTACT
GAGTTGTGCGAT

I want to read the DNA sequence from dna.txt and then predict the lengths of the fragments we will get by digesting it with these (made-up) restriction enzymes:

– AbcI: cutting site "ANT*AAT"

– AbcII: cutting site "GCRW*TG"

The asterisks indicate where each enzyme cuts the DNA.

Can anyone help me solve this?

(written 6 weeks ago by anasjamshed1994; modified 9 days ago by Dunois)

Did you try anything? Biostars is generally not a code-writing service.

(written 6 weeks ago by ATpoint)
import re

# Read the whole input file
with open("dna.txt") as infile:
    text = infile.read()

# Join all lines and drop whitespace/newlines so the sequence is one continuous string
sequence = "".join(text.split()).upper()

print(sequence)

After that, I don't know how to continue.

(written 6 weeks ago by anasjamshed1994; modified 6 weeks ago by _r_am)
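Once you know where the enzymes cut, the fragment lengths are just the distances between consecutive cuts, with the start and end of the (linear) sequence as the outer boundaries. A minimal sketch of that arithmetic, assuming 0-based cut offsets where a cut at p separates seq[:p] from seq[p:]; the helper name fragment_lengths is hypothetical:

```python
def fragment_lengths(seq_len, cut_positions):
    """Return fragment lengths for a linear sequence of length seq_len.

    cut_positions are 0-based offsets; a cut at p separates seq[:p] from seq[p:].
    Duplicate cuts and cuts at the very ends of the sequence are ignored.
    """
    cuts = sorted(p for p in set(cut_positions) if 0 < p < seq_len)
    bounds = [0] + cuts + [seq_len]
    # Each fragment spans from one boundary to the next
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Example: a 30 bp sequence cut at offsets 11 and 23
print(fragment_lengths(30, [11, 23]))  # [11, 12, 7]
```

Circular DNA (plasmids) would need the first and last fragments joined, which this sketch does not do.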

Have you found this Stack Overflow question: https://stackoverflow.com/questions/43365742/cut-string-within-a-specific-pattern-in-python ? It has at least three answers that I think should be useful to you.

One side note: since your made-up cut sites contain ambiguous bases, you need to handle those (or use a library that does it for you).

Good luck

(written 6 weeks ago by massa.kassa.sc3na; modified 6 weeks ago)
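Handling the ambiguous bases can be as simple as translating each IUPAC code into a regex character class before searching (N = any base, R = A or G, W = A or T, and so on). A minimal sketch using the standard IUPAC nucleotide table; the names IUPAC and site_to_regex are illustrative, not from any library:

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_to_regex(site):
    """Translate a recognition site written in IUPAC letters into a regex.

    The '*' cut marker is removed, so the pattern matches the whole site.
    """
    return "".join(IUPAC[base] for base in site.upper().replace("*", ""))


print(site_to_regex("GCRW*TG"))  # GC[AG][AT]TG
print(bool(re.fullmatch(site_to_regex("ANT*AAT"), "AGTAAT")))  # True
```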

It is something different.

(written 6 weeks ago by anasjamshed1994)

As the other user said, those answers should be useful, but not applicable as-is. Try adapting them, and come back to us if you run into difficulties.

(written 6 weeks ago by _r_am; modified 6 weeks ago)
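As one way of adapting those answers, you can search with a zero-width lookahead so that overlapping recognition sites are all found, and use the position of the * to turn each match into a cut offset. A sketch, assuming only the codes used by AbcI and AbcII (N, R, W) need translating; cut_positions and CODES are hypothetical names:

```python
import re

# Only the ambiguity codes used by AbcI and AbcII; extend as needed
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "W": "[AT]"}


def cut_positions(seq, cutsite):
    """Return 0-based offsets where an enzyme with the given cutsite cuts seq.

    The '*' in cutsite marks the cut; an offset p means the enzyme
    separates seq[:p] from seq[p:].
    """
    offset = cutsite.index("*")  # number of bases left of the cut
    pattern = "".join(CODES[b] for b in cutsite.replace("*", ""))
    # The lookahead makes each match zero-width, so overlapping sites are all found
    return [m.start() + offset
            for m in re.finditer("(?=" + pattern + ")", seq.upper())]


seq = "ATATATATAGTAATGTGTGCATTAATATGC"
cuts = cut_positions(seq, "ANT*AAT")
print(cuts)  # [11, 23]
print([seq[i:j] for i, j in zip([0] + cuts, cuts + [len(seq)])])
# ['ATATATATAGT', 'AATGTGTGCATT', 'AATATGC']
```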
9 days ago, Dunois wrote:

Here's something to get you started:

import re

def digdigest(digseq, cutsite_orig):
    # Split the cutsite into the parts left and right of the cut ("*")
    cutsite_l, cutsite_r = cutsite_orig.split("*")

    # If there are Ns in the cutsite, replace them with the . regex wildcard
    # (other ambiguity codes, such as the R and W in AbcII, still need translating)
    cutsite_l = cutsite_l.replace("N", ".")
    cutsite_r = cutsite_r.replace("N", ".")

    # Mark each cutting position with a unique string ("__")
    digseq_mod = re.sub("(" + cutsite_l + ")(" + cutsite_r + ")", r"\1__\2", digseq)

    # Splitting on the marker gives the fragments
    return digseq_mod.split("__")

Pass digdigest() the sequence and a cutting site in the notation from your original post, and you'll get something like this:

digdigest("ATATATATAGTAATGTGTGCATTAATATGC", "ANT*AAT")
#['ATATATATAGT', 'AATGTGTGCATT', 'AATATGC']
(written 9 days ago by Dunois; modified 9 days ago)
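For the double digest in the title, one option is to collect the cut positions of both enzymes on the uncut sequence and then take the distances between consecutive cuts, rather than re-digesting the fragments from the first enzyme (which could miss a second-enzyme site that spans a first-enzyme cut). A sketch under those assumptions; double_digest, cuts and CODES are hypothetical names, only N, R and W are translated, and the sequence is treated as linear:

```python
import re

# Only the ambiguity codes used by AbcI and AbcII; extend as needed
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "W": "[AT]"}


def cuts(seq, cutsite):
    """Set of 0-based cut offsets for one enzyme ('*' marks the cut)."""
    offset = cutsite.index("*")
    pattern = "".join(CODES[b] for b in cutsite.replace("*", ""))
    # Zero-width lookahead so overlapping sites are not skipped
    return {m.start() + offset for m in re.finditer("(?=" + pattern + ")", seq)}


def double_digest(seq, *cutsites):
    """Fragment lengths after cutting a linear seq with all given enzymes."""
    seq = "".join(seq.split()).upper()  # drop newlines/whitespace from file input
    positions = set()
    for site in cutsites:
        positions |= cuts(seq, site)
    bounds = [0] + sorted(p for p in positions if 0 < p < len(seq)) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


print(double_digest("ATATATATAGTAATGTGTGCATTAATATGC", "ANT*AAT", "GCRW*TG"))
# [11, 12, 7]
```

For the file in the question, passing the contents of dna.txt (for example, read with `with open("dna.txt") as f:` and `f.read()`) as seq should work, since whitespace and the line break are stripped first.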