Question: Calculate fragment length from Bio.Restriction Analysis Object
0
gravatar for Chanz R
5 months ago by
Chanz R0
Chanz R0 wrote:

New to python!

My goal is to in silico digest several genomes with two restriction enzymes and calculate the fragment lengths.

With Bio.Restriction I managed to use the Analysis function

>>> for seq_record in SeqIO.parse("A1/A1.fasta", "fasta"):
       Ana = Analysis(rb,seq_record.seq, linear=True)
       printseq_record.id)
       Ana.print_that()

to produce something like this:

A1_scaffold0149
MseI       :  11, 110, 238, 269, 281, 423, 544, 607, 699, 723, 769, 949, 1461,
              1500, 1511, 1626, 1650, 1829, 1894, 2034, 2497, 2545, 2638,
              2861, 3186, 3192, 3261, 3483, 3641, 3798, 3950, 3972, 3994.

EcoRI      :  1563, 2004.

A1_scaffold0150
MseI       :  41, 77, 98, 103, 157, 408, 434, 476, 491, 497, 648, 706, 805,
              816, 910, 931*****, 1014*****, 1018, 1044, 1084, 1098, 1129, 1354, 1420,
              1483, 1495, 1516, 1589.
EcoRI      :  943*****.

What I really care about is only the fragment lengths, not the respective positions or which enzyme. In other words (for example, A1_scaffold0150) I would like to get a list of all fragment lengths hypothetically produced from a complete double digest reaction (where the positions of two enzymes are merged, ordered and subtracted). ex:

41.................41

77-41............36

....

943-931.......12

1014-943.....71

....etc.

to file.write() in this manner (without*, this is only to draw attention to merging of sites):

41 
36
21
5
54
251
26
42
15
6
151
58
99
11
94
21
12*****
71*****
4
26
40
14
31
225
66
63
12
21
73

To do this, I need to access the positions in something I can work with, but type() is either a Bio.Restriction 'class' object or after I try to coerce it to something for arithmetic, it returns 'None' or NoneType'.

What I need to do (unless there is a fancy feature that does this and I have not found it yet) is:

1) For each scaffold in genome.fasta (160 Mb):
2)     do the restriction digest to find positions for both enzymes
3)     access positions from Analysis (type 'class'), merge and order for arithmetic
4)     keep position 0 as first length, compute all fragment lengths (1-0), (2-1) etc.

Sorry if this has been answered elsewhere, I could not find it :-( Please direct me if it has.

I would like to do it with python as our cluster version of EMBOSS is missing REBASE files, so I tried, but I can't really do anything until the administrators configure the files.

Kind regards and thanks for any suggestions!

ADD COMMENTlink modified 5 months ago • written 5 months ago by Chanz R0

Do you specifically want to code up something bespoke? Something like this tool, might already achieve what you need in some form or other.

ADD REPLYlink written 5 months ago by jrj.healey13k

I would like to stick to what is available on our cluster (I checked and no bandwagon).

So I thought coding something in python would be the fastest way.

ADD REPLYlink written 5 months ago by Chanz R0
0
gravatar for Chanz R
5 months ago by
Chanz R0
Chanz R0 wrote:

Ok, I think I figured out an answer:

for sequence in SeqIO.parse("A1/A1.fasta", "fasta"):
    Ana = Analysis(rd,sequence.seq,linear=True)
    ana = str(Ana.format_output())
    nums = ana.replace("\n","")
    nums = nums.replace(" ","")
    nums = nums.replace("MseI:", "")
    nums = nums.replace("EcoRI:", "")
    nums = nums.replace("Enzymeswhichdonotcutthesequence.EcoRI", "")
    nums = nums.replace(".",",")
    sites = nums.split(',')
    del sites[-1]
    n0 = 0
    n1 = 1
    for i in range(len(sites)):
            sites[i] = int(sites[i])
    sites.sort()
    print(sites[n0])
    while n1 < len(sites):
            print(sites[n1]-sites[n0])
            n0 += 1
            n1 += 1

Still, I would appreciate any other answers or critiques from the more experienced.

Cheers!

ADD COMMENTlink written 5 months ago by Chanz R0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 899 users visited in the last hour