Calculate fragment length from Bio.Restriction Analysis Object
1
0
Entering edit mode
5.2 years ago
Chanz R • 0

New to python!

My goal is to in silico digest several genomes with two restriction enzymes and calculate the fragment lengths.

With Bio.Restriction I managed to use the Analysis function

>>> for seq_record in SeqIO.parse("A1/A1.fasta", "fasta"):
       Ana = Analysis(rb,seq_record.seq, linear=True)
       printseq_record.id)
       Ana.print_that()

to produce something like this:

A1_scaffold0149
MseI       :  11, 110, 238, 269, 281, 423, 544, 607, 699, 723, 769, 949, 1461,
              1500, 1511, 1626, 1650, 1829, 1894, 2034, 2497, 2545, 2638,
              2861, 3186, 3192, 3261, 3483, 3641, 3798, 3950, 3972, 3994.

EcoRI      :  1563, 2004.

A1_scaffold0150
MseI       :  41, 77, 98, 103, 157, 408, 434, 476, 491, 497, 648, 706, 805,
              816, 910, 931*****, 1014*****, 1018, 1044, 1084, 1098, 1129, 1354, 1420,
              1483, 1495, 1516, 1589.
EcoRI      :  943*****.

What I really care about is only the fragment lengths, not the respective positions or which enzyme. In other words (for example, A1_scaffold0150) I would like to get a list of all fragment lengths hypothetically produced from a complete double digest reaction (where the positions of two enzymes are merged, ordered and subtracted). ex:

41.................41

77-41............36

....

943-931.......12

1014-943.....71

....etc.

to file.write() in this manner (without*, this is only to draw attention to merging of sites):

41 
36
21
5
54
251
26
42
15
6
151
58
99
11
94
21
12*****
71*****
4
26
40
14
31
225
66
63
12
21
73

To do this, I need to access the positions in something I can work with, but type() is either a Bio.Restriction 'class' object or after I try to coerce it to something for arithmetic, it returns 'None' or NoneType'.

What I need to do (unless there is a fancy feature that does this and I have not found it yet) is:

1) For each scaffold in genome.fasta (160 Mb):
2)     do the restriction digest to find positions for both enzymes
3)     access positions from Analysis (type 'class'), merge and order for arithmetic
4)     keep position 0 as first length, compute all fragment lengths (1-0), (2-1) etc.

Sorry if this has been answered elsewhere, I could not find it :-( Please direct me if it has.

I would like to do it with python as our cluster version of EMBOSS is missing REBASE files, so I tried, but I can't really do anything until the administrators configure the files.

Kind regards and thanks for any suggestions!

Biopython ddRADseq in silico digest • 1.7k views
ADD COMMENT
0
Entering edit mode

Do you specifically want to code up something bespoke? Something like this tool, might already achieve what you need in some form or other.

ADD REPLY
0
Entering edit mode

I would like to stick to what is available on our cluster (I checked and no bandwagon).

So I thought coding something in python would be the fastest way.

ADD REPLY
0
Entering edit mode
5.2 years ago
Chanz R • 0

Ok, I think I figured out an answer:

for sequence in SeqIO.parse("A1/A1.fasta", "fasta"):
    Ana = Analysis(rd,sequence.seq,linear=True)
    ana = str(Ana.format_output())
    nums = ana.replace("\n","")
    nums = nums.replace(" ","")
    nums = nums.replace("MseI:", "")
    nums = nums.replace("EcoRI:", "")
    nums = nums.replace("Enzymeswhichdonotcutthesequence.EcoRI", "")
    nums = nums.replace(".",",")
    sites = nums.split(',')
    del sites[-1]
    n0 = 0
    n1 = 1
    for i in range(len(sites)):
            sites[i] = int(sites[i])
    sites.sort()
    print(sites[n0])
    while n1 < len(sites):
            print(sites[n1]-sites[n0])
            n0 += 1
            n1 += 1

Still, I would appreciate any other answers or critiques from the more experienced.

Cheers!

ADD COMMENT

Login before adding your answer.

Traffic: 2735 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6