Question

Calculate fragment length from Bio.Restriction Analysis Object

0

Entering edit mode

5.2 years ago

Chanz R • 0

New to python!

My goal is to in silico digest several genomes with two restriction enzymes and calculate the fragment lengths.

With Bio.Restriction I managed to use the Analysis function

>>> for seq_record in SeqIO.parse("A1/A1.fasta", "fasta"):
       Ana = Analysis(rb,seq_record.seq, linear=True)
       printseq_record.id)
       Ana.print_that()

to produce something like this:

A1_scaffold0149
MseI       :  11, 110, 238, 269, 281, 423, 544, 607, 699, 723, 769, 949, 1461,
              1500, 1511, 1626, 1650, 1829, 1894, 2034, 2497, 2545, 2638,
              2861, 3186, 3192, 3261, 3483, 3641, 3798, 3950, 3972, 3994.

EcoRI      :  1563, 2004.

A1_scaffold0150
MseI       :  41, 77, 98, 103, 157, 408, 434, 476, 491, 497, 648, 706, 805,
              816, 910, 931*****, 1014*****, 1018, 1044, 1084, 1098, 1129, 1354, 1420,
              1483, 1495, 1516, 1589.
EcoRI      :  943*****.

What I really care about is only the fragment lengths, not the respective positions or which enzyme. In other words (for example, A1_scaffold0150) I would like to get a list of all fragment lengths hypothetically produced from a complete double digest reaction (where the positions of two enzymes are merged, ordered and subtracted). ex:

41.................41

77-41............36

....

943-931.......12

1014-943.....71

....etc.

to file.write() in this manner (without*, this is only to draw attention to merging of sites):

To do this, I need to access the positions in something I can work with, but type() is either a Bio.Restriction 'class' object or after I try to coerce it to something for arithmetic, it returns 'None' or NoneType'.

What I need to do (unless there is a fancy feature that does this and I have not found it yet) is:

1) For each scaffold in genome.fasta (160 Mb):
2)     do the restriction digest to find positions for both enzymes
3)     access positions from Analysis (type 'class'), merge and order for arithmetic
4)     keep position 0 as first length, compute all fragment lengths (1-0), (2-1) etc.

Sorry if this has been answered elsewhere, I could not find it :-( Please direct me if it has.

I would like to do it with python as our cluster version of EMBOSS is missing REBASE files, so I tried, but I can't really do anything until the administrators configure the files.

Kind regards and thanks for any suggestions!

Biopython ddRADseq in silico digest • 1.7k views

ADD COMMENT • link 5.2 years ago by Chanz R • 0

0

Entering edit mode

Do you specifically want to code up something bespoke? Something like this tool, might already achieve what you need in some form or other.

ADD REPLY • link 5.2 years ago by Joe 21k

0

Entering edit mode

I would like to stick to what is available on our cluster (I checked and no bandwagon).

So I thought coding something in python would be the fastest way.

ADD REPLY • link 5.2 years ago by Chanz R • 0

score 0 · Answer 1 · 2019-02-19

Ok, I think I figured out an answer:

for sequence in SeqIO.parse("A1/A1.fasta", "fasta"):
    Ana = Analysis(rd,sequence.seq,linear=True)
    ana = str(Ana.format_output())
    nums = ana.replace("\n","")
    nums = nums.replace(" ","")
    nums = nums.replace("MseI:", "")
    nums = nums.replace("EcoRI:", "")
    nums = nums.replace("Enzymeswhichdonotcutthesequence.EcoRI", "")
    nums = nums.replace(".",",")
    sites = nums.split(',')
    del sites[-1]
    n0 = 0
    n1 = 1
    for i in range(len(sites)):
            sites[i] = int(sites[i])
    sites.sort()
    print(sites[n0])
    while n1 < len(sites):
            print(sites[n1]-sites[n0])
            n0 += 1
            n1 += 1

Still, I would appreciate any other answers or critiques from the more experienced.

Cheers!