Question: In silico dual restriction enzyme digestion of a reference genome
William wrote, 22 months ago:

I am looking for a tool that can do an in silico digestion of a reference genome given 2 restriction enzymes (or the patterns at which they cut).

The output I would like is a BED file with the start and end positions on the reference genome of all the resulting restriction fragments (or only of those longer than a given length).

The tool should take into account that both the forward and the reverse DNA strands can be cut.

Is there any such tool that I can download?
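As a rough illustration of what such a tool has to do, here is a minimal Python sketch. It is not an existing tool, and all function names are hypothetical. It assumes each enzyme is given as a recognition site plus a cut offset into that site, finds sites on both strands by also scanning for the reverse complement of non-palindromic sites, and reports the pieces between consecutive cut positions as 0-based, half-open BED intervals, optionally dropping short ones. Only the top-strand cut is modelled, so sticky-end overhangs are ignored.

```python
import re

# Complement table for uppercase DNA (N stays N).
COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMP)[::-1]


def cut_positions(genome, site, cut_offset):
    """Return 0-based forward-strand cut positions for one enzyme.

    Assumption: the enzyme cuts `cut_offset` bases into `site` on the
    strand where the site reads 5'->3'; only that cut is modelled.
    """
    cuts = set()
    # Zero-width lookahead so that overlapping occurrences are all found.
    for m in re.finditer("(?=%s)" % re.escape(site), genome):
        cuts.add(m.start() + cut_offset)
    rc = revcomp(site)
    if rc != site:
        # Non-palindromic site: a hit of its reverse complement means the
        # site lies on the reverse strand, so the cut offset is mirrored.
        for m in re.finditer("(?=%s)" % re.escape(rc), genome):
            cuts.add(m.start() + len(site) - cut_offset)
    return cuts


def digest_to_bed(chrom, genome, enzymes, min_len=1):
    """Digest one sequence with several enzymes and return BED lines.

    `enzymes` is a list of (site, cut_offset) pairs (hypothetical input
    format). Fragments shorter than `min_len` are dropped.
    """
    genome = genome.upper()
    cuts = set()
    for site, offset in enzymes:
        cuts |= cut_positions(genome, site, offset)
    # Fragment boundaries: sequence start, every cut, sequence end.
    bounds = [0] + sorted(cuts) + [len(genome)]
    return [
        f"{chrom}\t{start}\t{end}"
        for start, end in zip(bounds, bounds[1:])
        if end - start >= max(min_len, 1)
    ]


if __name__ == "__main__":
    # EcoRI-like G^AATTC and MspI-like C^CGG, both cutting after base 1.
    for line in digest_to_bed("chr1", "AAGAATTCAACCGGTT",
                              [("GAATTC", 1), ("CCGG", 1)]):
        print(line)
```

On the toy sequence this prints three fragments, chr1 0-3, 3-11 and 11-16. A real genome would be read from FASTA one chromosome at a time, which the sketch leaves out.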


You can try restrict from EMBOSS. You would then need to convert the output of restrict into a BED file yourself.

(comment by genomax, 22 months ago)

Duplicate of: Genomic Restriction Finder

(comment by Pierre Lindenbaum, 22 months ago)
William wrote, 22 months ago:

I found the R package SimRAD, which "provides a number of functions to simulate restriction enzyme digestion, library construction and fragments size selection."

A nice feature is that a filter step can also be applied on the restriction site combination, so that only fragments that start with an enzyme A restriction site and end with an enzyme B restriction site are kept (the selection takes the arguments type = "AB+BA", cut_site_5prime1, cut_site_3prime1, cut_site_5prime2, cut_site_3prime2).
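The AB+BA selection described above can be pictured with a small Python sketch. This is not SimRAD's implementation (SimRAD is an R package); the function name and the input format are hypothetical. It assumes the cut positions on one sequence are already known and labelled with the enzyme ("A" or "B") that produced them, keeps only fragments whose two ends come from different enzymes, and applies an optional size-selection window.

```python
def select_ab_fragments(cuts, min_len=0, max_len=None):
    """Keep fragments whose two ends were cut by different enzymes.

    `cuts` is an iterable of (position, enzyme_label) tuples on one
    sequence, e.g. [(3, "A"), (11, "B")]. Returns (start, end, type)
    tuples, where type is "AB" or "BA".
    """
    ordered = sorted(cuts)
    selected = []
    # Each pair of neighbouring cuts delimits one fragment.
    for (start, left), (end, right) in zip(ordered, ordered[1:]):
        length = end - start
        if left == right:
            continue  # AA or BB fragment: not retained
        if length < min_len or (max_len is not None and length > max_len):
            continue  # outside the size-selection window
        selected.append((start, end, left + right))
    return selected


if __name__ == "__main__":
    print(select_ab_fragments([(3, "A"), (11, "B"), (20, "B"), (30, "A")]))
```

The sequence ends are not cut sites, so terminal pieces are never returned, which matches the requirement of a restriction site at both ends.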

The output is the set of DNA fragments produced by the digestion and size selection.

The output does not say where the fragments are located on the genome.

To create the BED file, I guess I would need to BLAST the fragments against the reference genome.


In this case, BLAST would be complete overkill. All your fragments are exact substrings of your reference sequence, so you can find their locations with any scripting language that offers a string.find() method.
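A minimal Python sketch of that string-search approach, assuming the fragments and the reference are plain uppercase strings for a single chromosome (the function name is hypothetical). Because a fragment from a repeated region can occur more than once, it reports every exact hit rather than only the first.

```python
def locate_fragments(chrom, genome, fragments):
    """Report every exact occurrence of each fragment as a BED line.

    Coordinates are 0-based and half-open, as BED expects.
    """
    lines = []
    for frag in fragments:
        pos = genome.find(frag)
        while pos != -1:
            lines.append(f"{chrom}\t{pos}\t{pos + len(frag)}")
            # Continue searching just after this hit to catch repeats.
            pos = genome.find(frag, pos + 1)
    return lines


if __name__ == "__main__":
    for line in locate_fragments("chr1", "ACGTACGT", ["CGT", "TAC"]):
        print(line)
```

Fragments with several hits cannot be placed unambiguously from their sequence alone, so those may need to be flagged or discarded.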

(comment by piet, 22 months ago)
dariober (Glasgow, UK) wrote, 22 months ago:

This is a hack, but it might work. I wrote a script, fastaRegexFinder, to find regular expressions in FASTA files.

To find all the restriction fragments from two enzymes, run the script once for each ordered pair drawn from the two restriction sites and their reverse complements (4 x 4 = 16 runs). E.g., say the restriction sites are TTCC and ACTG (reverse complements GGAA and CAGT); then do:

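res='TTCC GGAA ACTG CAGT'   # both sites plus their reverse complements
for r1 in $res
do
    for r2 in $res
    do
        echo "$r1 and $r2"
        # quote the regex so the shell leaves (, ? and * alone
        fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r "$r1.*?(?=$r2)"
    done
done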

This runs the following commands:

fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'TTCC.*?(?=TTCC)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'TTCC.*?(?=GGAA)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'TTCC.*?(?=ACTG)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'TTCC.*?(?=CAGT)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'GGAA.*?(?=TTCC)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'GGAA.*?(?=GGAA)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'GGAA.*?(?=ACTG)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'GGAA.*?(?=CAGT)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'ACTG.*?(?=TTCC)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'ACTG.*?(?=GGAA)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'ACTG.*?(?=ACTG)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'ACTG.*?(?=CAGT)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'CAGT.*?(?=TTCC)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'CAGT.*?(?=GGAA)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'CAGT.*?(?=ACTG)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'CAGT.*?(?=CAGT)'

The output (to stdout by default) is in BED format.
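The same idea can also be sketched in Python with the standard re module, without a shell loop. This is not fastaRegexFinder itself: the function names are hypothetical, and the coordinates have not been checked against the script's own output. The sketch builds the lazy "site a up to the next site b" pattern for every ordered pair of the two sites and their reverse complements, then reports each match as a 0-based, half-open BED interval.

```python
import re
from itertools import product

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMP)[::-1]


def pair_patterns(site1, site2):
    """Patterns for every ordered pair of the sites and their reverse
    complements: 16 for two non-palindromic sites; duplicates that
    palindromic sites would create are dropped."""
    sites = list(dict.fromkeys([site1, revcomp(site1), site2, revcomp(site2)]))
    return [f"{a}.*?(?={b})" for a, b in product(sites, repeat=2)]


def regex_to_bed(chrom, seq, pattern):
    """BED lines for each non-overlapping match of `pattern`.

    `.` does not match newlines, so `seq` must be one unwrapped string.
    """
    return [f"{chrom}\t{m.start()}\t{m.end()}" for m in re.finditer(pattern, seq)]


if __name__ == "__main__":
    seq = "TTCCAAACTGTTCCGACTG"
    for pattern in pair_patterns("TTCC", "ACTG"):
        for line in regex_to_bed("synth", seq, pattern):
            print(pattern, line, sep="\t")
```

As with the loop above, this remains a hack: the lazy match stops at the first downstream b site but not at an intermediate a site, so on TTCCATTCCACTG the pattern TTCC.*?(?=ACTG) returns a single interval (0 to 9) that spans the inner TTCC. The 16 runs can therefore report overlapping intervals that may need filtering afterwards.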
