Question: What's the best way to find all the possible cutting sites of a sequence-specific enzyme?
4.6 years ago by
zhuxun220
United States
zhuxun220 wrote:

I have a sequence-specific enzyme for cutting DNA. The enzyme recognizes a fixed 6 bp sequence (e.g. ACTAGT) and cuts both strands of the DNA at a specific location (e.g. A|CTAGT).

I was wondering if there is a way to find the locations of all possible cutting sites in the genome. That is, I'm looking for the locations of all occurrences of the sequence "ACTAGT" (perfect matches only) along the entire genome.

I have the FASTA files for each of the chromosomes (chr1.fa, chr2.fa, etc.) from the hg19 assembly.

I considered using Bowtie2 with the "-a" option, but after reading the manual I think the program was not designed for this purpose, and the manual warns that "it could be extremely slow". I was wondering whether there is a (possibly lightweight) program that was designed specifically for this.

Thank you.

 


A regex in Perl/Python should work fine; BioPerl/Biopython also include functions for this.

Something like this:

 

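#!/usr/bin/perl
# restriction.pl: print the start offset of every site match in each FASTA record
use strict;
use warnings;

my $site = "ACTAGT"; # This is palindromic, no rev-comp sequence needed

$/ = "\n>"; # Fasta slurp-mode: read one whole record at a time
while (<>) {
    s/>//g; # drop the record separators
    my ($id, @seq) = split (/\n/, $_);
    my $seq = join "", @seq;
    # /i also matches soft-masked (lowercase) sequence, as in the UCSC hg19 FASTA files
    while ($seq =~ m/$site/gi) {
        print "$id -> $-[0]\n"; # $-[0] is the 0-based offset of the match start
    }
}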

and run: perl restriction.pl < FASTA > SITES

written 4.6 years ago by JC9.5k

In case this is a homework question, for extra marks, you might want to consider how polymorphisms could affect the result.

written 4.6 years ago by rbagnall1.5k

Even more embarrassingly, it is not. :D

By polymorphism, do you mean SNPs in the genome? Or in the enzyme?

 

written 4.6 years ago by zhuxun220
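If the polymorphisms in question are SNPs in the genome, their effect on a site list can be illustrated with a small sketch. A SNP inside a recognition site removes that site in carriers of the variant allele, and a SNP elsewhere can create a site that the reference lacks. The function name `snp_effect` and its arguments are hypothetical, positions are 0-based, and only the forward strand is checked, which is sufficient for a palindromic site such as ACTAGT.

```python
def snp_effect(ref_seq, pos, alt, site="ACTAGT"):
    """Return (lost, gained): 0-based site starts removed or created
    when the base at `pos` in `ref_seq` is replaced by `alt`."""
    site = site.upper()
    k = len(site)
    # Only windows overlapping the SNP can change, so look at +/- (k - 1) bases.
    lo = max(0, pos - k + 1)
    hi = min(len(ref_seq), pos + k)
    ref_win = ref_seq[lo:hi].upper()
    alt_win = ref_win[:pos - lo] + alt.upper() + ref_win[pos - lo + 1:]

    def starts(window):
        # Genome coordinates of every exact occurrence of the site in the window.
        return {lo + i for i in range(len(window) - k + 1)
                if window[i:i + k] == site}

    before, after = starts(ref_win), starts(alt_win)
    return sorted(before - after), sorted(after - before)


# A T>C change inside an existing site removes it:
print(snp_effect("GGACTAGTGG", 4, "C"))
# A C>T change can create a site absent from the reference:
print(snp_effect("GGACCAGTGG", 4, "T"))
```

Running this over a list of known variants would show which reference sites may be missing, and which extra sites may exist, in a given individual.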

Don't forget to ask how position and GC-content-related mutation frequency affect the sites, too.

written 4.6 years ago by JC9.5k
4.6 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

EMBOSS. Lots and lots of useful, small command-line utilities including several for nucleic acid restriction.


Re-reading the question, a string/regex search as suggested by JC is a good solution. However, the EMBOSS tools certainly fit the description "a (possibly lightweight) program that was designed specifically for this."

written 4.6 years ago by Neilfws48k

You're right, you can use the EMBOSS restrict program to get the RE mapping, but writing a little script could be faster than installing EMBOSS ;)

written 4.6 years ago by JC9.5k
4.6 years ago by
Gary480
Taiwan/Taichung/China Medical University Hospital
Gary480 wrote:

Hi,

A|CTAGT is the SpeI cutting site. You can use the UCSC Genome Browser to show SpeI cutting sites across the whole genome, as in fig1. In the UCSC Genome Browser, click the Restr_Enzymes track and enter SpeI under "Filter display by enzyme", as in fig2.

In addition, the free version of SnapGene can do the same thing (http://www.snapgene.com/). SnapGene is user-friendly and its output looks very polished, since it is commercial software.

fig1

fig2
