how to find the total number of recognition sequence of a restriction enzyme in total human genome ?

how to find the total number of recognition sequence of a restriction enzyme in total human genome ?

0

Entering edit mode

8.3 years ago

reihaneh • 0

how to find the total number of recognition sequence of a restriction enzyme in total human genome ?

genome gene • 4.6k views

ADD COMMENT • link updated 8.3 years ago by Erik Wright ▴ 420 • written 8.3 years ago by reihaneh • 0

0

Entering edit mode

duplicate of Genomic Restriction Finder

ADD REPLY • link 8.3 years ago by Pierre Lindenbaum 161k

3

Entering edit mode

8.3 years ago

Erik Wright ▴ 420

You could accomplish this in R using the DECIPHER package. An example of using EcoRI to digest chromosome 1:

library(DECIPHER)
data(RESTRICTION_ENZYMES) # all restriction enzymes sold by NEB
dna <- readDNAStringSet("ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/CHR_01/hs_ref_GRCh38.p2_chr1.fa.gz")
cuts <- DigestDNA(RESTRICTION_ENZYMES["EcoRI"], dna, type="positions", strand="top")
length(unlist(cuts)) # 68543

Simply loop through the other chromosomes and sum the counts. Hope that helps!

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by Erik Wright ▴ 420

0

Entering edit mode

Hi Erik,

Thanks a lot for this nice tool! I had a question related to this. I want to find cut sites in a human genome for 2 restriction enzymes whose names I do not know (they are not disclosed by the company), but I do know their cut sites, which are GATC^ and GANT^C (here N stands for any nucleotide and ^ is the cut site of the enzyme). Do you think your tool can accomplish this? Thanks again!

ADD REPLY • link 5.9 years ago by c_u ▴ 520

0

Entering edit mode

A little late to the party, however yes, you can accomplish that with a simple addition to the above lines of code. So, after you load the RESTRICTION_ENZYMES with data(RESTRICTION_ENZYMES) add execute the following line:

RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme"]]<-"GANT/C"

Then you will also have modify the DigestDNA() command:

cuts <- DigestDNA(RESTRICTION_ENZYMES["Give_a_code_name_to_your_uknown_enzyme"], dna, type="positions", strand="top")

And you are good to go. Replace the "Give_a_code_name_to_your_unknown_enzyme" with any string you want.

So, to summarize, you will have to do:

library(DECIPHER)
data(RESTRICTION_ENZYMES)
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme"]]<-"GANT/C"
# The link to get the chromosomal sequences is outdated, so I'll use one that is as of Sep 7th 2020 valid:
dna <- readDNAStringSet("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr1.fa.gz")
cuts <- DigestDNA(RESTRICTION_ENZYMES["Give_a_code_name_to_your_uknown_enzyme"], dna, type="positions", strand="top")
length(unlist(cuts))

Now, if you are not familiar with loops or the R syntax, to get the total number of fragments expected to be generated with digestion with the specific restriction enzyme, you can modify the above block of code as:

library(DECIPHER)
data(RESTRICTION_ENZYMES)
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme"]]<-"GANT/C"
total_fragments<-0 # a variable in which we will store the fragments count
chr_list<-c(as.character(seq(1:21)),'X','Y','M') # generating a vector with the human chromosomes
for (i in chr_list){ # iterating through the vector with the chromosomes
dna <- readDNAStringSet(paste0("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr",i,".fa.gz"))
cuts <- DigestDNA(RESTRICTION_ENZYMES["Give_a_code_name_to_your_uknown_enzyme"], dna, type="positions", strand="top")
total_fragments<-total_fragments+length(unlist(cuts))
}
total_fragments

And unless there's a typo, this should give you the number of fragments using a given enzyme. Note that with i iterating through the chr_list above, we are not fetching the unlocalized (chrUn_N) and unplaced sequences (chrN_random).

Last, if you want to digest with multiple restriction enzymes:

library(DECIPHER)
data(RESTRICTION_ENZYMES)
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme1"]]<-"GANT/C"
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme2"]]<-"GATC/"
multiple_enzymes <- RESTRICTION_ENZYMES[c("Give_a_code_name_to_your_unknown_enzyme1","Give_a_code_name_to_your_unknown_enzyme2")]
total_fragments<-0 # a variable in which we will store the fragments count
chr_list<-c(as.character(seq(1:21)),'X','Y','M') # generating a vector with the human chromosomes
for (i in chr_list){ # iterating through the vector with the chromosomes
dna <- readDNAStringSet(paste0("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr",i,".fa.gz"))
cuts <- DigestDNA(multiple_enzymes, dna, type="positions", strand="top")
total_fragments<-total_fragments+length(unlist(cuts))
}
total_fragments

ADD REPLY • link 3.6 years ago by jo.urailidis ▴ 10

Login before adding your answer.

Similar Posts

Loading Similar Posts

Traffic: 1961 users visited in the last hour

Content Search
Users
Tags
Badges

Help About
FAQ

Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the

version 2.3.6