I am currently processing some FASTA files with fastools, digesting the sequences with the MboI restriction enzyme. The command I used is as follows:

fastools -digest -rest MboI -in A.fasta -out REA.fasta

This enzyme cuts at GATC sites, but the output also contains the complementary-strand information, which starts with CTAG. Is it possible to make the program cut only at GATC and not at CTAG? Or is there another program that can cut the DNA sequences in a FASTA file at a restriction site? Thanks!
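import re
import sys

from Bio import SeqIO


def print_re_sites(seq_id, sequence):
    # Print one BED line per fragment: from each GATC site to the next site,
    # with the last fragment running to the end of the sequence.
    # Anything upstream of the first GATC is not reported.
    pattern = r"GATC"
    sites = [m.start() for m in re.finditer(pattern, sequence)]
    sites.append(len(sequence))
    for start, end in zip(sites, sites[1:]):
        print("{}\t{}\t{}".format(seq_id, start, end))


# The FASTA file is the first command-line argument.
for record in SeqIO.parse(sys.argv[1], "fasta"):
    print_re_sites(str(record.id), str(record.seq))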
Usage: script.py in.fasta | bedtools getfasta -fi in.fasta -bed - -fo re_out.fasta
in.fasta

>1
AGAGGAGGATCGAGGAGGTGATCGAGGATTTTGAGAGGAGGATCGAGGAGGTGATCGAGGATTTTG
>2
GAGGGGGCTGGCGGCGGGATCGGAGGGGatttaggaGATCgaggattg

re_out.fasta

>1:7-19
GATCGAGGAGGT
>1:19-40
GATCGAGGATTTTGAGAGGAG
>1:40-52
GATCGAGGAGGT
>1:52-66
GATCGAGGATTTTG
>2:17-36
GATCGGAGGGGatttagga
>2:36-48
GATCgaggattg
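If bedtools is not available, the same cut can be sketched in plain Python on the sequence string itself. This is only a minimal illustration, not part of the script above: the function name `digest` and its `cut_offset` parameter are made up for the example. It also differs from the script in two ways: it matches the site case-insensitively, and it keeps the fragment upstream of the first GATC.

```python
import re


def digest(sequence, site="GATC", cut_offset=0):
    """Split `sequence` at every occurrence of `site`.

    `cut_offset` is the cut position relative to the start of the site
    (0 for MboI, which cuts just before the G of GATC).
    """
    # Case-insensitive so that soft-masked (lower-case) sites are found too.
    pattern = re.compile(re.escape(site), re.IGNORECASE)
    cuts = [m.start() + cut_offset for m in pattern.finditer(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Skip empty fragments, e.g. when the sequence starts with the site.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


fragments = digest("AGAGGAGGATCGAGGAGGTGATCGAGGATTTTG")
print(fragments)
# ['AGAGGAG', 'GATCGAGGAGGT', 'GATCGAGGATTTTG']
```

Joining the fragments gives back the original sequence, which is a quick sanity check that nothing was lost at the cut sites.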
MboI is a bacterial restriction endonuclease that cleaves double-stranded DNA at the palindromic recognition sequence GATC.
fastools is a computer program that can be used to identify restriction enzyme cleavage sites in double-stranded DNA. For convenience, you feed it the sequence of only one strand, and fastools recognizes all cleavage sites on that strand. The CTAG in your output is most likely not a second recognition sequence: CTAG is what the complementary strand of GATC looks like when written 3'→5' (read 5'→3' it is GATC again), so it marks the same MboI site.
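The relationship between GATC and CTAG can be checked in a few lines of Python. This is just an illustrative sketch; the helper names `complement` and `reverse_complement` are made up for the example and are not part of fastools.

```python
# Translation table for base-pair complements (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    # Opposite strand, written in the same left-to-right order (3'->5').
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    # Opposite strand, written 5'->3'.
    return complement(seq)[::-1]


site = "GATC"
print(complement(site))          # CTAG
print(reverse_complement(site))  # GATC
```

Because the reverse complement of GATC is GATC itself, searching one strand for GATC already finds every MboI site on both strands.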