Hi,
I'm trying to get the regions upstream and downstream of start codons in a genome (20 nt upstream and 4 nt downstream), but my script gets only the first start codon even though I use the "g" modifier on the regex. How can I get it to read all start codons (ATG)?
use strict;
use warnings;
my @regions;
my $term="ATG";
my $seqsample="CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGGATGCATGATTCGG";
while ( $seqsample =~ m/(\S{20})$term(\S{4})/g ) {
    my $xx = $1.$term.$2;
    push (@regions, $xx);
}
print "@regions\n";
Here's the script that I wrote.
Thanks in advance.
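A likely cause: with the "g" modifier each new search starts where the previous match ended, so matches cannot overlap. The first match consumes the 20 nt upstream, the ATG and the 4 nt downstream. The other two ATGs in the sample sit inside or just after that window, so no later position has 20 unconsumed characters in front of it. One common workaround is to wrap the whole pattern in a lookahead, which captures without consuming. Here is a minimal sketch of that idea, reusing the variable names from the question (the \Q...\E quoting is my addition, in case the search term ever contains regex metacharacters):

```perl
use strict;
use warnings;

my $term      = "ATG";
my $seqsample = "CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGGATGCATGATTCGG";
my @regions;

# The lookahead (?=...) matches at a position without consuming any
# characters, so the next search resumes one position further along
# instead of after the whole 27-nt window. Captures inside the
# lookahead are still available in $1 and $2.
while ( $seqsample =~ m/(?=(\S{20})\Q$term\E(\S{4}))/g ) {
    push @regions, $1 . $term . $2;
}

print "$_\n" for @regions;
```

On the sample sequence this should report three regions, one per ATG. Note that an ATG with fewer than 20 nt before it or fewer than 4 nt after it is still skipped, because the pattern requires the full flanks.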
Hello, I am also trying this, but instead of searching for ATG I need to extract a region by sequence position. Suppose that in a long sequence of 5000 nucleotides I want to extract the region from position 150 to 160, plus 20 nt upstream and 20 nt downstream.
Kindly help me extract this region with a Perl script or an awk command.
Thank you
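For extraction by position, substr is probably enough and no regex is needed. Below is a rough sketch, assuming the coordinates are 1-based and inclusive (Perl's substr itself is 0-based) and that flanks running past either end of the sequence should simply be truncated. The helper name extract_region and the generated test sequence are made up for illustration; in practice the sequence would be read from your own file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the region from $start to $end
# (1-based, inclusive) plus $flank nucleotides on each side,
# clamped to the ends of the sequence.
sub extract_region {
    my ( $seq, $start, $end, $flank ) = @_;
    my $from = $start - $flank;
    $from = 1 if $from < 1;
    my $to = $end + $flank;
    $to = length($seq) if $to > length($seq);
    # Convert the 1-based start to substr's 0-based offset.
    return substr( $seq, $from - 1, $to - $from + 1 );
}

# Made-up 5000-nt sequence, for illustration only.
my @nt  = qw(A C G T);
my $seq = join '', map { $nt[ $_ % 4 ] } 0 .. 4999;

# Positions 150 to 160 with 20 nt on each side, i.e. 130 to 180.
my $region = extract_region( $seq, 150, 160, 20 );
print length($region), "\n";
print "$region\n";
```

For 150 to 160 with 20 nt flanks this covers positions 130 to 180, which is 51 nt. If your coordinates are 0-based, the offset arithmetic needs adjusting accordingly.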