Extracting specific gene sequence from genome using perl script
0
0
Entering edit mode
12 months ago

Hi this is in relation to the question i posted recently here.

I have used the following perl script for extracting specific gene sequence in my ref fasta from query fasta (genome). My ref fasta here is a gyrA gene. The query genome is an E coli genome. However, if i get this script correct, I would want to use it to retrieve other genes of interest from a batch of genomes that I have.

#!/usr/bin/perl
use Bio::Perl;
use Bio::SeqIO;
use IO::String;
use Bio::SearchIO;
use Bio::PrimarySeq;
open TF,'gyrA.fasta'
or die "Couldn't open this file: $!"; chomp (@gene_seq=<TF>); my$filename='genome.fasta';

my $gb = Bio::SeqIO->new(-file => "<$filename",
-format => "fasta");
while($seq =$gb->next_seq) {
my @temp_array=grep {$seq->next_seq eq$_}@gene_seq;
#this step is import for correct output
foreach (@temp_array){
#Only one element is given to $string, which is the key point for writing single sequence into one fasta file my$string=$_; #print$string,"\n";
my $stringio = IO::String->new($string);
#If the foreach function is not used here, the defaut $_, actually, here would be many$_ been transfered into $string, and this is resulted in wrong sequences written into one fasta file. my$out = Bio::SeqIO->new(-fh => $stringio, -format => 'fasta'); # output goes into$string
$out->write_seq($seq);
# modify $string #$string =~ s|(>)(\w+)|$1$2|g;
# print into STDOUT
#print $string; #subrountin for writing sequences into separated fasta file write_into_fasta($obj->seq,$string); } } sub write_into_fasta { my$fh=$_[0]; open$fh,">$fh.fasta" or die "Couldn't open this file:$!";
print $fh$_[1],"n";
close \$fh;
}


However this resulted in an error as follows and I do not know where am I going wrong. Apologies, I am a beginner here.

**Can't locate object method "next_seq" via package "Bio::Seq" at ./extractnew_1.pl line 15, <GEN0> line 1.**


Installed bioperl using

sudo apt-get update -y
sudo apt-get install -y bioperl


Thanks.

genome gene sequence • 391 views
0
Entering edit mode

Where did you get the script and what was it meant for? It seems to be doing some kind of string comparison and trying to pull sequences out. This is not going to be very efficient (if it works at all) for pulling out entire gene sequences from whole genome(s).

You need to do this analysis properly using blat/blast followed by parsing out the sequences using those results.