Question

Extracting specific gene sequence from genome using perl script

4.0 years ago

drdrnaveenkumar • 0

Hi, this is in relation to the question I posted recently here.

I have used the following Perl script to extract a specific gene sequence, given in my reference FASTA, from a query FASTA (a genome). My reference FASTA here is a gyrA gene, and the query genome is an E. coli genome. Once I get this script working, I would like to use it to retrieve other genes of interest from a batch of genomes that I have.

#!/usr/bin/perl 
 use Bio::Perl; 
 use Bio::SeqIO;
  use IO::String;
  use Bio::SearchIO;
  use Bio::PrimarySeq;
 open TF,'gyrA.fasta'
 or die "Couldn't open this file: $!";   
 chomp (@gene_seq=<TF>);
 my $filename='genome.fasta';

  my $gb = Bio::SeqIO->new(-file   => "<$filename",
                            -format => "fasta");
 while($seq = $gb->next_seq) { 
    my @temp_array=grep {$seq->next_seq eq $_}@gene_seq;
    # this step is important for correct output
  foreach (@temp_array){
  # Only one element is assigned to $string, which is the key to writing a single sequence into each fasta file
     my $string=$_;
     #print $string,"\n";
     my $stringio = IO::String->new($string);
     # Without the foreach loop, $string would receive many values of $_ instead of one, and the wrong sequences would be written into a single fasta file.
       my $out = Bio::SeqIO->new(-fh => $stringio,
                                 -format => 'fasta');
       # output goes into $string
       $out->write_seq($seq);
       # modify $string
       #$string =~ s|(>)(\w+)|$1$2|g;
       # print into STDOUT
       #print $string;  
       # subroutine for writing sequences into separate fasta files
       write_into_fasta($obj->seq,$string);
                  }
    }

   sub write_into_fasta {
     my $fh=$_[0];
     open $fh,">$fh.fasta" or die "Couldn't open this file: $!";
     print $fh $_[1],"\n";
    close $fh;
 }

However, this resulted in the following error, and I do not know where I am going wrong. Apologies, I am a beginner here.

**Can't locate object method "next_seq" via package "Bio::Seq" at ./extractnew_1.pl line 15, <GEN0> line 1.**

I installed BioPerl using:

sudo apt-get update -y 
sudo apt-get install -y bioperl

Thanks.

genome gene sequence • 983 views

updated 4.0 years ago by Ram 43k • written 4.0 years ago by drdrnaveenkumar • 0

Where did you get the script and what was it meant for? It seems to be doing some kind of string comparison and trying to pull sequences out. This is not going to be very efficient (if it works at all) for pulling out entire gene sequences from whole genome(s).
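To illustrate the limitation, here is a minimal sketch of what a pure string-comparison approach would have to do at the very least. It uses only core Perl, and the sequences are made-up toy data, not real gyrA or E. coli sequence. The helper names `revcomp` and `find_exact` are hypothetical. Note that `eq` compares whole strings, so a complete genome sequence will never equal a gene sequence. A substring search with `index` is the closest exact-match equivalent, and it still has to check both strands.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement a DNA string (plain ACGT, either case).
sub revcomp {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Look for an exact copy of $gene inside $genome on either strand.
# Returns (start, end, strand) as 1-based forward-strand coordinates,
# or an empty list when there is no exact match.
sub find_exact {
    my ($genome, $gene) = @_;
    for my $strand ('+', '-') {
        my $query = $strand eq '+' ? $gene : revcomp($gene);
        my $pos   = index(uc($genome), uc($query));
        return ($pos + 1, $pos + length($query), $strand) if $pos >= 0;
    }
    return;
}

# Toy data (hypothetical): the gene sits on the minus strand.
my $genome = 'TTTTACGGATCCAAAA';
my $gene   = 'GGATCCGT';

my @hit = find_exact($genome, $gene);
print @hit ? "match at $hit[0]-$hit[1] ($hit[2])\n" : "no exact match\n";
```

A single mismatch, insertion, or deletion between the reference gene and the genome makes `index` return -1, which is why exact matching is rarely suitable for pulling the same gene out of a batch of different genomes.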

You need to do this analysis properly using BLAT/BLAST, then parse the gene sequences out using the coordinates in those results.
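As a rough sketch of the parsing step, assuming the alignment was run with BLAST+ tabular output (for example `blastn -query gyrA.fasta -subject genome.fasta -outfmt 6`) and that the genome is a single sequence already loaded into a string, the hit coordinates can be used to cut out the matching region. The helper name `extract_hit` and the example hit line are hypothetical, and the sequence is toy data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut out the subject region described by one BLAST tabular (-outfmt 6) line.
# Default columns: qseqid sseqid pident length mismatch gapopen
#                  qstart qend sstart send evalue bitscore
# Returns (fasta_header, sequence).
sub extract_hit {
    my ($genome, $blast_line) = @_;
    chomp $blast_line;
    my @f = split /\t/, $blast_line;
    my ($qseqid, $sseqid)  = @f[0, 1];
    my ($sstart, $send)    = @f[8, 9];

    # blastn reports minus-strand hits with sstart > send.
    my $minus = $sstart > $send;
    ($sstart, $send) = ($send, $sstart) if $minus;

    # BLAST coordinates are 1-based and inclusive; substr is 0-based.
    my $region = substr($genome, $sstart - 1, $send - $sstart + 1);
    if ($minus) {
        $region = scalar reverse $region;
        $region =~ tr/ACGTacgt/TGCAtgca/;
    }
    my $strand = $minus ? '-' : '+';
    return ("${qseqid}_${sseqid}:$sstart-$send($strand)", $region);
}

# Toy genome and a made-up hit line (hypothetical values).
my $genome = 'TTTTACGGATCCAAAA';
my $line   = join "\t", 'gyrA', 'contig1', '100.000', 8, 0, 0, 1, 8, 12, 5, '1e-05', '16.4';

my ($header, $seq) = extract_hit($genome, $line);
print ">$header\n$seq\n";
```

For a batch of genomes, the same function could be called once per best hit per genome. Filtering hits by identity and alignment length before extraction is left out of this sketch.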

4.0 years ago by GenoMax 141k