Question: Parallel::ForkManager does not speed up ATCG calculation
10 months ago by
BioGeek80
BioGeek80 wrote:

I am new to parallel programming, and today I decided to test the Perl Parallel::ForkManager module.

I am reading a multi-FASTA input file and calculating the ATCG percentage for each sequence. I tried to fork the work into 5 child processes to speed it up. Unfortunately, it takes more time with ForkManager than without it. What am I doing wrong in the following code?

#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
use Bio::SeqIO;

#usage: perl testParallel.pl <multi fasta infile>

my %sequences;
my $seqio = Bio::SeqIO->new(-file => $ARGV[0], -format => "fasta");
while (my $seqobj = $seqio->next_seq) {
    my $id  = $seqobj->display_id;    # there's your key
    my $seq = $seqobj->seq;           # and there's your value
    $sequences{$id} = $seq;
}

  my $max_procs = 5;
  my @names = keys %sequences;

  # create the fork manager; at most $max_procs children run at once
  my $pm = Parallel::ForkManager->new($max_procs);

  # Set up a callback for when a child finishes so we can
  # get its exit code
  $pm->run_on_finish (
    sub { my ($pid, $exit_code, $ident) = @_;
      #print "** $ident just got out of the pool ".
      #  "with PID $pid and exit code: $exit_code\n";
    }
  );

  $pm->run_on_start(
    sub { my ($pid,$ident)=@_;
     #print "** $ident started, pid: $pid\n";
    }
  );

  $pm->run_on_wait(
    sub {
      #print "** Have to wait for one child ...\n"
    },
    0.5
  );

  NAMES:
  foreach my $child ( 0 .. $#names ) {
    my $pid = $pm->start($names[$child]) and next NAMES;
    checkATCG($names[$child]);
    $pm->finish(0); # exit code must fit in 0..255, so don't pass the loop index
  }

  print "Waiting for Children...\n";
  $pm->wait_all_children;
  print "Everybody is out of the pool!\n";


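sub checkATCG {
    my $name   = shift;
    my $DNA    = $sequences{$name};
    # tr/X// with an empty replacement list counts X without changing the string.
    # The counters avoid the names $a and $b, which Perl reserves for sort().
    my $countA = ($DNA =~ tr/A//);
    my $countC = ($DNA =~ tr/C//);
    my $countG = ($DNA =~ tr/G//);
    my $countT = ($DNA =~ tr/T//);
    my $Total  = $countA + $countC + $countG + $countT;
    # These substitutions count occurrences of the dinucleotides "GC" and "AT".
    my $GC     = ($DNA =~ s/GC/GC/g);
    my $AT     = ($DNA =~ s/AT/AT/g);
    # Avoid division by zero for sequences with no A, C, G or T.
    my $GCper  = $Total ? ($GC / $Total * 100) : 0;
    print "$name\t$Total\t$AT\t$GC\t$GCper:\n";
}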
modified 10 months ago by Ram12k • written 10 months ago by BioGeek80

The way you calculate the percentages in checkATCG is inefficient: it runs repeated regexes over the entire sequence just to count characters, when it could loop through the string once and count them, or something similar. I can't comment on how ForkManager works, but to me that is the more obvious place to optimize.
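
As a rough sketch of that suggestion (not the original poster's code), the counting can be done with tr/// alone, which scans the string without invoking the regex engine or rewriting it. The helper name base_composition is hypothetical, and the sketch assumes that "GC percentage" means (G + C) / (A + C + G + T), which differs from the dinucleotide count in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): count each base with
# tr///, which counts matching characters and leaves the string unchanged.
sub base_composition {
    my ($dna) = @_;
    $dna = uc $dna;    # assumption: lowercase (soft-masked) bases count too
    my %count = (
        A => ($dna =~ tr/A//),
        C => ($dna =~ tr/C//),
        G => ($dna =~ tr/G//),
        T => ($dna =~ tr/T//),
    );
    my $total = $count{A} + $count{C} + $count{G} + $count{T};
    # Guard against empty or all-N sequences to avoid division by zero.
    my $gc_percent = $total ? 100 * ($count{G} + $count{C}) / $total : 0;
    return { %count, total => $total, gc_percent => $gc_percent };
}

my $stats = base_composition("ATGCGCNNAT");
printf "total=%d GC%%=%.1f\n", $stats->{total}, $stats->{gc_percent};
```

Ambiguous bases such as N are simply left out of the total here; whether that is the right denominator depends on what the percentage is meant to describe.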

written 10 months ago by cmdcolin690

Note: I didn't mean to sound so negative and off-topic in this reply; I know ForkManager was really the topic in question :)

written 10 months ago by cmdcolin690
  • Parallel::ForkManager is useful when each child actually computes something that takes time, longer than a single call to checkATCG; otherwise the cost of forking outweighs the work [1]
  • checkATCG needs some work

[1] http://stackoverflow.com/questions/28905595/parallel-forkmanager-dbi-faster-than-before-forking-but-still-too-slow#28905942
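
One way to give each child enough work, sketched here under assumptions rather than taken from the thread, is to fork once per batch of sequences instead of once per sequence, and to hand the results back to the parent through the data reference that finish() accepts. The helper names make_batches and gc_in_parallel and the inline test data are hypothetical; in the original script the hash would be filled from the FASTA file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical stand-in data; the original script reads these from a file.
my %sequences = (seq1 => "ATGCATGC", seq2 => "GGGGCCCC", seq3 => "ATATATAT");

# Split a list into at most $n batches by dealing items out round-robin.
sub make_batches {
    my ($n, @items) = @_;
    my @batches;
    push @{ $batches[ $_ % $n ] }, $items[$_] for 0 .. $#items;
    return @batches;
}

# Fork once per batch; each child sends its results back to the parent.
sub gc_in_parallel {
    my ($max_procs, $seqs) = @_;
    my %gc;
    my $pm = Parallel::ForkManager->new($max_procs);
    # The sixth callback argument is the reference passed to finish().
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident, $signal, $core, $data) = @_;
        %gc = (%gc, %$data) if defined $data;
    });
    for my $batch (make_batches($max_procs, sort keys %$seqs)) {
        $pm->start and next;              # parent moves on to the next batch
        my %result;
        for my $id (@$batch) {
            my $dna = $seqs->{$id};
            my $gc  = ($dna =~ tr/GC//);      # G plus C
            my $all = ($dna =~ tr/ACGT//);    # all unambiguous bases
            $result{$id} = $all ? 100 * $gc / $all : 0;
        }
        $pm->finish(0, \%result);         # child exits and returns its results
    }
    $pm->wait_all_children;
    return \%gc;
}

my $gc = gc_in_parallel(5, \%sequences);
printf "%s\t%.1f\n", $_, $gc->{$_} for sort keys %$gc;
```

With this layout the number of forks is bounded by the number of processes rather than the number of sequences. For a computation as cheap as base counting, even the batched version may not beat a plain loop, so it is worth timing both on real data.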

modified 10 months ago • written 10 months ago by Cornel30

If you are interested in exploring parallel programming, I suggest you look into Go or Java! They make it very easy to do efficiently.

written 10 months ago by Brian Bushnell14k