Question: Could someone explain this perl command?
Seq225 wrote:

Hi,

I have a bash script and it looks like this:

#!/bin/bash
for i in *dat.gz
do gunzip $i
echo uniprot_sprot_archaea.dat | perl -slane '$a=(split /\_/, $_)[2]; $a=~/(\w+).dat/; $b=$1; print "perl screen_complete_proteome_from_uniprot_division.pl \$i >> uniprot_".$b.".fasta"' -- -i=$i
done

I don't know how to code, but I need to understand these perl commands. I don't understand anything from echo to the end of the command. Could someone please explain it?

Thanks a ton, and sorry for this silly request.

bash perl • 433 views
modified 11 months ago • written 11 months ago by Seq225

I have some doubts that this is working as intended. What is it you're trying to do?
For instance, while the bash script will unzip all dat.gz files, the perl line will repeatedly work on the string uniprot_sprot_archaea.dat. The split part extracts the string archaea.dat, the regex then captures archaea, and so the perl one-liner will print the following every time the bash script unzips a file:
perl screen_complete_proteome_from_uniprot_division.pl $i >> uniprot_archaea.fasta
Note the presence of $i in the output: the \ preceding $i tells perl not to interpret the $ sign as indicating a variable.

modified 11 months ago • written 11 months ago by Jean-Karim Heriche

I am very confused here. I think the main execution here is based on the perl script. I have provided it below.

The entire idea is to extract sequences with “Complete Proteome” in the keywords from the downloaded files (Swiss-Prot and TrEMBL). All I am trying to do is repeat some analyses from this paper: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2002266#sec014 (methods section, HGT analyses)

written 11 months ago by Seq225

The script screen_complete_proteome_from_uniprot_division.pl is never executed when you run the bash script you posted. If you want to execute it from within the perl one-liner, one option is to use the qx operator, i.e. replace print by qx, but that's not the only problem you have.

written 11 months ago by Jean-Karim Heriche

Thanks very much. I will replace the print with qx. Also, if you don't mind and have time to spend, is it possible to point out the other problems?

Thanks like an ocean!!

written 11 months ago by Seq225

For every file that is unzipped, the bash script passes the string 'uniprot_sprot_archaea.dat' to perl, so it always prints the line: perl screen_complete_proteome_from_uniprot_division.pl $i >> uniprot_archaea.fasta

Maybe you want to run the script screen_complete_proteome_from_uniprot_division.pl on each unzipped file? Then try something along these lines:

 #!/bin/bash
 for i in *dat.gz
 do
   gunzip "$i"
   echo "$i" | perl -slane '$_=~s/\.gz$//; # remove the .gz extension from the filename
                            $a=(split /\_/, $_)[2]; # split on _ and extract the third part
                            $a=~/(\w+)\.dat/; $b=$1; # extract all word characters before .dat
                            qx(perl screen_complete_proteome_from_uniprot_division.pl $_ > uniprot_$b.fasta)' # execute perl script on unzipped input file and save output in .fasta file
 done
modified 11 months ago • written 11 months ago by Jean-Karim Heriche

Great. Thank you! I will try these...

written 11 months ago by Seq225

I'd recommend redirecting stdout of the bash script to a file and executing that file. Running the perl script from a loop will make debugging more difficult.
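
One way to follow this advice is sketched below in plain bash. The function name make_commands and the file name commands.sh are made up for illustration, and the sketch assumes file names of the form uniprot_<database>_<division>.dat.gz without spaces.

```shell
#!/bin/bash
# Print one pair of commands per input file instead of running anything.
make_commands() {
  for i in "$@"
  do
    f=${i%.gz}        # file name once gunzip has removed .gz
    div=${f##*_}      # text after the last underscore, e.g. archaea.dat
    div=${div%.dat}   # drop the .dat suffix, e.g. archaea
    echo "gunzip $i"
    echo "perl screen_complete_proteome_from_uniprot_division.pl $f > uniprot_$div.fasta"
  done
}

# Typical use (commands.sh is a hypothetical name):
#   make_commands *dat.gz > commands.sh   # write the commands to a file
#   less commands.sh                      # read them before running anything
#   bash commands.sh                      # run them
make_commands uniprot_sprot_archaea.dat.gz
```

Because the commands sit in a file, each one can be inspected, or copied and run by hand, when a single input fails.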

written 11 months ago by RamRS

RamRS wrote:

(split /\_/, $_)[2]

Split the current line by underscore, pick 3rd value

$a=~/(\w+).dat/; $b=$1

Pick all the word characters (letters, digits, underscore) preceding .dat and assign them to $b. Note that the unescaped . matches any character, not just a literal dot.

print "perl screen_complete_proteome_from_uniprot_division.pl \$i >> uniprot_".$b.".fasta"' -- -i=$i

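Print a command line (it is only printed, not run) whose output file name is built from $b. The \$i is printed literally as $i; the -- -i=$i at the end passes the shell's $i to perl as a variable.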

perl -slane

Run perldoc perlrun from the command line and read the manual. It explains each of the -s, -l, -a, -n and -e options.

  • -s enables the -- -i=$i variable passing part.
  • -e is "execute this perl stuff that I'm passing as a string", like bash -c or Rscript -e. Instead of reading the program from a script file, perl takes it from the command-line argument.

    perl -nle is essentially like awk, running the command-passed-as-an-argument per line of input file.

I'm not sure what the significance of the -a is here.

modified 11 months ago • written 11 months ago by RamRS

Great!! Thank you very much Ram. I am running the script, however, it is not giving me what I want. Not sure if something is wrong with the screen_complete_proteome_from_uniprot_division.pl script

Here is what it looks like:

#!/usr/bin/env perl
use strict;
use warnings;
use G;
# perl screen_complete_proteome_from_uniprot_division.pl EMBL_format.dat
## EMBL_format.dat ex : uniprot_sprot_archaea.dat
my $input = shift;
my %out = &get_fasta($input);
sub get_fasta{
  my $input = $_[0];
  my $tree = readFile($input, -format=>"swiss" );
  # second and third parts of the file name, e.g. sprot and archaea.dat
  my ($dat, $div) = (split /\_/, $input)[1,2];
  $div =~ s/\.dat$//;
  foreach my $entry ( sort keys %{$tree} ) {
    # keep only entries whose keyword field mentions "Complete proteome"
    if( defined $tree->{$entry}->{KW} && $tree->{$entry}->{KW} =~ /Complete\sproteome/ ) {
      next if $tree->{$entry}->{OC} =~ /Tardigrada/ ;
#      next if $tree->{$entry}->{OC} =~ /Nematoda/ ;
#      next if $tree->{$entry}->{OC} =~ /Arthropoda/ ;
      my %fasta;
      my $seq = $tree->{$entry}->{"  "};
      $seq =~ s/\s+//g;
      say $tree->{$entry}->{LOCUS}->{id}."|".$dat."|".$div if $tree->{$entry}->{OC} =~ /Metazoa/;
      $fasta{$tree->{$entry}->{LOCUS}->{id}."|".$dat."|".$div} = $seq;
      say to_fasta(%fasta);
    }
  }
}

Would you be able to help me figure it out?

I appreciate your input very very much!

modified 11 months ago • written 11 months ago by Seq225

See my comment above.

written 11 months ago by Jean-Karim Heriche

I'm sorry, I'm not in a place to debug perl code - I've been out of touch with Perl for a while now, and Perl is a difficult-to-debug language to begin with.

written 11 months ago by RamRS

It's OK. Thanks very much Ram!!

written 11 months ago by Seq225