Could someone explain this perl command?
5.7 years ago
Seq225 ▴ 110

Hi,

I have a bash script and it looks like this:

#!/bin/bash
for i in *dat.gz
do gunzip $i
echo uniprot_sprot_archaea.dat | perl -slane '$a=(split /\_/, $_)[2]; $a=~/(\w+).dat/; $b=$1; print "perl screen_complete_proteome_from_uniprot_division.pl \$i >> uniprot_".$b.".fasta"' -- -i=$i
done

I don't know how to code, but I need to understand these perl commands. I don't understand anything from echo to the end of the command. Could someone please explain it?

Thanks a ton, and sorry for this silly request.

bash perl • 1.5k views

I have some doubts that this is working as intended. What is it you're trying to do?
For instance, while the bash script will unzip all dat.gz files, the perl line will repeatedly work on the fixed string uniprot_sprot_archaea.dat. The split part extracts archaea.dat, the regex then captures archaea, and so the perl one-liner prints the following every time the bash script unzips a file:
perl screen_complete_proteome_from_uniprot_division.pl $i >> uniprot_archaea.fasta
Note the presence of $i in the output: the \ preceding $i tells perl not to interpret the $ sign as the start of a variable.


I am very confused here. I think the main execution here is based on the perl script. I have provided it below.

The entire idea is to extract sequences with “Complete Proteome” among their keywords from the downloaded files (Swiss-Prot and TrEMBL). All I am trying to do is repeat some analyses from this paper: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2002266#sec014 (Methods section, HGT analyses)


The script screen_complete_proteome_from_uniprot_division.pl is never executed when you run the bash script you posted. If you want to execute it from within the perl one-liner, one option is to use the qx operator, i.e. replace print with qx, but that's not the only problem you have.


Thanks very much. I will replace the print with qx. Also, if you don't mind and have the time, could you point out the other problems?

Thanks like an ocean!!


For every file that is unzipped, the bash script passes the same string 'uniprot_sprot_archaea.dat' to perl, so it always prints the line: perl screen_complete_proteome_from_uniprot_division.pl $i >> uniprot_archaea.fasta

Maybe you want to run the script screen_complete_proteome_from_uniprot_division.pl on each unzipped file? Then try something along these lines:

 #!/bin/bash
 for i in *dat.gz
 do gunzip "$i"
 echo "$i" | perl -slane '$_=~s/\.gz$//; # remove the .gz extension from the filename
                        $a=(split /\_/, $_)[2]; # split on _ and extract the third part
                        $a=~/(\w+)\.dat/; $b=$1; # extract the word characters before .dat
                        qx(perl screen_complete_proteome_from_uniprot_division.pl $_ > uniprot_$b.fasta)' # execute perl script on unzipped input file and save output in .fasta file
 done
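
As an alternative, the same renaming can be sketched in plain bash, without a nested perl one-liner. This is only a sketch under assumptions: the helper names fasta_name_for and plan_commands are made up, file names are assumed to look like uniprot_<source>_<division>.dat.gz, and the loop only prints the commands so you can check them before running anything.

```bash
#!/bin/bash
# Map e.g. uniprot_sprot_archaea.dat.gz to uniprot_archaea.fasta.
fasta_name_for() {
  local dat="${1%.gz}"            # drop a trailing .gz, if present
  local base="${dat%.dat}"        # drop the .dat extension
  local division="${base##*_}"    # keep what follows the last underscore
  printf 'uniprot_%s.fasta\n' "$division"
}

# Dry run: print one command per file instead of executing it.
plan_commands() {
  local f
  for f in "$@"; do
    printf 'perl screen_complete_proteome_from_uniprot_division.pl %s >> %s\n' \
      "${f%.gz}" "$(fasta_name_for "$f")"
  done
}

plan_commands uniprot_sprot_archaea.dat.gz uniprot_trembl_fungi.dat.gz
```

The sketch prints >> (append, as in the question) because the Swiss-Prot and TrEMBL files of one division map to the same output name, so with > the second run would overwrite the first. Whether appending is what you want depends on your analysis.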

Great. Thank you! I will try these...


I'd recommend redirecting stdout of the bash script to a file and executing that file. Running the perl script from a loop will make debugging more difficult.
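
That workflow could look roughly like this. generate_commands is a made-up stand-in for the bash script in the question, and it emits harmless echo commands so the sketch is safe to run as-is.

```bash
#!/bin/bash
# Stand-in for the loop in the question: it only prints commands, it runs nothing.
generate_commands() {
  local f
  for f in "$@"; do
    echo "echo would process $f"   # placeholder command, safe to execute
  done
}

tmp=$(mktemp)                      # file that holds the generated commands
generate_commands uniprot_sprot_archaea.dat uniprot_sprot_fungi.dat > "$tmp"
cat "$tmp"                         # step 1: review the generated commands
bash "$tmp"                        # step 2: execute them once they look right
rm -f "$tmp"
```

Keeping generation and execution separate means a wrong file name shows up in the command file before anything has been run.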

5.7 years ago
Ram 43k

$a=(split /\_/, $_)[2]

Split the current line by underscore, pick 3rd value

$a=~/(\w+).dat/; $b=$1

Pick all the word characters (letters, digits, underscores) preceding .dat, assign that to $b

print "perl screen_complete_proteome_from_uniprot_division.pl \$i >> uniprot_".$b.".fasta"' -- -i=$i

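Print (not execute) a command line built from the variables obtained above. Because of the backslash, \$i is printed literally as $i rather than replaced by the value passed with -i=$i.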

perl -slane

Run perldoc perlrun from the command line and read the manual. It explains each of the -s, -l, -a, -n and -e options.

  • -s enables the -- -i=$i variable passing part.
  • -e is "execute this perl stuff that I'm passing as a string", like bash -c or Rscript -e. Instead of reading the program from a script file, perl takes it from the command line argument.

    perl -nle is essentially like awk, running the command-passed-as-an-argument once per line of input.

I'm not sure what the significance of the -a is here: it autosplits each line into the @F array, but @F is never used in this one-liner.


Great!! Thank you very much Ram. I am running the script; however, it is not giving me what I want. I am not sure whether something is wrong with the screen_complete_proteome_from_uniprot_division.pl script.

Here is what it looks like:

#!/usr/bin/env perl
use strict;
use warnings;
use G;
# perl screen_complete_proteome_from_uniprot_division.pl EMBL_format.dat
## EMBL_format.dat ex : uniprot_sprot_archaea.dat
my $input = shift;
my %out = &get_fasta($input);
sub get_fasta{
  my $input = $_[0];
  my $tree = readFile($input, -format=>"swiss" );
  my ($dat, $div) = (split /\_/, $input)[1,2];
  $div =~ s/\.dat$//;
  foreach my $entry ( sort keys %{$tree} ) {
    if( defined $tree->{$entry}->{KW} && $tree->{$entry}->{KW} =~ /Complete\sproteome/ ) {
      next if $tree->{$entry}->{OC} =~ /Tardigrada/ ;
#      next if $tree->{$entry}->{OC} =~ /Nematoda/ ;
#      next if $tree->{$entry}->{OC} =~ /Arthropoda/ ;
      my %fasta;
      my $seq = $tree->{$entry}->{"  "};
      $seq =~ s/\s+//g;
      say $tree->{$entry}->{LOCUS}->{id}."|".$dat."|".$div if $tree->{$entry}->{OC} =~ /Metazoa/;
      $fasta{$tree->{$entry}->{LOCUS}->{id}."|".$dat."|".$div} = $seq;
      say to_fasta(%fasta);
    }
  }
}

Would you be able to help me figure it out?

I appreciate your input very very much!


See my comment above.


I'm sorry, I'm not in a place to debug perl code - I've been out of touch with Perl for a while now, and Perl is a difficult-to-debug language to begin with.


It's ok. Thanks very much Ram!!
