Question

How To Print A Nested Perl Script Within Another Perl Script?

0

12.2 years ago

anin.gregory ▴ 110

I am trying to write a Perl script that generates a bunch of new Perl scripts, but as written it won't let me use variables, arrays, etc. inside a print "".

The error I get back says I need to define them, even though they are inside the print "". Is there a way around this?

Below is my preliminary script:

#!/usr/bin/perl
use warnings;
use strict;

my $idfile = $ARGV[0];
open (IDFILE,'<',$idfile)
or die "Could not open $idfile \n";

my $outfile_name; 

##CUT##
my $outfile = $outfile_name."pl"; # warning: undefined value in concatenation or string
open (OUTFILE, '>', $outfile)  
or die "Could not open $outfile \n"; # this will make a single file named ".pl" with all code inside
##END CUT##

while (my $line = <IDFILE>) {
    chomp ($line);
    if ($line =~ /(T4-GC_[0-9]+)/) {
        my $outfile_name = "Pull_".$line; # this masks previous definition of $outfile_name and has no effect

 ## I think you want to move ##CUT## - ##END CUT## here

        my $script = "
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SearchIO;
use Bio::SeqIO;

my @ARGV = glob("*.fa"); ## why use ARGV? it will work, but ARGV is a special variable containing the arguments 

## Your search strategy is very inefficient! Given the relative numbers of identifiers in IDFILE and all entries in fa files,
## can you devise a more efficient search strategy than N*M? (This is your homework: Assume #IDFILE (N) < #All Fasta entries (M))

foreach my $fil (@ARGV) {
    my $seqio  = Bio::SeqIO->new(-format => 'fasta', -file  => $fil);
        while (my $seqobj = $seqio->next_seq) {
            my $seqid = $seqobj->display_id;
             $fil =~ /([A-Z]+[0-9]+)/;
         my $phage_name = $1;
         my $id = $seqid."|".$phage_name;
             my $nuc = $seqobj->seq();
             if ($seqid =~ /$line/) {
                print ">$id\n$nuc\n";
         }
}
}"
    print OUTFILE $script;
## close OUTFILE forgotten.
    }


}

perl output • 6.5k views

ADD COMMENT • link updated 12.2 years ago by Neilfws 49k • written 12.2 years ago by anin.gregory ▴ 110

1

off-topic, simple perl questions

ADD REPLY • link 12.2 years ago by Michael 56k

0

Deletion needs to be reserved to posts that have absolutely nothing to do with bioinformatics. A question that involves bioperl or one that involves any script or program that is meant to analyze bioinformatics data is on topic.

It is possible that the solution to a question is not bioinformatics related, but that should not be grounds for deletion.

ADD REPLY • link 12.2 years ago by Istvan Albert 102k

1

Then bring back the close option!

ADD REPLY • link 12.2 years ago by Michael 56k

0

To me this question has nothing to do with bioinformatics, but it is a good question for Stack Overflow. Otherwise, any script could be made relevant simply by adding a line such as "import Bio::*";

ADD REPLY • link 12.2 years ago by Michael 56k

2

true, there is really no way around that.

This is how I see it: whoever has an import Bio in their script is a budding bioinformatician. They are one of us, or want to be one of us. This site is written by bioinformaticians for bioinformaticians. Why not help them and guide them in the right direction?

ADD REPLY • link 12.2 years ago by Istvan Albert 102k

0

I'd do just that, just please put the tools in place. The right response imo to such a question is to close it and to link to the numerous results that explain it.

ADD REPLY • link 12.2 years ago by Michael 56k

0

Yes, I agree. We need to indicate that this question is really a programming question and that the author may get better answers on Stack Exchange. This would be a great answer to the question; it would not even need to be a comment.

But we don't need to preclude someone else from adding an extra answer as well. There may be something in the code that is not done properly, or there may be an alternative that the OP is not even asking about. Bioinformatics is fundamentally more complicated than programming. The StackExchange binary model is not appropriate to science.

What I don't like about closing is the finality of someone deciding that there is no other opinion that could make this thread better today or in the future.

ADD REPLY • link 12.2 years ago by Istvan Albert 102k

0

sorry, whoever undeleted this, if this is not off topic, what then?

ADD REPLY • link 12.2 years ago by Michael 56k

0

there is a moderator log (that I will make more visible) that shows moderator actions: http://www.biostars.org/modlog/list/

ADD REPLY • link 12.2 years ago by Istvan Albert 102k

0

Also, I think we should comment to tell people that they should be asking this elsewhere, but deleting the question, especially after it got an answer and an upvote, only leaves more people unhappy.

ADD REPLY • link 12.2 years ago by Istvan Albert 102k

1

The upvote for the answer is from me. In fact, this is the result of the missing close option.

ADD REPLY • link 12.2 years ago by Michael 56k

0

I have added some comments into your original perl code to show some more problems with the code.

ADD REPLY • link 12.2 years ago by Michael 56k

0

Thanks! Sorry if it is a code question. I am new to bioinformatics.

ADD REPLY • link 12.2 years ago by anin.gregory ▴ 110

0

Reading the code, it's clear that they are trying to solve a bioinformatics problem (matching a list of IDs to FASTA headers). It's just that the problem was not well-defined in the question.

ADD REPLY • link 12.2 years ago by Neilfws 49k

score 3 · Answer 1 · 2013-04-21

First: for the love of whatever it is you pray to, do not go down the road of writing Perl scripts which write out other Perl scripts. It is emphatically not the correct solution to any problem under any circumstances.

Second: there are some basic errors in your code. For example, you declare variables without assigning them a value, which explains "undefined value in concatenation or string". You also add the suffix "pl" instead of ".pl". However, we will not dwell on these because, as stated above, it is emphatically not the correct solution to any problem under any circumstances.

The first step in asking a question is to clearly define the problem. I have stared at your code for a while and, so far as I can tell, you are trying to do the following:

Open a file which contains IDs of some kind, one per line, which begin "T4-GC_" followed by digits
Create an output file name based on each ID
Write out Perl code to the output file.

It appears that the code in each output file is supposed to:

Open one or more FASTA files
Parse the sequences into Bioperl sequence objects
Compare the FASTA IDs to the IDs in your input file
If they match, print out the sequence in FASTA format with a new ID based on the original FASTA header

There are more errors in that part of the code which ensure that it will not work but again, we will not dwell on them because as we now know, it is emphatically not the correct solution to any problem under any circumstances.

What you want is this: one Perl script which, given a file of IDs and a file of sequences, does the matching and writes out the sequences where the IDs match the header. You do not need a script for each ID and you do not need to supply all FASTA files as an argument to the script. Given a directory of FASTA files ending in ".fa", you should supply the files to your script on the command line using e.g. find:

find ./ -name "*.fa" -exec perl MyScript.pl {} \;
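
As a rough illustration of the single-script approach described above, here is a minimal sketch rather than a definitive implementation. It makes several assumptions that are not in the original post: the ID file holds exactly one ID per line, an ID must equal the first word of a FASTA header (not merely occur inside it), and the FASTA files are parsed with core Perl instead of BioPerl so the example is self-contained. The subroutine names read_ids and matching_records, and the script name match_ids.pl, are made up for this example. The IDs go into a hash, so each FASTA header costs one lookup instead of one comparison per ID.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one ID per line into a hash so each later lookup is a single hash access.
sub read_ids {
    my ($fh) = @_;
    my %wanted;
    while (my $id = <$fh>) {
        chomp $id;
        $wanted{$id} = 1 if length $id;
    }
    return \%wanted;
}

# Stream a FASTA file and return the records whose ID (the first word of the
# header) is in the hash. $label is appended to the ID, mirroring the
# "$seqid|$phage_name" naming used in the question.
sub matching_records {
    my ($fh, $wanted, $label) = @_;
    my ($out, $keep) = ('', 0);
    while (my $row = <$fh>) {
        if ($row =~ /^>(\S+)/) {
            $keep = exists $wanted->{$1};
            $out .= ">$1|$label\n" if $keep;
        }
        elsif ($keep) {
            $out .= $row;    # sequence line belonging to a wanted record
        }
    }
    return $out;
}

# Usage (hypothetical names): perl match_ids.pl ids.txt one.fa [two.fa ...]
if (@ARGV >= 2) {
    my ($idfile, @fasta) = @ARGV;
    open my $ids, '<', $idfile or die "Could not open $idfile: $!\n";
    my $wanted = read_ids($ids);
    close $ids;
    for my $file (@fasta) {
        # Same file-name pattern as in the question; fall back to the file name.
        my ($label) = $file =~ /([A-Z]+[0-9]+)/;
        $label = $file unless defined $label;
        open my $in, '<', $file or die "Could not open $file: $!\n";
        print matching_records($in, $wanted, $label);
        close $in;
    }
}
```

With find, the ID file would go before the {} placeholder, for example: find ./ -name "*.fa" -exec perl match_ids.pl ids.txt {} \;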

Hope this helps.

score 2 · Answer 2 · 2013-04-21

Chances are that whenever you are creating Perl scripts with a Perl script, you are doing it wrong! You are most likely making your life more complicated, and the generated code becomes obfuscated and hard to debug and manage. There are several good use cases for generating code via a Perl script (e.g. HTML, JavaScript code, glue code for C libraries, Perl module code for object-relational mapping to a database). None of these seems to be the case here. Also, I don't see what your generated script would do that a normal Perl program could not do with greater ease using standard control structures. So the other answer is formally correct, but you most likely don't want to do it that way unless you can provide a rationale for doing this.

I also recommend asking this question on Stack Exchange or PerlMonks with a different focus: "I wish to solve this or that problem; is my way the right way to do this, and if not, what is?", without jumping to conclusions in the question itself. I recommend a purely programming-oriented forum because, in my view, your problem is a general programming question and its relation to bioinformatics is irrelevant to its solution.

score 1 · Answer 3 · 2013-04-21

Try the following segment:

        my $script = <<'END';
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SearchIO;
use Bio::SeqIO;

my @files = glob("*.fa");

foreach my $fil (@files) {
    my $seqio  = Bio::SeqIO->new(-format => 'fasta', -file  => $fil);
        while (my $seqobj = $seqio->next_seq) {
        my $seqid = $seqobj->display_id;
        $fil =~ /([A-Z]+[0-9]+)/;
        my $phage_name = $1;
        my $id = $seqid."|".$phage_name;
        my $nuc = $seqobj->seq();
        if ($seqid =~ /$line/) {
                print ">$id\n$nuc\n";
        }
}
}
END
    $script =~ s/\$line/$line/;
    print OUTFILE $script;

The above uses a here-document with a single-quoted terminator to prevent interpolation, which avoids Perl's complaints about undeclared variables. However, one variable, $line in the regex, does need the value from the outer script, so the substitution just before printing the script out to a file fills it in.
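
To see the mechanism in isolation, here is a minimal sketch. The ID value and the names $template and $script are made up for the example. The single-quoted here-document keeps $seqid and $line as literal text, and the substitution then replaces the literal text $line with the current ID.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical ID, standing in for one line read from the ID file.
my $line = 'T4-GC_123';

# Single-quoted terminator: nothing is interpolated, so $seqid and $line
# stay in the text as literal characters and strict has nothing to complain about.
my $template = <<'END';
if ($seqid =~ /$line/) { print "match\n"; }
END

# Copy the template, then replace the literal text "$line" with the current ID.
(my $script = $template) =~ s/\$line/$line/;

print $script;
```

Had the terminator been written as <<"END" (or as a bare END), Perl would have tried to interpolate $seqid while building the string and, under strict, refused to compile because that variable is not declared in the outer script.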

Hope this helps!

P.S. Using lexically scoped variables (my) for filehandles is recommended. I also replaced @ARGV with @files, as there's no need to use Perl's global command-line argument array for what your scripts are doing.
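
A lexical filehandle could look like the following minimal sketch. The ID, the temporary directory and the file name are assumptions made for the example; only the three-argument open with a my variable is the point.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical ID and output location; a temporary directory keeps the example tidy.
my $line    = 'T4-GC_123';
my $dir     = tempdir(CLEANUP => 1);
my $outfile = "$dir/Pull_$line.pl";

# Lexical filehandle: $out is an ordinary scoped variable, unlike the
# package-global bareword OUTFILE.
open my $out, '>', $outfile or die "Could not open $outfile: $!\n";
print {$out} "#!/usr/bin/perl\n";

# An explicit close lets you check that the write actually succeeded.
close $out or die "Could not close $outfile: $!\n";
```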