Can I Retrieve A Docstring Equivalent From A Perl Script, Using Python?
13.4 years ago

I've developed a toolkit in Python (with a basic command line interface) that allows you to drop Python scripts into a directory as plug-ins to add additional functionality. It picks up information on the scripts using this code (which can then be displayed to the user):

for plugin in plugins:
    pmodule = __import__("toolkit.plugins." + plugin)
    info = getattr(toolkit.plugins, plugin).__doc__
    plugins_information.append(info)
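
A side note on the loop above: __import__ with a dotted name returns the top-level package rather than the submodule, which is why the code has to walk back down with getattr. A minimal sketch of the same idea with importlib.import_module, which returns the submodule directly, could look like the following. The helper name collect_docstrings is hypothetical, and the demo uses the standard-library json package as a stand-in for toolkit.plugins so that it runs anywhere.

```python
import importlib


def collect_docstrings(package, plugin_names):
    """Map each plugin name to its module docstring (None if it has none)."""
    docs = {}
    for name in plugin_names:
        # import_module returns the submodule itself, not the top-level package
        module = importlib.import_module(package + "." + name)
        docs[name] = module.__doc__
    return docs


if __name__ == "__main__":
    # "json" stands in for "toolkit.plugins" (an assumption for the demo)
    for name, doc in collect_docstrings("json", ["decoder", "encoder"]).items():
        print(name, "->", (doc or "").strip())
```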

I have also added the option to drop Perl scripts into the directory as plugins, which I execute using the subprocess module. Is there a way to retrieve the same kind of information from a Perl script, in a manner similar to the above, other than manually reading a specific line from the file? I assume it would involve some sort of perlpod parser?

The Perl script is documented as follows:

=head1 NAME

perl_test

=head1 SYNOPSIS

   perl_test.pl

=head1 DESCRIPTION

   This is a test script implemented in Perl.

=cut

I want to pull out the DESCRIPTION. Any thoughts?

perl python • 5.2k views
13.4 years ago
Neilfws 49k

My thought is that trying to call one language from another is, in general, a really bad idea when it comes to maintaining code.

I once wrote a web application using a PHP framework that called some Perl scripts using a library called PECL. It was the biggest software mistake of my career and caused all kinds of nightmares in terms of maintenance and sharing the code with other people.

Where possible, I recommend developing one application in one language.


Thanks Neil. Yes, I would rather code completely in Python, but some things seem to be better implemented (strangely from a performance perspective) in Perl.

The scripts are stand-alone and simply drop into the plugin directory. They are called with arguments and then any output from them is parsed back to the toolkit.

I essentially want to make an easily expandable pipeline that runs through an input configuration file and executes the plugin scripts one at a time, so that I can integrate my research methodology rather than having to run separate scripts left, right and center.

13.4 years ago

Hi @gawbul

There seem to be some minor things to improve in your code. E.g., the done = 0 flag never changes, so it is not needed. Also, when no documentation is found, the function returns nothing. Here is another option:

EDIT: There were major flaws in the previous code. Here is a complete script that should do what you want. It makes one major assumption, much like the code suggested by Istvan: the DESCRIPTION part has to be followed by a line beginning with an equals sign (=).

This is a pretty big assumption. The format itself makes it hard to find a generalization that would work on any Perl script written by a variety of people. The tricky bit is that, as there is no markup to signify the end of the DESCRIPTION section, it is hard to define exactly what constitutes its end. What if no further equals sign follows? What if there is no blank line before the beginning of the code? And what about the possible presence of multiple paragraphs?

If you intend to use it only for that particular file you need, or plan to manually change the format of all Perl files you include in your package so that they satisfy this assumption, then it should not be a problem. I would consider adding a =head1 DESCRIPTION_END marker at the end of the documentation section, just to make it conform.

#!/usr/bin/env python3

import re
import sys


def get_perl_info(filename):
    """Return the lines of the '=head1 DESCRIPTION' section of a Perl script."""
    doc = []
    begun = False
    with open(filename) as handle:
        for line in (x.strip() for x in handle):
            if re.match(r"=head1\s+DESCRIPTION", line):
                begun = True
            elif begun and line.startswith("="):
                # Any later POD command (=cut, =head1, ...) ends the section
                return doc
            elif begun:
                doc.append(line)
    return ["No documentation found in file: " + filename]


if __name__ == "__main__":
    filename = sys.argv[1]
    output = sys.argv[2]
    with open(output, "w") as f:
        for line in get_perl_info(filename):
            f.write(line + "\n")

You can run the script by first making it executable (Linux):

chmod +x get_perl_info.py

and then, from the directory that contains it:

./get_perl_info.py perl_code.pl extracted_doc.txt

NOTE: I also removed the second script since it could not extract the information correctly from the format that you specified.

Cheers

13.4 years ago

How are your Perl modules documented?

You should write the documentation with the POD syntax as explained here and here; then you can access it with the Pod::Autopod module.

edit: use Pod::Autopod

example:

use Pod::Autopod;
new Pod::Autopod(readfile=>'Foo.pm', writefile=>'Foo2.pm');
# reading Foo.pm and writing Foo2.pm but with pod

my $ap = new Pod::Autopod(readfile=>'Foo.pm');
print $ap->getPod();

=head1 NAME

=head1 METHODS

=cut

You could run pod2text (or pod2html) using subprocess, and then read the output: http://perldoc.perl.org/pod2text.html
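
A minimal sketch of that suggestion, assuming pod2text is on the PATH and that its plain-text output puts =head1 headings in column 0 with the section body indented beneath them. The helper names pod_to_text and section_from_text are made up for illustration.

```python
import shutil
import subprocess


def pod_to_text(path, tool="pod2text"):
    """Return the rendered POD of a Perl file, or None if the tool is missing or fails."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, path], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


def section_from_text(text, heading):
    """Pull one section out of pod2text-style output.

    Assumption: headings start in column 0 and body lines are indented.
    """
    body, inside = [], False
    for line in text.splitlines():
        if line and not line[0].isspace():
            # An unindented line is a heading: start or stop collecting
            inside = line.strip() == heading
        elif inside:
            body.append(line.strip())
    return "\n".join(body).strip()
```

With both pieces, something like section_from_text(pod_to_text("perl_test.pl") or "", "DESCRIPTION") would give the description, or an empty string when pod2text is unavailable.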


What about Pod::Autopod?


Thanks giovanni. I know about POD and I am documenting my Perl scripts in that fashion and using docstrings and Sphinx in Python.

Basically, my toolkit allows you to view a list of the plugin scripts available and pulls out the __doc__ docstring for Python scripts to give some information on what each script does.

I need it to do something similar (using Python) for the Perl scripts too.


Thanks Brad, I was thinking that, but thought coding a quick method of my own might be quicker? See below! It breaks out when it picks up the text from the first perldoc DESCRIPTION line (or, more accurately, the first one with an associated text description).

13.4 years ago

If all you need is to cut the file between markers, you can do that simply, like so:

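from itertools import dropwhile, takewhile

with open('myprogram.pl') as stream:
    # Skip everything before the line that mentions DESCRIPTION
    desc = dropwhile(lambda x: 'DESCRIPTION' not in x, stream)
    next(desc, None)  # discard the heading line itself
    # Keep lines until the =cut marker
    desc = takewhile(lambda x: '=cut' not in x, desc)
    print(''.join(desc))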

Nice thing to learn :) This makes a strong assumption, however: that the =cut tag ALWAYS exists and comes right after the DESCRIPTION part. Cheers
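
One way to loosen that assumption, as a rough sketch rather than a full POD parser: treat the section as ending at the next =head1, at =cut, or at the end of the file, whichever comes first. The function name pod_section is hypothetical, and it still cannot tell documentation from code if the POD block is never closed.

```python
import re

# A POD command paragraph starts with "=" followed by a letter in column 0
POD_COMMAND = re.compile(r"^=([a-zA-Z]\w*)\s*(.*)$")


def pod_section(lines, title="DESCRIPTION"):
    """Return the text under '=head1 <title>', or None if the section is absent.

    The section ends at the next =head1, at =cut, or at end of input.
    """
    collected = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = POD_COMMAND.match(line)
        if match:
            command, rest = match.groups()
            if collected is not None and command in ("head1", "cut"):
                break
            if collected is None and command == "head1" and rest.strip() == title:
                collected = []
                continue
            # Other commands inside the section (e.g. =head2) are kept verbatim
        if collected is not None:
            collected.append(line.strip())
    if collected is None:
        return None
    return "\n".join(collected).strip()
```

Because it accepts any iterable of lines, an open file handle can be passed straight in, for example with open("perl_test.pl") as handle: desc = pod_section(handle).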

13.4 years ago
Rvosa ▴ 580

The easiest thing to do is to pipe the output of the podselect utility into your Python script. On the command line, you would simply do:

podselect -section DESCRIPTION infile.pl

...which writes the DESCRIPTION section of the file infile.pl to standard output. I don't know how to do the equivalent of backticks in Python to capture that, but I assume it's easy to do.
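
For what it's worth, the closest Python equivalent of backticks is probably subprocess.run with captured output. The sketch below assumes podselect is installed and on the PATH; the helper names backticks and podselect_description are invented for illustration.

```python
import shutil
import subprocess
import sys


def backticks(command):
    """Run a command (a list of arguments) and return its standard output as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def podselect_description(path):
    """Return the DESCRIPTION section via podselect, or None if podselect is not installed."""
    if shutil.which("podselect") is None:
        return None
    return backticks(["podselect", "-section", "DESCRIPTION", path])


if __name__ == "__main__":
    # Harmless demo that does not need Perl: capture output from a child Python
    print(backticks([sys.executable, "-c", "print('captured')"]), end="")
```

Passing the command as a list avoids shell quoting problems with unusual file names, and check=True turns a failing command into an exception instead of silently returning empty output.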

13.4 years ago

I guess I could do this?

import re

def get_perl_info(filename):
    """Return the first non-blank line after '=head1 DESCRIPTION', or None."""
    with open(filename, "r") as fileh:
        found = False
        for line in fileh:
            if not found:
                # Look for the DESCRIPTION heading
                found = re.match(r"=head1\s+DESCRIPTION", line) is not None
            elif line.strip():
                # First non-blank line after the heading
                return line.strip()
    return None