Modules and Packages

A module, simply put, is a set of subroutines and variables that you put in a file so that they can be used and reused by many programs. There are “traditional modules” and “object-oriented modules”; this tutorial concentrates on the traditional kind.

A Sample Module

Let’s construct a few subroutines to determine basic statistics. These will use a subset of the POD commenting style:

use strict;

=begin comment

Find the average of an array, passed by
reference.  If the array is empty,
return zero.

=end comment

=cut

sub average
{
    my $array_ref = shift;
    my $sum = 0;
    my $result = 0;
    my $n = scalar @{$array_ref};
    my $item;
    foreach $item (@{$array_ref})
    {
        $sum += $item;
    }
    if ($n > 0)
    {
        $result = $sum / $n;
    }
    return $result;
}

=begin comment

Calculate the maximum and minimum values of a set of
numbers passed as an array reference.

If there are zero numbers, return zero for both maximum and
minimum.

=end comment

=cut

sub min_max
{
    my $array_ref = shift;
    my $n = scalar @{$array_ref};
    my $min;
    my $max;
    my $item;
    if ($n == 0)
    {
        $min = $max = 0;
    }
    else
    {
        $min = $max = ${$array_ref}[0];
        foreach $item (@{$array_ref})
        {
            $min = $item if ($item < $min);
            $max = $item if ($item > $max);
        }
    }
    return ($min, $max);
}

=begin comment

Calculate the variance of a set of numbers
with n items passed as an array reference.

Algorithm: 
    Calculate the sum of all the numbers (sum)
    Calculate the sum of the squares of the numbers (sum_sq)
    Variance is (n * sum_sq - (sum**2)) / (n * (n-1) )

    If n <= 1, return zero.

=end comment

=cut

sub variance
{
    my $array_ref = shift;
    my $n = scalar(@{$array_ref});
    my $result = 0;
    my $item;
    my $sum = 0;
    my $sum_sq = 0;
    foreach $item (@{$array_ref})
    {
        $sum += $item;
        $sum_sq += $item*$item;
    }
    if ($n > 1)
    {
        $result = (($n * $sum_sq) - $sum**2)/($n*($n-1));
    }

    return $result;
}
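
As a quick check on the shortcut formula used in variance, the sketch below compares it with the textbook two-pass definition (sum of squared deviations from the mean, divided by n - 1). The subroutine names variance_shortcut and variance_two_pass are made up for this illustration and are not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One-pass shortcut: (n * sum_sq - sum**2) / (n * (n - 1))
sub variance_shortcut
{
    my $array_ref = shift;
    my $n = scalar @{$array_ref};
    return 0 if $n <= 1;
    my ($sum, $sum_sq) = (0, 0);
    foreach my $item (@{$array_ref})
    {
        $sum += $item;
        $sum_sq += $item * $item;
    }
    return ($n * $sum_sq - $sum**2) / ($n * ($n - 1));
}

# Two-pass definition: average the squared deviations from the mean,
# dividing by n - 1.
sub variance_two_pass
{
    my $array_ref = shift;
    my $n = scalar @{$array_ref};
    return 0 if $n <= 1;
    my $sum = 0;
    $sum += $_ foreach @{$array_ref};
    my $mean = $sum / $n;
    my $sum_dev_sq = 0;
    $sum_dev_sq += ($_ - $mean)**2 foreach @{$array_ref};
    return $sum_dev_sq / ($n - 1);
}

my @data = ( 5, 22, 4, 13 );
my $shortcut = variance_shortcut(\@data);
my $two_pass = variance_two_pass(\@data);
print "Shortcut: $shortcut\n";    # prints "Shortcut: 70"
print "Two-pass: $two_pass\n";    # prints "Two-pass: 70"
```

For this data set the sum is 44 and the sum of squares is 694, so the shortcut gives (4 * 694 - 44**2) / (4 * 3) = 840 / 12 = 70, the same value the two-pass version produces.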

If you want to use these routines in several different programs, you could copy and paste them into each one. This is not a good solution, though: if you decide to add or modify a subroutine, you have to change it in every file. Instead, you can put them in a file with a .pm extension, with one crucial addition. At the end of the file, you must put:

1;

You need this final true value (conventionally the statement 1;) because, when you use a module, Perl compiles and runs the file, and the last statement it evaluates must return “true.” After adding the line, save the code in a file named Statistics.pm. I know I’ve warned you about using lowercase only for file names, but the convention for Perl modules is to uppercase the first letter.
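
The true-value requirement can be seen in a small experiment. The sketch below writes two throwaway module files into a temporary directory and tries to load each one; GoodDemo and BadDemo are hypothetical names invented for this illustration, and the files do not enable any newer Perl feature that relaxes the requirement.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary directory that is removed when the program exits.
my $dir = tempdir(CLEANUP => 1);

# Write a tiny module file with the given name and contents.
sub write_module
{
    my ($name, $body) = @_;
    open(my $fh, '>', "$dir/$name.pm") or die "Cannot write $name.pm: $!";
    print $fh $body;
    close($fh);
}

# GoodDemo ends with a true value; BadDemo ends with a false one.
write_module('GoodDemo', "package GoodDemo;\nsub hello { return 'hi' }\n1;\n");
write_module('BadDemo',  "package BadDemo;\nsub hello { return 'hi' }\n0;\n");

# Let Perl find the modules in the temporary directory.
unshift @INC, $dir;

my $good_ok   = eval { require GoodDemo; 1 } ? 1 : 0;
my $bad_ok    = eval { require BadDemo; 1 } ? 1 : 0;
my $bad_error = $@;

print "GoodDemo loaded: $good_ok\n";
print "BadDemo loaded:  $bad_ok\n";
print "Error was: $bad_error";
```

Loading BadDemo fails with a message saying the file did not return a true value, which is the error you would see if you left the final 1; out of a module that happens to end on a false statement.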

Now that you have created the module file, you can use it as in the following test file:

#!/usr/bin/perl
use strict;
use Statistics;

my @data = ( 5, 22, 4, 13 );
my @data2 = ( 11, 10, 12 );
my $avg = average(\@data);
my $var = variance(\@data);
my ($small, $big) = min_max(\@data);
print "Average of first data set is $avg\n";
print "Variance is $var\n";
print "Range is $small to $big\n";

print "-" x 40, "\n";

$avg = average( \@data2 );
$var = variance( \@data2 );
($small, $big) = min_max(\@data2);
print "Average of second data set is $avg\n";
print "Variance is $var\n";
print "Range is $small to $big\n";

Namespaces and Packages

A namespace is an area where Perl keeps a list of all the names of variables and subroutines that are available to a program. Unless you specify otherwise, all your variables and subroutines go into the “main” namespace.
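
To see the default namespace at work, here is a minimal sketch. The subroutine greet and the variable $count are made up for the illustration; __PACKAGE__ is a built-in token that gives the name of the current package.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No package statement, so everything here lives in "main".
our $count = 3;

sub greet
{
    return "hello from " . __PACKAGE__;
}

my $plain     = greet();          # unqualified call
my $qualified = main::greet();    # same subroutine, fully qualified

print "$plain\n";                 # prints "hello from main"
print "$qualified\n";             # prints "hello from main"
print "count is $main::count\n";  # prints "count is 3"
```

The qualified and unqualified forms reach the same subroutine and the same variable, because both names resolve inside main.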

This is all well and good, but there’s a problem: subroutine names like average are quite common. If you are using a lot of people’s modules, there’s a distinct possibility of name collision if you just dump all the names into one namespace. To avoid this, you make a module part of a package, and that sets up a new namespace. The rule in Perl is that the package name must match the module name, so we have to add this to the top of the module:

package Statistics;

Now the subroutines average, min_max, and variance belong to the Statistics package instead of the default package (named main). If you run the test program again, you will get this error message:

Undefined subroutine &main::average called at stats.pl line 7.

To fix this, you must now specify the namespace as well as the subroutine name (and do this throughout the program):

my $avg = Statistics::average(\@data);
my $var = Statistics::variance(\@data);
# etc.

This solves the problem. If you were to use another package, say, BuildingCode, which also had a variance subroutine (one that finds out how much a variance for a certain type of construction project costs), you could distinguish them:

use Statistics;
use BuildingCode;

my @data = (2, 6, 13);
my $number = Statistics::variance( \@data );
my $cost = BuildingCode::variance( "electrical" );
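
The same idea can be tried in a single file, without creating two module files. In the sketch below both packages are declared inline; the BuildingCode fee table is entirely made up for illustration, and the Statistics version is a condensed copy of the variance routine shown earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Statistics;

    # Sample variance, using the shortcut formula from the module.
    sub variance
    {
        my $array_ref = shift;
        my $n = scalar @{$array_ref};
        return 0 if $n <= 1;
        my ($sum, $sum_sq) = (0, 0);
        foreach my $item (@{$array_ref})
        {
            $sum += $item;
            $sum_sq += $item * $item;
        }
        return ($n * $sum_sq - $sum**2) / ($n * ($n - 1));
    }
}

{
    package BuildingCode;

    # Hypothetical fees; these numbers are invented for the example.
    my %fee = ( electrical => 150, plumbing => 120 );

    # Cost of a zoning variance for a given type of project.
    sub variance
    {
        my $type = shift;
        return $fee{$type};
    }
}

# Back in main: two subroutines named variance, no collision.
my @data = (2, 6, 13);
my $number = Statistics::variance( \@data );
my $cost = BuildingCode::variance( "electrical" );
print "Statistical variance: $number\n";   # prints "Statistical variance: 31"
print "Cost of a variance: $cost\n";       # prints "Cost of a variance: 150"
```

Each package keeps its own variance, and the qualified name decides which one runs.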

Too Much of a Good Thing

The problem is now solved, but at the expense of having to qualify every subroutine call. Luckily, it is possible to import names from a package’s namespace into your own—if the package lets you. To allow the export of all your subroutines, add this to the module:

package Statistics;
require Exporter;
our @ISA = ("Exporter");
our @EXPORT_OK = qw(average variance min_max);

Then, in the program that uses the module, list the names you want to import:

use Statistics qw( average variance min_max );

Note: even if a package does not list a subroutine in @EXPORT_OK, you can always call it directly with its fully qualified name. @EXPORT_OK lists only the names that a program may import on request and then use without qualification.
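
Here is a single-file sketch of that behavior. DemoStats and count_items are hypothetical names invented for the illustration; the package is declared inside a BEGIN block and recorded in %INC so that use does not go looking for a DemoStats.pm file on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

BEGIN
{
    package DemoStats;             # hypothetical stand-in for a module file
    require Exporter;
    our @ISA = ("Exporter");
    our @EXPORT_OK = qw(average);  # count_items is deliberately left out

    sub average
    {
        my $array_ref = shift;
        my $n = scalar @{$array_ref};
        return 0 if $n == 0;
        my $sum = 0;
        $sum += $_ foreach @{$array_ref};
        return $sum / $n;
    }

    sub count_items
    {
        my $array_ref = shift;
        return scalar @{$array_ref};
    }

    # Mark the module as already loaded so "use" skips the file search.
    $INC{'DemoStats.pm'} = __FILE__;
}

use DemoStats qw(average);

my @data = (5, 22, 4, 13);
my $avg   = average(\@data);                 # imported, so no qualifier needed
my $count = DemoStats::count_items(\@data);  # not exportable, but still callable

# Asking Exporter for a name that is not in @EXPORT_OK is refused.
my $import_ok = do
{
    local $SIG{__WARN__} = sub { };          # silence Exporter's diagnostic
    eval { DemoStats->import('count_items'); 1 } ? 1 : 0;
};

print "Average: $avg\n";                     # prints "Average: 11"
print "Count: $count\n";                     # prints "Count: 4"
print "Import of count_items allowed: $import_ok\n";
```

The fully qualified call works regardless of the export lists; only the request to import count_items is rejected.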

The person writing a package can also specify that certain names are exported whenever someone uses the package. For example, put this in the module to export average and min_max by default and variance only on request:

package Statistics;
require Exporter;
our @ISA = ("Exporter");
our @EXPORT = qw(average min_max);  # export by default
our @EXPORT_OK = qw(variance);      # export on request

Combining @EXPORT and @EXPORT_OK

Given the preceding setup, if a program is only going to use the average and min_max functions, it can just use those two functions; they were exported by default:

#!/usr/bin/perl
use Statistics;
my @data = (1, 20, 3);
my $avg = average(\@data);
my ($small,$big) = min_max(\@data);

If the program wants to use variance as well, it cannot do the following, because only the items in the qw list get imported:

#!/usr/bin/perl
use Statistics qw(variance);        # imports variance ONLY
my @data = (1, 20, 3);
my $var = variance(\@data);         # this is ok...
my $avg = average(\@data);          # but this now causes an error
my ($small,$big) = min_max(\@data);

How do you get the default exports as well as the specific one that you wanted? Add :DEFAULT to the import list:

#!/usr/bin/perl
use Statistics qw(:DEFAULT variance);
my @data = (1, 20, 3);
my $var = variance(\@data);         # this is ok...
my $avg = average(\@data);          # ...and this works also
my ($small,$big) = min_max(\@data); # ...as does this.
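
The three import styles can be compared side by side in one file. In this sketch DemoStats is a hypothetical inline package whose subroutines are stubs, and PlainUser, RequestOnly, and WithDefault are made-up packages that each import with a different list; the can method reports which names each one ended up with.

```perl
#!/usr/bin/perl
use strict;
use warnings;

BEGIN
{
    package DemoStats;                  # hypothetical stand-in for a module file
    require Exporter;
    our @ISA = ("Exporter");
    our @EXPORT = qw(average min_max);  # export by default
    our @EXPORT_OK = qw(variance);      # export on request

    # Stub bodies: only the names matter for this demonstration.
    sub average  { return "average" }
    sub min_max  { return "min_max" }
    sub variance { return "variance" }

    # Mark the module as already loaded so "use" skips the file search.
    $INC{'DemoStats.pm'} = __FILE__;
}

{ package PlainUser;   use DemoStats; }
{ package RequestOnly; use DemoStats qw(variance); }
{ package WithDefault; use DemoStats qw(:DEFAULT variance); }

my %imported;
foreach my $pkg (qw(PlainUser RequestOnly WithDefault))
{
    # can() returns true when the package has a subroutine of that name.
    my @names = grep { $pkg->can($_) } qw(average min_max variance);
    $imported{$pkg} = "@names";
    print "$pkg imported: @names\n";
}
# prints:
#   PlainUser imported: average min_max
#   RequestOnly imported: variance
#   WithDefault imported: average min_max variance
```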
