Over the top
So now you know how to subvert perl's hashes to do whatever you want
(check out Tie::Hash::Cannabinol
to see how to get perl stoned). How about subverting its operators? When
does 2+2=5
? When programming with objects, obviously. And
how do we undermine the whole basis of arithmetic? With the
overload
pragma. If you cast your mind back to when I first
started nagging about use
strict
, you might remember that I mentioned that
strict
was a pragmatic module. Pragmatic modules (pragmata)
are a little more special than your average run-of-the-mill modules in
that they manipulate the way in which perl parses and executes your code,
rather than what functionality is available. overload
is
another pragma, which allows you to change the way in which perl does
maths and quoting. Since I have
issues with Imperial units, what happens if you want 1m + 1" =
1.025m
? We can use overload
to change how addition
works, and purge inches forever. Since overloading is something we can
only do to objects, this is obviously part of an object oriented approach. Hence we'll need to
write a perl class which can take numbers with their units and store them
as objects:
package PurgeImperial; use strict; sub new { my ( $class, $value ) = @_; my ( $number, $unit ) = split /\s+/, $value; return unless $unit =~ /^(in|m)$/; if ( $unit eq "in" ){ $number *= 0.0254; $unit = "m" } return bless { number => $number, unit => $unit }, $class; }
To use the module, we simply create objects like my $imperial =
PurgeImperial->new( "6 in" );
with a number and its unit
separated by spaces. Internally, the object silently converts inches to
metres and stores the new value, with its unit "m". Simple. Now for the
overloading. To overload an operator, like addition, we need to use the
overload
pragma, and tell it the names of methods to replace
operators with. Between use strict
and sub new
,
we need to wodge this in:
use overload ( '+' => "add" );
use overload
takes a hash as an argument. Each key/value
pair is an operator to overload, and a method that will be used to
implement it. We then need to write the add
method, which is
capable of adding two of our objects together in the right way, and
produce a suitable new object:
sub add { my ( $object1, $object2, $was_reversed ) = @_; my $class = ref $object1; $object2 = $class->new( $object2 ) unless UNIVERSAL::isa( $object2, $class ); ( $object1, $object2 ) = ( $object2, $object1 ) if $was_reversed; # see explanation below my $sum = $object1->{number} + $object2->{number}; return $class->new( "$sum m" ); }
The add
method is passed three things when it is called
by overload
: the first two are the objects involved in the
addition, the third is a value (true or false) that tells you whether the
objects have been reversed. The reason for this is that the first
argument to an overloaded method is always an object, hence if you
ask perl to do this:
3 + $overloaded_object;
where we are trying to add an object to something that isn't an
object, perl will actually ask overload
to do this:
add( $overloaded_object, 3, "REVERSED!!!" );
or similar. For addition, this (probably) doesn't make any difference, but for subtraction, you must be careful:
$overloaded_object - 3;
is called as:
subtract( $overloaded_object, 3 );
as you would expect. However,
3 - $overloaded_object;
is called as:
subtract( $overloaded_object, 3, "REVERSED!!!!" );
and unless you take note of the trueness of the reversal, you'll end
up with the same result from both, which is unlikely to be what you're
after! Anyway, the method is otherwise simple enough to understand: we
first find out what sort of object(s) we are trying to add (remember that
if someone inherits your class, this may not necessarily be
PurgeImperial
: never hardwire a class name in), by asking
for ref $object1
. Then we ensure that the second object
passed to add
is capable of being added to our object: we do
this by calling the isa()
method, which all objects inherit
from the UNIVERSAL
class (the root class of all perl
objects). This method returns TRUE if an object is a member of our class
(or of its possible subclasses). If the second object is a
PurgeImperial
, then we carry on, otherwise we attempt to
convert it by calling new()
. Then we (pointlessly) unreverse
the objects and add their innards together appropriately. Finally, we
create a new object of the appropriate sort $class->new( "$sum
m" )
, thereby forcing the answer to be in metres. Hahaha. So let's
see what the answer is:
#!/usr/bin/perl use strict; use PurgeImperial; my $x = PurgeImperial->new( "1 in" ); my $y = PurgeImperial->new( "1 m" ); my $z = $a + $b; print "$z\n";
PurgeImperialOperation `""': no method found, argument in overloaded package PurgeImperial
Ah. What have we done wrong? Well, objects aren't inherently printable things: the nearest we can get to printing an object is:
use Data::Dumper; print Dumper \$z;
$VAR1 = \bless( {'unit' => 'm', 'number' => '1.0254'}, 'PurgeImperial');
Or is it? In fact, we can overload
the quote operator
too, so that when we try to print "$z\n"
our object returns
something printable and useful. First we need to change the overloading
to reflect this:
use overload ( '+' => "add", q("") => "quote" );
The ""
is the bizarre name of the quote operator, which
itself needs quoting or strict
will whinge, so we use the
q()
operator. Ugh. Just do it, OK. We then need to create a
quote
method:
sub quote { my ( $object ) = @_; return join " ", $object->{number}, $object->{unit}; }
If we now run the print "$z\n"
code, we get:
1.0254 m
In fact, we can even do this:
print PurgeImperial->new( "12 in" );
0.3048 m
Hahaha. I have made perl imperial-phobic: it can't even say "inch"
anymore. Using the overload
pragma can be extremely useful,
as it adds to the opaqueness and simplicity of your class API: if
$x
and $y
are overloaded objects, it's far more
intuitive to type $x + $y
than $c = $a->add( $b
)
or similar. The overloading on quoting is also useful for
similar reasons: it's much easier to understand 1.0254 m
than the output of Data::Dumper
(particularly if the
object's innards are gnarly). Besides, spilling an object's guts is bad
OO form. Unless it's absolutely necessary ☺
It would be fairly easy to implement some sort of unit/number
association with techniques similar to the above. I gave it a go for a
laugh a while back, when my perl skills weren't very good, but I
eventually got bored and confused when I tried to work out what the units
of $answer
in something like:
my $answer = ( Units->new( "23 mWb m-2" ) ** 2 ) / ( Units->new( "35 cm ns-1" ) );
should be, and worrying about whether radians were really
dimensionless, whether you could take the logarithm of a unit-bearing
number, and what 1 Bq * 1 Hz
was? 1 BqHz
?
1 s−2
? 1
Bq−2
? Ugh. I'll leave it to someone more
competent.
The comparison operators can also be overloaded (so you can make 6 in
< 3 m despite the fact that 6 > 3). Unfortunately there are rather
a lot of these operators (< > <= >= == != lt gt le ge
eq ne
), but fortunately, for most simple cases, you only need to
overload <=>
or
cmp
(the comparison operators) and perl will work out
the rest from that (since knowing where something ought to come in an
ordered list means also knowing the answer to whether two objects are
equal, etc.):
use overload ( "<=>" => "compare" ); sub compare { my ( $object1, $object2, $was_reversed ) = @_; ( $object1, $object2 ) = ( $object2, $object1 ) if $was_reversed; # see explanation below return $object1->{number} <=> $object2->{number}; # or explicitly work out the 0, +1 and -1 for complex cases }
Or similar. Likewise, overload
can bootstrap
$a++
as $a = $a + 1
, so if your class's
add
method can deal with addition of a pure number to the
object, then you don't have to overload ++
too.
A final neat trick you can do with overloading is to overload perl's
interpretation of string and number constants, so that when we come
across the string '4 in', it will automatically be converted to an object
of the PurgeImperial
class, without even having to bother
with all that tiresome new()
stuff. If we add this to our
class:
my %handlers = ( q => sub { print "doing $_[0]"; return __PACKAGE__->new( $_[0] ) || $_[1]; } ); sub import { overload::constant( %handlers ) if $_[ 1 ] eq ':constants'; } sub unimport { overload::remove_constant( q => undef ); }
the when perl finds strings or numbers in a script, it calls the
overload
module to deal with them. To use this
functionality, we just need to initialise the the
overload::constants
function with a hash. This hash (called
%handlers
here) is keyed on the name of the constant found
(integer
or float
for numbers, q
for strings), and the values are coderefs that will be called when perl
comes across a constant of the appropriate type.
The coderef will be called with three arguments: the literal data that
the parser found in your code (4 in), the way it would normally interpret
this (as the string "4 in"), and the context in which it was found
(q
for single quoted strings, qq
for double
quoted, etc.). All we need to do is return our own
interpretation of the constant $_[0]
, to replace perl's
default one, $_[1]
: we do this simply by calling
__PACKAGE__->new( $_[0])
, which calls the constructor in
the current package (here PurgeImperial
, but it could be a
subclass: __PACKAGE__
contains whatever the current package
name is). If we can't coerce $_[0]
into an object of our
type (if the constructor returns undef
), then we simply
return the default interpretation, $_[1]
. Voilá.
To actually get this requires one final hoop to be jumped: we could
just initialise the overload::constant
function when our
module is loaded. However, this is a little rude, as this involves some
quite unexpected data manipulation, and we want to make sure that our
user really wants this functionality. To make it optional, we write a
custom import subroutine. Remember that when someone uses a module:
use modulename argument1, argument2, ...;
perl calls the import()
subroutine in the module with the
arguments passed to use
...
module::import( modulename, argument1, argument2, ...)
Usually (if we inherit from Exporter.pm
) this aliases
functions into your namespace so you can call find()
rather
than File::Find::find()
, but we can write our own custom
import()
very easily. We do this here: only if
PurgeImperial
is called like this:
use PurgeImperial ':constants';
will our custom import
function prime
overload::constant
with our funny handlers. We also define
an unimport()
subroutine, so if someone calls
no PurgeImperial;
somewhere in their script, it will politely remove this functionality.
Anyway, for more details of overloading, perldoc
overload
: you'll find you can overload pretty much any of perl's
built in operators, from sin
to <<
to
""
. I think the only exceptions are the logical operators
&&
and ||
. Go and have a play, and feel
free to mess with
the PurgeImperial class (shoddy though it is).
Dispatching with class
Somewhat akin to overloading is the fabulous bag of tricks called
Class::Multimethods
. The reason that people go funny about
objects and classes is that they allow you to have many subroutines
called, for example, feed
. You can have:
package Cat; sub new { return bless { stomach => 'nothing' }, $_[0] } sub feed { $_[0]->{ stomach } = $_[1] } sub last_meal { $_[0]->{ stomach } } package Dog; sub new { return bless { stomach => 'nothing' }, $_[0] } sub feed { $_[0]->{ stomach } = $_[1] } sub last_meal { $_[0]->{ stomach } } package main; my $kitty = new Cat; feed $kitty 'mice'; print "Kitty last ate ", $kitty->last_meal, "\n"; my $pooch = new Dog; feed $pooch 'mechanically recovered meat'; print "Pooch last ate ", $pooch->last_meal, "\n";
Kitty last ate mice Pooch last ate mechanically recovered meat
Now, you would probably not write those feed
statements
like that: you'd probably say $kitty->feed( 'mice' );
.
However, this 'indirect object' syntax is perfectly valid, and serves to
show you that what looks like a simple subroutine call to
feed
is dispatched polymorphically, i.e.
different versions of feed
are called depending on the class
to which the first argument ($kitty
or $pooch
)
belongs.
Polymorphic dispatch is spendidly handy, as you have no doubt seen.
However, what happens if we decide that what we really want to do is feed
the $kitty
a Mouse
object, rather than
a string 'mouse'
? We want the code to be bright enough to
call a different subroutine/method based not just on the class of the
first argument to feed
but also on the classes of any other
objects in the argument list.
$kitty->feed( 'mouse' ); # should call a different method to $kitty->feed( Mouse->new );
This is called 'multimethod dispatch', and can be achieved through the
use of the wonderful Class::Multimethods
. To use
Class::Multimethods
, you need to define many different
versions of the same method, each with a different signature (prototype,
argument list, choose whichever syntax you prefer) of classes. It is
often useful to define these multimethods in a superclass, from which
your other classes inherit:
package Animal; use Class::Multimethods; # our multimethods consist of a name, a signature of classes, and a coderef multimethod feed => ( 'Cat', 'Mouse' ) => sub { $_[0]->{ stomach } = $_[1] }; # note that these are subroutine calls to the 'multimethod' # function exported from Class::Multimethods, so they must be # terminated by a semicolon multimethod feed => ( 'Dog', 'Mouse' ) => sub { warn "Unlikely"; return }; multimethod feed => ( 'Cat', 'Cat' ) => sub { warn "Cannibalism!"; return }; # this one allows a cat to eat some string multimethod feed => ( 'Cat', '$' ) => sub { $_[0]->{ stomach } = $_[1] }; multimethod feed => ( 'Dog', 'Cat' ) => sub { $_[0]->{ stomach } = $_[1]; $_[0]->{ felicide }++ }; package Cat; our @ISA = qw ( Animal ); # inherit multimethod definitions from superclass 'Animal' use overload q("") => sub { 'a smug Cat object' }; sub new { return bless { stomach => 'nothing' }, $_[0] } sub last_meal { $_[0]->{ stomach } } package Dog; our @ISA = qw ( Animal ); use overload q("") => sub { 'a slobbery Dog object' }; sub new { return bless { stomach => 'nothing' }, $_[0] } sub last_meal { $_[0]->{ stomach } } package Mouse; our @ISA = qw ( Animal ); use overload q("") => sub { 'a crunchy Mouse object' }; sub new { return bless { stomach => 'nothing' }, $_[0] } sub last_meal { $_[0]->{ stomach } } package main; my $kitty = new Cat; $kitty->feed('mice'); print "Kitty last ate ", $kitty->last_meal, "\n"; my $micky = new Mouse; $kitty->feed( $micky ); print "Kitty last ate ", $kitty->last_meal, "\n"; my $pooch = new Dog; $pooch->feed( $kitty ); print "Pooch, $pooch, last ate ", $pooch->last_meal, ", which last ate ", $pooch->last_meal->last_meal, "\n";
Kitty last ate mice Kitty last ate a crunchy Mouse object Pooch, a slobbery Dog object, last ate a smug Cat object, which last ate a crunchy Mouse object
We define lots of different variation on a theme, a multimethod for each eventuality. In theory we should create twelve (dog eats dog/cat/mouse/string, cat eats dog/cat/mouse/string, mouse eats dog/cat/mouse/string), but I am not that patient.
The cat is still capable of eating a mouse
string, which
we ensure by defining a multimethod that has $
as its second
argument, but our code also ensures that it can eat Mouse
objects and warn if it is being forced into cannibalism. In addition to
the pseudoclass $
, there is an equivalent pseudoclass
#
that you can use when (unquoted) numbers are dispatched to
methods calls:
$kitty->feed( 3.142 ); # pie makes kitties puke package Animal; multimethod feed => ( 'Cat', '#' ) => sub { warn "Kitties don't like pie"; $_[0]->{ stomach } = 'nothing'; };
Note that the code also uses overloading to ensure that the objects
print out something nice when put in stringifying context (as provided by
double-quotes and/or print
).
Using tie
, overload
and
Class::Multimethods
allows you to pretty much do what you
like to ensure that objects DWYM rather than what perl thinks they ought
to do. A word of warning though: these techniques are very useful, but
they do tend to slow perl down somewhat, so use them judiciously.
Sort it out
For want of something more relevant to round this installment off
with, I have to resort to the random bag of tricks cop-out instead.
First, constants, wantarray
and prototypes, second, sorting
and the hallowed Schwartzian Transform.
Prototypes and wantarray
are a way of simulating perl
functions, specifically, allowing you to write subroutines that behave
differently in list and scalar context, and force list or scalar context
on their arguments. Now, prototypes are rather broken in Perl, so if you
can think up a better way of doing things without them, then do that
instead. Prototypes look something like this:
sub force_scalar ($) { my $arg = shift; print "argument is $arg\n"; } my @wood = qw( phellum phellogen cortex phloem cambium xylem pith ); force_scalar @wood;
argument is 7
The ($)
between the sub
's name and block
tell perl that the subroutine expects a single scalar argument, and if it
receives something else (like an array), it should force it into
scalar
context. More complex prototypes are possible, but
herein lies a whole world of hurt: prototypes fail to DWIM most of the
time, so be careful when you use them! Note also that if you define a
subroutine before you call it, you don't need to use parentheses, hence
force_scalar @wood
, not force_scalar( @wood )
.
However, it doesn't hurt, and may save you from some unpleasantry. There
are two main uses of prototypes that are quite useful: the first is to
create constant subroutines:
sub constant { return 666; } # no prototype print constant + 111;
666
sub constant () { return 666; } # void prototype print constant + 111;
777
In the first case, there is no prototype. The unfortunate consequence
of this is that print constant + 111
is parsed as:
print( constant (+111) );
The constant
subroutine does not need an argument, but in
the absence of a void prototype, it doesn't know this, so it grabs the
signed integer +111
anyway, and then throws it away, hence
the strange output. In the second case, we explicitly tell perl that
constant
takes no arguments with a void prototype
()
and then it parses as we mean:
print( constant() + 111 );
The second use of prototypes that isn't harmful is to pass things by
reference without having to backwhack them. If you want to write a
function that does to hashes what push
does for arrays, you
can use this:
sub hpush(\%@) { my $href = shift; # note the $href, not a %hash while ( my ($k, $v) = splice( @_, 0, 2 ) ) # use splice to 'pop2', as it were { $href->{$k} = $v; } } hpush ( %male, spermatium => "Teliomycetidae", "pollen nuclei" => "Spermatophyta", sperm => "Metazoa", );
The hpush
subroutine takes a prototype of (\%
@)
. The first item (\%
) tells perl to expect a hash
as the first argument (a real, literal hash, with a %
and
everything, not just a list with fat commas =>
), and to
pass that hash by reference to the subroutine. The second item
(@
) tells it to expect a list as the second, third, fourth,
etc, arguments. This allows you to call the subroutine as
hpush( %hash, LIST )
, rather than as hpush( \%hash,
LIST )
, so mimicking push( @array, LIST )
, rather
than the hypothetical and ugly push( \@array, LIST )
.
If you remember that prototypes are for making functions look and behave like Perl built-ins and you'll be fine. Expect anything more from them, and it'll all end in tears.
Prototypes' mirror image is wantarray
, which allows you
to respond appropriately and polymorphically to being called in list or
scalar context:
my $scalar = latinize( "maidenhair tree" ); print "Scalar: $scalar\n"; my @list = latinize( "monkey puzzle tree" ); print "List:\n"; print "\t$_\n" for @list; sub latinize { my $name = shift; my %dictionary = ( "maidenhair tree" => [ "Ginkgo", "biloba", "L." ], "monkey puzzle tree" => [ "Araucaria", "araucara", "Mol.(Koch)" ], ); return wantarray ? @{ $dictionary{ $name } } : join " ", @{ $dictionary{$name} }[ 0, 1 ]; }
Scalar: Ginkgo biloba List: Araucaria araucara Mol.(Koch)
wantarray
is TRUE if the sub is called in list context,
and FALSE if called in scalar context. It's often used in this form at
the end of a subroutine:
return wantarray ? LIST_VERSION : SCALAR_VERSION;
In the code above, the subroutine returns a Latin binomial (Genus
species) in scalar context, and a list of Genus, species, author in list
context. It is quite possible to go overboard on wantarray
,
and have subroutines that always respond in different ways to different
contexts. Beware such overhelpfulness: it's not always as helpful as you
might think, if for no other reason than making the docs twice as long as
they might need to be ☺
Enough on user-friendliness, how about something on efficiency? A
common problem when performing sorts is that the internal implementation
of sort
has to access the data on which the sort is performed several times for each item in
the unsorted list. If accessing this data is time consuming (if you have
to use a system call for example), the sort
can take rather
a long time. The canonical example is trying to sort a list of filenames
by their age:
#!/usr/bin/perl use strict; my @filenames = glob "data/bacteria/sequences/*.seq"; # glob calls a C-shell style globbing function sub brute_force { -M $a <=> -M $b } @sorted = sort brute_force @filenames;
There are two famous ways to get around this expensive look-up, the Orcish manoeuvre and the Schwartzian transform. Both squirrel away the key you are sorting on so that you only need to look it up once. The Orcish manoeuvre is the simplest to understand:
#!/usr/bin/perl use strict; my @filenames = glob "data/bacteria/sequences/*.seq"; { my %cached; # lexically scoped in a bare block to keep things tidy sub orcish { ( $cached{$a} ||= -M $a ) <=> ( $cached{$b} ||= -M $b ) } } @sorted = sort orcish @filenames;
In the Orcish manoeuvre, we check to see if we have already looked up
the value of -M $a
or -M $b
by requesting the
value of $cached{$a}
or $cached{$b}
. If we
haven't previously cached them, we store the age in the cache, so that
next time we don't have to do the expensive system call to get the age.
If we have previously cached them, we just get the value from the cache
anyway. The name is derived from the ||=
, which could be
read Or-Cache if you like.
You can compare the Orcish and brute force techniques using the
Benchmark
module:
#!/usr/bin/perl use strict; use Benchmark; my @filenames = glob "data/bacteria/sequences/*.seq"; sub brute_force { -M $a <=> -M $b } { my %cached; # lexically scoped in a bare block to keep things tidy sub orcish { ( $cached{$a} ||= -M $a ) <=> ( $cached{$b} ||= -M $b ) } } timethese ( 10000, # 10 000 iterations of each { "Brute force" => sub { my @sorted = sort brute_force @filenames }, "Orcish" => sub { my @sorted = sort orcish @filenames }, } );
Benchmark: timing 10000 iterations of Brute force, Orcish... Brute force: 133 wallclock secs (30.02 usr + 102.50 sys = 132.52 CPU) @ 75.46/s (n=10000) Orcish: 1 wallclock secs ( 1.43 usr + 0.00 sys = 1.43 CPU) @ 6983.24/s (n=10000)
Benchmark
's function timethese()
takes a
number of iterations (10 000 here) and a hashref of name =>
coderef
pairs as arguments, and times how long the coderefs take
to run. The Orcish manoeuvre is much faster.
The Schwartzian transform (named after Randal L. Schwartz) is even faster, but rather more complex at first glance:
#!/usr/bin/perl use strict; my @filenames = glob "data/bacteria/sequences/*.seq"; my @sorted = map { $_->[0] # 4. Construct the list of names by extracting # them from the sorted arrayrefs } ( sort { $a->[1] <=> $b->[1] # 3. Sort this list of arrayrefs based on their file ages } ( map { [$_, -M] # 2. Convert them into list of [ filename, age ] arrayrefs } ( @filenames # 1. Take a list of filenames ) ) ); print "$_\n" for @sorted;
Take it backwards and it all makes sense: first we take
@filenames
, and create a list of [ filename, age
]
arrayrefs using map
. Then we sort
based on the age ( $a->[1] <=> $b->[0]
), to
create an ordered list of these [ filename, age ]
arrayrefs.
Finally, we grab the filenames $_->[0]
and create a final
list of sorted filenames using map
again. This
map-sort-map
is very elegant, and very fast (the array
lookups are quicker than the hash lookups in the Orcish manoeuvre), and
definitely a technique worth keeping in your code toolbox.