Line noise
Earlier in the lessons, we covered how to write comparisons:
if ( $string eq "Something we're interested in" ) { print "Ha, ha!"; } else { print "Boring"; }
What happens if there's more than one thing you're interested in
though? Writing a gigantic if elsif else
statement will make
your head spin, and you'll never be sure you've got every possible
version of the thing you'd like to match. Take for example, matching
something as simple as a letter, number or underscore character:
if    ( $test eq "a" ) { print "OK" }
elsif ( $test eq "b" ) { print "OK" }
...
elsif ( $test eq "9" ) { print "OK" }
...
elsif ( $test eq "_" ) { print "OK" }
else  { print "Not a letter, number or underscore!" }
This is a waste of time that will be over 63 eye-bending lines long, and still won't match the correct spelling of 'naïve', let alone хуёво in the original Cyrillic. So, from time immemorial, there have been things called 'regular expressions' or 'regexes', which are a way of explaining to a programming language the things you want to match in a neat and tidy fashion. Unfortunately, regex are rather complicated, vary from language to language, and are really a language all of their own (a form of logic programming). Despite looking like executable line-noise, they are incredibly useful and powerful. So let's get down to them.
In Perl, regex are written in quotes, of a sort. Here is such a regex:
/\w/
The / /
are the 'quotes' for the regex: the regex itself
is just the \w
bit. This regex does exactly what those 63
lines of code would do badly: it matches a single letter, number or
underscore. As you know, the \
is an escaping character:
anything after it has some special meaning to perl. Unsurprisingly, the
w
stands for 'word', and \w
will match a single
occurrence of any 'word' character, which perl happily defines as a
letter, underscore or number (i.e. things valid in the names of
perl variables and subroutines). The posh name for this is a character class, which we'll cover later; for the moment,
suffice it to say \w
is the same as
[A-Za-z0-9_]
(only it can also cope with non-ASCII letters
in modern, Unicode-aware, versions of perl). So the program we really
want to write is:
#!/usr/bin/perl
use strict;
use warnings;
chomp( my $test = <STDIN> );
if ( $test =~ /\w/ ) {
    print "OK";
}
else {
    print "Not a letter, number or underscore!";
}
The =~
is the 'binding operator'. It makes perl
do the regex on the right to the variable on the left.
So:
$test =~ /\w/;
and
$_ =~ /\w/;
will test $test
and $_
for their wordiness
respectively. In fact, (as usual), there's a shorthand for the second
one: $_
is the default variable, and if perl finds a naked
regex, it'll assume you mean $_ =~ naked_regex
:
$_ =~ /\w/;
and
/\w/;
are exactly the same thing. If a regex matches, it returns TRUE, so:
print "Match" if /\w/;
will print
"Match" only if $_
contains a
word character.
Another useful way to write this is with a logical operator:
/\w/ && print "Match";
Which does the same thing: the &&
is a
short-circuit operator, so if the first thing is FALSE (i.e.
$_
is not wordy), it doesn't bother evaluating the second
(i.e. print "Match"
). If you want the match to fail
(return FALSE) if it matches a word character, you can use
!~
:
$test !~ /\w/;
or simply negate a naked regex with the not !
operator:
! /\w/;
To make our original program even tinier, we can use this default
shorthand, and a new operator, the ? :
operator:
#!/usr/bin/perl
use strict;
use warnings;
chomp( $_ = <STDIN> );
print /\w/ ? "OK" : "Not a letter, number or underscore!";
The ? :
operator is like a tiny 'if
else' statement:
print ( if $_ matches /\w/ ? then return "OK" : else return "Not a letter, number or underscore!" );
A ? B : C
will test A to see if it is TRUE. If it is
TRUE, it returns B; if it is FALSE, it returns C. print
then
gets handed whatever this statement returns, i.e. "OK", or "Not
a letter…".
Now, what if we want to match more than one word character?
/\w+/;
will do just that: a +
means 'one or more of the
preceding character'. So this pattern will match a
,
bbbbbb
, d_99
and so on. However, it will
also match 999;;;plop
, because 999
matches /\w+/
(perl never bothers going as far as the
'plop', as it has already satisfied the match with the 999; a single 9
would have been enough). If we want to make sure that we match a thing made
entirely out of word characters, we can use:
/^\w+$/;
The ^
means 'beginning' and $
means 'end',
(beginning and end of the string you =~
bind to the regex).
So this regex will only match strings composed purely of word
characters.
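The difference between the anchored and unanchored patterns is easier to see side by side. This is only a sketch: the helper name is_wordy and the sample strings are made up for illustration, not part of the lesson's own code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# is_wordy is a hypothetical helper: true only if the whole string
# is made of word characters (anchored at both ends).
sub is_wordy {
    my ($string) = @_;
    return $string =~ /^\w+$/ ? 1 : 0;
}

foreach my $candidate ( "d_99", "999;;;plop", "" ) {
    my $loose  = $candidate =~ /\w+/    ? "yes" : "no";   # matches anywhere
    my $strict = is_wordy($candidate)   ? "yes" : "no";   # whole string only
    print "'$candidate': unanchored $loose, anchored $strict\n";
}
```

One thing to be aware of: $ will also match just before a single trailing newline, so a string you forgot to chomp can still pass the anchored test.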
Another useful escape sequence is \s
, which matches a
space character (including both literal spaces, and \n
newlines, \r
carriage returns, \t
tabs and a
few other obscure things). To match a space only, you can just use:
/ /;
and to match a newline:
/\n/;
\d
will similarly match a single digit
[0-9]
.
An extremely important thing you can do with a regex is to capture
what perl actually matched. To do this, you use ( )
parentheses within the regex:
/^(\w+)$/;
If the regex matches $_
, which it will if $_
is composed entirely of 'word' characters, then the thing that
\w+
matched will now be squirrelled away by perl for your
perusal. How do we get at these stored goodies? Well, there are two ways.
The first is to use the pattern match variables, $1
,
$2
, $3
, $4
… Whatever was
captured by the first set of parentheses will appear in $1
,
the second set in $2
, and so on. So:
/(\w(\s+)(\w+))/;
If this actually matches $_
, then the entire match
\w\s+\w+
will be found in $1
, the space
characters \s+
will be found in $2
, and the
last word characters \w+
will be found in $3
.
Unfortunately, captures always come back as a flat list: there's no simple way to build
nested structures from them (although since Perl 5.10, named captures written (?<name>...) do turn up in the %+ hash). The second way to get at the captures
is to assign the results of the regex to a list outside the regex:
my ( $wholething, $space, $word ) = $test =~ /(\w+(\s+)(\w+))/;
Here, if the regex matches, the values of $1
,
$2
and $3
will be dumped into
$wholething
, $space
and $word
respectively. You may have just noticed that a regex is a context
sensitive thingy: in list context it returns the match variables, in
scalar context, it returns TRUE or FALSE.
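Here is a small sketch of that context sensitivity in action. The variable names and the test string are invented for the example; the thing to watch is the parentheses on the left of the assignment, which are what put the match into list context.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $test = "hello   world";

# Scalar context: we only learn whether the regex matched.
my $matched = $test =~ /(\w+)(\s+)(\w+)/;

# List context: we get the captures themselves.
my ( $first, $space, $second ) = $test =~ /(\w+)(\s+)(\w+)/;

# A common slip: without parentheses on the left this is scalar
# context, so $oops holds the truth value, not the captured word.
my $oops   = $test =~ /(\w+)/;
my ($only) = $test =~ /(\w+)/;

print "matched: $matched\n";
print "first: '$first', second: '$second', spaces: ", length($space), "\n";
print "oops: '$oops', only: '$only'\n";
```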
By the way, if the regex:
/(\w(\s+)(\w+))/;
makes your eyes hurt, you can use the /x
extended modifier, thus:
/
    (            # this in $1
        \w       # a word character
        (\s+)    # some spaces, capture into $2
        (\w+)    # some more word characters, capture into $3
    )
/x;
perl ignores any whitespace (and any # comments) in a /x
modified regex.
Another very useful modifier is /i
, which makes a regex case
insensitive:
/^hello, world$/i;
will match "Hello, World", "hello, world" and indeed "HEllO, WoRLd".
Note that in regex, unescaped letters and numbers mean just what you
type: it's only escaped alphanumeric characters (\w
word
character, \d
digit) and punctuation (+
one or
more, ^
start of string) that mean something special.
Regexes are 'greedy' and 'lazy' by nature. If you have this situation:
#!/usr/bin/perl
use strict;
use warnings;
$_ = "hello everybody";
/(\w+)/;
print $1;

hello
$1
will end up with "hello" in it. This shows that
regexes are lazy (they match at the first place in the string they can,
so "hello", not "everybody"), and that they are greedy (the regex has
matched the maximum possible number of letters, "hello", not just "h" or
"hell"). The modifier +
always tries to greedily slurp up as
many characters as it can and still match the whole sequence. The same
applies to *
, which is zero or more of the preceding
character:
/^\w*$/;
will match any alpha_num3ric string, and also the empty string "".
Another quantifier is the ?
, which indicates you want to
match zero or one of the preceding character:
/Steven?/;
Will match Steve or Steven.
The second most pointless regex in the world is this:
/.*/;
The .
is a special metacharacter that means 'any
character except \n
', and * is perfectly happy with zero of them, so this regex will match
absolutely any string, even one made entirely of newlines (where it simply matches nothing at all). The
most pointless regex of all is:
/.*/s;
The /s
modifier makes .
match
\n
too (it treats a multiline string with embedded
\n
as a single line). So this regex matches
zero or more of anything, so it will always match regardless of what
$_
is!
You can specify exactly how many of a character you want
using {n,m}
braces:
/\w{3}/;     # matches exactly 3 alpha_num3rics
/\w{3,8}/;   # matches 3 to 8 alpha_num3rics
/\w{3,}/;    # matches 3 or more alpha_num3rics
/\w{1,}/;    # pedant's version of /\w+/;
/\w{0,}/;    # pedant's version of /\w*/;
/\w{0,1}/;   # pedant's version of /\w?/;
Sometimes, greedy regexes are not what you are after. You can stop
regexes being greedy using the ?
modifier on any of the
quantifying metacharacters, i.e. * ? {n,m}
and
+
. So:
#!/usr/bin/perl
use strict;
use warnings;
$_ = "hello everybody";
/(\w+?)/;
print $1;

h
This code returns the smallest possible match, rather than the greediest.
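Non-greedy matching earns its keep when there is something after the quantifier that has to match too. The following sketch uses a made-up string with two pairs of bold tags to show the difference; the variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "<b>bold</b> and <b>brash</b>";

# Greedy: .* grabs as much as it can, so it runs on to the LAST </b>.
my ($greedy) = $text =~ /<b>(.*)<\/b>/;

# Non-greedy: .*? stops at the first place the rest of the regex can match.
my ($frugal) = $text =~ /<b>(.*?)<\/b>/;

print "greedy: $greedy\n";
print "frugal: $frugal\n";
```

The greedy version captures everything between the first opening tag and the last closing tag; the non-greedy one captures just the first bold word.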
Now, as I said earlier,
\w
is (as far as ASCII is concerned) equivalent to the
'character class':
[A-Za-z0-9_]
which is fairly self explanatory: brackets are used to surround a list of characters that comprise the class. Here are some useful(?) classes:
[aeiouAEIOU]   # English vowels
[10]           # binary digits
[OIWAHMVX]     # bilaterally symmetrical capital letters
Any quantifier appearing after a character class applies to the whole character class: one or more of any of the characters in the brackets:
/[A-Z]+/
Matches one or more capital letters. You can define your own character classes using this notation, but please have a care for those who live outside the comfy world of 7 bits:
$_ = "El niño";
/(\x{00F1})/ and print "Yep, matched an n-tilde: $1";
The \x{00F1}
(which can be abbreviated to
\xF1
if this isn't ambiguous) is the Unicode code point of
the ñ character. You can also use named characters with the 'charnames'
pragma...
use charnames ':full';
$_ = "á é í ü or even ñ";
/(\N{LATIN SMALL LETTER N WITH TILDE})/ and print "Yep, matched an n-tilde: $1";
For these codes and names, you might want to download Unibook. To save yourself even more
time, you can use utf8
:
use utf8;
my $word = "λόγος";
print "It's all Greek to me\n" if $word =~ /^\w+$/;
This tells perl that your source code is written in UTF-8, which in turn changes the semantics of \w
so that it'll match
Greek, Arabic, hiragana, hangul, and maybe one day even Egyptian
hieroglyphs and tengwar. If this pragma is loaded, it will also allow you
to create subroutines with non-ASCII names:
use utf8;
λόγος();
sub λόγος {
    print "You'll be lucky if 'λόγος' prints correctly in your terminal!\n";
}
Most of the punctuation metacharacters (the characters like
+
and .
and *
that mean something
special in a regex) lose their meta-nature inside a character class.
Usually, you have to escape these metacharacters in a regex:
/\*/;
/ \+ \? /x;
The first will match a literal * character, the second a literal string of +?. But inside a character class, you don't need to bother:
/[*+.]+/;
will match one or more asterisks, periods or plusses: there's no need
to escape them, because only a few characters mean something special
inside a character class. The characters that do mean something
special inside a character class include -
, which
makes a natural range, as you saw in the definition of \w
(hence [A-Z]
, [a-f]
, [1-6]
,
[0-9A-Fa-f]
, etc.), and ^
, which means
'anything except…' iff it's the first item in the
brackets. So:
/[^U]/;            # anything but the capital letter U
/[^A-Z0-9]/;       # anything but capital letters and numbers
/[A-Z^]/;          # capital letter or caret
/[^A-Z^]/;         # anything but a capital letter or caret
/[^A-Za-z0-9_]/;   # anything but a word character.
Now, that last one could be written more easily as
/[^\w]/
or even better as /\W/
, the
\W
being Perl's shorthand for 'anything but an
alpha_numeric'. Likewise \S
is anything but whitespace, and
\D
is anything but a digit. If you do want to
include a special character like -
or ^
in a
character class, you'll need to escape it:
/[ \\ \/ \- \] ]/x; # note the x so I can pad them nicely with spaces
This will match a single backslash \
(which you
always need to escape in Perl, whether in plain code, regex or
in a character class). It will also match a forward slash /
,
a ]
close bracket (this needs escaping, else perl will think
it's the end of the character class prematurely) or a hyphen
-
. You may be wondering about why you have to escape the
/
. This is for similar reasons to escaping quotes in strings. If you don't escape
the regex delimiter /
, perl will think the regex finishes in
the wrong place. Fortunately for matching path names under Unix, like
qq()
and q()
, you can specify your own regex
quotes with m()
(for match):
m(\w+?);
m{[\\ / \- \] ]}x;
See that with the second, you no longer need to escape the
/
. This is very useful in situations where otherwise you'd
be writing:
/C:\/perl\/bin\/perl\.exe/;
which is called leaning toothpick syndrome:
m{C:/perl/bin/perl\.exe};
is rather better. As with quoting strings, avoid clever and cute delimiters: stick to slashes, parentheses or braces unless you want the maintainer of your code to come calling with a machete.
What else can you do with regexes? Well, you can specify alternatives:
/foo|bar/;
which will match both foo and bar, using the |
or
pipe-character. One problem with this is sometimes you'll need to group
things using parentheses:
/([Cc]ornelia|my snake) eats (\w+)/;
but now the interesting thing you're trying to capture (what
[Cc]ornelia
eats) is in $2
, not
$1
, which may be OK, but if you'd rather not have spurious
pattern match variables to ignore, you can use the
grouping-but-not-capturing (?: )
regex extension:
( $food ) = /(?:[Cc]ornelia|my snake) eats (\w+)/;
The (?: )
allows grouping, but doesn't squirrel away a
value into $1
or its friends, so it doesn't interfere with
assigning captures to lists. There are dozens of other regex extensions
looking like (?...)
in Perl regexes, which you can explore
yourself (they also make Perl's regular expressions highly
irregular to computer scientists).
Perl has three special regex punctuation variables: $`
, $&
and $'
. These are the pre, actual, and post
match variables:
#!/usr/bin/perl
use strict;
use warnings;
my $string = "Cornelia eats mice that I've thawed on the radiator";
$string =~ /mice|mouse/;
print "PRE $`\nMATCH $&\nPOST $'\n";

PRE Cornelia eats
MATCH mice
POST  that I've thawed on the radiator
Using these three variables will slow down your Perl program, and they are almost unreadable, but use them if you must.
One last thing to do is to use what you've already matched, i.e. backreference within a regex. Say you want to find the first bold or italic word in an HTML document:
#!/usr/bin/perl
use strict;
use warnings;
my $html_input_file = shift @ARGV;
local $/ = undef;   # this sets the local 'input separator' to nothing, so that
open my $HTML, $html_input_file
    or die "Bugger: can't open $html_input_file for reading: $!";
$_ = <$HTML>;       # this will slurp in an entire file, rather than a line at a time
m{
    <(i|b)>    # an <i> or <b> tag, captured into $1
    (.*?)      # minimum number of any characters captured into $2
    </\1>      # an </i> or </b>, depending on the opening tag
}sxi;          # . matches \n, extended, case insensitive
print "$2\n";
The \1
allows the pattern to match the same
something that would end up in $1
, here 'b' or 'i'. This
isn't written $1
like you'd expect (there is a good but
technical reason). This regex (or some variation on it) looks like it
will parse HTML. However, it is actually impossible to parse nested
languages like HTML or XML without a more complex sort of grammar than can
be provided by regexes. Getting around this problem can wait until
a (much) later lesson on parsing.
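Backreferences are handy well away from HTML too. As a rough sketch, here is one way to spot an accidentally repeated word. The sample sentence is invented, and the \b used here is a 'word boundary' assertion that hasn't been covered in this lesson (see perldoc perlre).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "This sentence has has a repeated word";

# (\w+) captures a word, \s+ skips the gap, and \1 insists that
# exactly the same text turns up again. The \b at each end stops
# the regex matching in the middle of a longer word.
my ($doubled) = $text =~ /\b(\w+)\s+\1\b/;

print "Doubled word: $doubled\n" if defined $doubled;
```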
Regexes can be used both directly, and stored
for later use using the qr()
operator. This q(uote) r(egex)
operator is a simple way of keeping regexes and passing them around like
strings:
#!/usr/bin/perl
use strict;
use warnings;
my $regex = qr/(?:milli|centi)pedes?/i;
my $text  = "Millipedes are cute. No really.";
print "Found something interesting\n" if $text =~ /$regex/;
You can use $regex
wherever you'd usually use a regex (in
a match, or a substitution), and you can pass it to subroutines, or use
it as part of a larger regex. Note that any modifiers, like
/i
, are internally incorporated into the string and
honoured. You can even print out the $regex
as a string. How
useful.
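To illustrate the 'part of a larger regex' point, here is a sketch that builds one qr// out of another. The names $prefix and $beast are made up for the example. Notice that the /i travels with the piece it was attached to, and nothing else.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $prefix = qr/(?:milli|centi)/i;    # this piece is case-insensitive
my $beast  = qr/${prefix}pedes?/;     # a larger regex built around it

foreach my $text ( "Millipedes are cute.", "CENTIPEDE!" ) {
    my $verdict = $text =~ $beast ? "matches" : "doesn't match";
    print "'$text' $verdict\n";
}
```

The second string fails because only the prefix was compiled with /i: the 'pedes?' part of the larger regex is still case-sensitive.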
Summary
- Atoms of regexes: alpha_numeric characters, character class escapes (\w word, \W not-word, \s space, \S not-space, \d digit, \D not-digit), character classes [blah1-9] and negated classes [^blah1-9], escaped metacharacters (\. a literal . period), metacharacters (. anything but \n).
- Alternatives: use the | for alternatives.
- Quantifiers for the atoms: * (0 or more), + (1 or more), ? (0 or 1), {n,m} (between n and m).
- Greediness: can be turned off with a ? following the + ? * {n,m} quantifiers.
- Capturing: use () parentheses, and grab $1, $2, etc. Use (?: ) to avoid captures if you just want to use the parentheses to group, not capture.
- Backreferences: use \1, \2, inside the match instead of $1, $2, etc.
- Modifiers: /x ignores whitespace and comments, /s makes . match \n, and /i makes the regex case-insensitive. These are usually called the /X modifiers, even though the / is actually part of the regex quoting mechanism. There is also a /m modifier that makes ^ and $ match at the start and end of every line within the string (\A, \Z and \z always refer to the string itself). See perldoc perlre for details.
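Since /m only gets a passing mention above, here is a minimal sketch of what it changes. The three-line string and the variable names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "first line\nsecond line\nthird line";

# Without /m, ^ only matches at the very start of the string.
my $plain    = $text =~ /^second/   ? 1 : 0;

# With /m, ^ also matches just after any embedded newline.
my $multi    = $text =~ /^second/m  ? 1 : 0;

# \A always means 'start of string', /m or no /m.
my $absolute = $text =~ /\Asecond/m ? 1 : 0;

print "plain: $plain, multiline: $multi, absolute: $absolute\n";
```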
Substituting, splitting, grepping and mapping
Matching patterns is very
useful, but often we want to do something more than just match things.
What if you want to replace every occurrence of a certain thing with
something else? This is the domain of the s///
and
tr///
operators. s///
is the substitution
operator, and tr///
is the transliteration operator.
tr///
is useful for simple things:
#!/usr/bin/perl
use strict;
use warnings;
my $string = "all lowercase with 5ome num8er5";
$string =~ tr/a-z/A-Z/;
print $string;

ALL LOWERCASE WITH 5OME NUM8ER5
You just make a list on one side of the tr///
, and a list
on the other side (hyphens can be used to create natural ranges), and
perl will map one lot to the other. The substitution operator is even
more powerful and useful:
#!/usr/bin/perl
use strict;
use warnings;
$_ = "old M\$ dross";
s/old/new/i;    # substitute the first occurrence of old with new, case insensitively
s/M\$/Microsoft/i;
s/dross/loveliness/i;
print;          # did you forget print defaults to $_ ?

new Microsoft loveliness
In the second one, note you have to escape the $
. This is
because both pattern matching and substitution can interpolate
variables:
#!/usr/bin/perl
use strict;
use warnings;
my $name   = "Cornelia";
my $string = "Cornelia is a corn snake.";
print "Matched $name\n" if $string =~ /$name/;
$string =~ s{$name}{My snake};
print $string;

Matched Cornelia
My snake is a corn snake.
Note that like m//
, s///
and
tr///
can use the usual 'any quotes you fancy', although
avoid ?
and '
, as they have a special
significance. So:
s|A|B|;    # three the same
s(A){B};   # two pairs
s{A}|B|;   # one pair, two the same
all work, although I'd only recommend the middle one. The
s///
can take all the modifiers (/s
,
/x
, /i
) that m//
can take, but it
has another two of its own, /g
and /e
.
/e
is like a little eval
(we will discuss eval
later) that
evaluates the substitution's right hand side, and /g
means
'globally', i.e. do it to every match you find:
#!/usr/bin/perl
use strict;
use warnings;
my $string = "2 3 4 5 6";
$string =~ s/ (\d+) / 2 * $1 /xge;   # double every number you match
print $string;

4 6 8 10 12
Clever eh? If you hadn't noticed, when you use a substitution with
capture parentheses, the captures are in $1
, etc.,
as usual, and you can use these on the right hand side of the
s///
. Of course, you can also use /g
and
/e
separately. In fact, you can use /g
on
m//
as well:
$_ = "2 3 4 5 6";
while ( /(\d+)/g ) {
    print "$1 times 2 is ", $1 * 2, "\n";
}

2 times 2 is 4
3 times 2 is 6
4 times 2 is 8
5 times 2 is 10
6 times 2 is 12
Here, the /g
means 'keep matching till you run out of
string'.
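If you don't need to do anything clever inside a loop, m//g in list context will simply hand back every capture in one go. A small sketch, with variable names invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "2 3 4 5 6";

# In list context, a /g match returns everything the parentheses
# captured, for every match in the string.
my @numbers = $string =~ /(\d+)/g;

print "Found ", scalar(@numbers), " numbers: @numbers\n";
```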
There are several operators that use pattern matching of one sort or
another. The first is split
. split
takes two main arguments: the first is the regex you want to split
the
string on, and the second is the string to split (an optional third argument limits the number of pieces you get back)
.
You can capture the split
bits in an array:
#!/usr/bin/perl
use strict;
use warnings;
my $string = "A : colon:delimited: file: with: some : random :spaces";
my ( @bits ) = split /\s*:\s*/, $string;   # splits on colons surrounded by optional spaces
print "$_\n" foreach @bits;

A
colon
delimited
file
with
some
random
spaces
The opposite of split
is join
, which has a
similar syntax, only it expects not a regex as its first argument, but a
string. So:
#!/usr/bin/perl
use strict;
use warnings;
my $joined = join "|", qw/one two three four five six/;
print $joined;

one|two|three|four|five|six
How about this:
#!/usr/bin/perl
use strict;
use warnings;
print join "|", reverse split /\s*:\s*/, "A: colon: delimited : file: with : spaces";

spaces|with|file|delimited|colon|A
Running list operators into each other like this a) is clever, but b) easily becomes unreadable. Caveat scriptor.
Another useful tool for regex is grep
. This operator
takes a regex as its first argument too, and a list of things to
'grep
' as the rest. What is grep
ping? Well,
grep
ping means 'returning the things that match from a
list':
#!/usr/bin/perl
use strict;
use warnings;
my ( @names )     = qw/ Cornelia Atropos Lachetis Amber /;
my ( @match )     = grep /^A/, @names;
my ( @not_match ) = grep ! /^A/, @names;
print "Start with A @match\nDon't @not_match\n";

Start with A Atropos Amber
Don't Cornelia Lachetis
See that you can make an anti-grep
using the
!
'not' before a regex. The way grep
actually
works is by running through the list you give it, setting $_
to each item in turn. It then uses the regex to pattern match on
$_
, as usual. Only things that match are returned.
grep
is useful for finding lines in a file that match a
certain pattern. It's another of those Perl operators that returns
different values in scalar and list context. In list context (previous
example) it returns the list of matches, but in scalar context:
my $number = grep /^A/, @names;
it returns the number of matches. grep
can be heavily
abused, syntactically speaking:
grep /regex/, LIST;
grep { /regex/ } ( LIST );
Both work the same, although I always use the latter, as it makes the
condition more obvious, even though it's line noise for its own sake. This may vaguely
remind you of sort.
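The block form also lets you test more than a single regex, since the block can hold any expression. Here is a sketch reusing the snake names from above; the length condition and the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @names = qw/ Cornelia Atropos Lachetis Amber /;

# List context: names that start with A AND are longer than 5 letters.
my @long_a = grep { /^A/ && length($_) > 5 } @names;

# Scalar context: just count the names that don't start with A.
my $others = grep { !/^A/ } @names;

print "Long A-names: @long_a\n";
print "Names not starting with A: $others\n";
```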
One final operator before we leave regexes. map
has
nothing to do with regexes, but it has a similar syntax to
grep
(and to sort
for that matter). I love
map
. There's nothing like it for bringing out the
mathematician in you. map
needs a block of code that does
something to $_
, followed by a list, just like
grep
. map
then runs through the list, using
$_
to cache each value, so you can torture it with the block
of code:
@mapped = map { DO_SOMETHING_TO $_ } ( LIST );
So:
#!/usr/bin/perl
use strict;
use warnings;
my @doubled = map { 2 * $_ } ( qw/ 2 4 6 8 10 / );
print "@doubled";

4 8 12 16 20
Note that there is no return in the block, in case you were wondering:
blocks return the last thing they evaluated. (Don't be tempted to add an explicit return
statement to a map block: return always leaves the enclosing subroutine, not just the block, and is an error outside one.)
Dull? Yes. But how about:
#!/usr/bin/perl
use strict;
use warnings;
my @selective_doubles = map { /[24680]$/ ? ( 2 * $_ ) : $_ } ( qw/ 1 2 3 4 5 6 7 8 / );
print "@selective_doubles";

1 4 3 8 5 12 7 16
which returns a list of numbers that have been doubled iff (if and only if) they are even.
One word of warning for both grep
and map
.
$_
is not a copy of the data in the list you feed to these
functions, it's an alias to the actual values of the list. That
means that if you modify $_
itself, rather than just
returning it, you will alter the items in the list fed to
grep
or map
, not just the items in the returned
list. This may be what you want, but probably isn't:
#!/usr/bin/perl
use strict;
use warnings;
my @original = qw/Abacus chocolate sprite/;
print "original: @original\n";
my @returns = map { s/A//gi; } ( @original );
print "afterward: @original\nreturned: @returns\n";

original: Abacus chocolate sprite
afterward: bcus chocolte sprite
returned: 2 1
You may be wondering what the hell has happened. Well, firstly, the
actual members of @original
have been altered, because
s///
messes with $_
directly. Hence all the A
characters have been stripped. The s///
operator returns the
number of substitutions in scalar context, hence
@returns
contains 2
(Abacus), 1
(chocolate) and the empty string
(since sprite
contains no /A/i
). If you remember that a map
is basically a foreach
loop:
my @mapped = map { DO_SOMETHING_TO $_ } ( LIST );
and
my @mapped;
foreach ( LIST ) {
    my $return_value = DO_SOMETHING_TO $_;
    push @mapped, $return_value;
}
are the same thing, you'll be fine. As long as you remember that altering the value of
$_
in a foreach
loop indirectly alters the
original value in the LIST
, that is! Go on, try writing the
s///
map
as a foreach
loop, and
you'll see what I mean.
#!/usr/bin/perl
use strict;
use warnings;
my @original = qw/Abacus chocolate sprite/;
print "original: @original\n";
my @returns;
foreach ( @original ) {
    my $return_value = s/A//gi;
    push @returns, $return_value;
}
print "afterward: @original\nreturned: @returns\n";
Told you so. What you probably need in this case is a temporary variable:
#!/usr/bin/perl
use strict;
use warnings;
my @original = qw/Abacus chocolate sprite/;
print "original: @original\n";
my @returns = map { my $tmp = $_; $tmp =~ s/A//gi; $tmp; } ( @original );
print "afterward: @original\nreturned: @returns\n";

original: Abacus chocolate sprite
afterward: Abacus chocolate sprite
returned: bcus chocolte sprite
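If your perl is version 5.14 or newer, there is a shortcut worth knowing about: the /r modifier makes s/// leave its target alone and return the altered copy instead. A sketch of the same example, assuming such a perl is available:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.14;    # the /r modifier needs perl 5.14 or later

my @original = qw/Abacus chocolate sprite/;

# With /r, s/// does not touch $_; it returns the modified copy,
# so there is no need for a temporary variable.
my @returns = map { s/A//gir } @original;

print "afterward: @original\nreturned: @returns\n";
```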
Summary
The s///
operator acts like the m//
operator, but selectively substitutes text. The tr///
operator is quicker and easier for simple substitutions. The syntax of
the new list operators is:
@splat = split /\s/, $splitee;
@junt  = join '+', @joinees;
@mup   = map { $_ * 2 } @mappees;
@grap  = grep { /\d+/ } @grepees;
@argh  = map  { my $ip = join '.', split /:/, $_; "IP: $ip" }
         grep { /^\d{1,3}:\d{1,3}:\d{1,3}:\d{1,3}$/ } ( @ip );
Test yourself
See if you can write a script that does the following:
- Write a dirty hack that extracts the keywords from the head of the
HTML document you are reading, and pretty prints them with uppercase
first letters (use
ucfirst
). Count the number of times the word regex appears and output this.
#!/usr/bin/perl
use strict;
use warnings;
local $/ = undef;   # slurp mode
open my $FILE, "<", "lesson06.html" or die "Can't open file for reading: $!\n";
$_ = <$FILE>;       # so we can default match on $_
my ( $keywords ) = /<meta \s+ name \s* = \s* "keywords" \s+ content \s* = \s* "([^"]+?)" /sx;
my @keywords = split /\s*,\s*/, $keywords;
print map { ucfirst "$_\n" } @keywords;
print "I counted regex ", scalar( grep {/regex/i} @keywords ), " times\n";