/a Perl regular expressions modifier

Perl regular expressions support the /a modifier.

Adding /a changes the behavior of the character sets \d, \w and \s:

  • If the /a modifier is absent, these character sets include many characters from Unicode.
  • If the /a modifier is present, these character sets include only characters from the ASCII range.

Because \D, \W and \S are the complements of these sets, /a changes their behavior as well.
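The effect on the negated sets can be illustrated with a small sketch. The sample character (ARABIC-INDIC DIGIT THREE) and the variable names are chosen here only for illustration:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# "\x{663}" is ARABIC-INDIC DIGIT THREE: a digit in Unicode, but not in ASCII.
my $char = "\x{663}";

# Without /a the character counts as a digit, so \D does not match it.
my $without_a = $char =~ /\D/  ? 'matches' : 'does not match';

# With /a only 0-9 are digits, so \D matches the character.
my $with_a    = $char =~ /\D/a ? 'matches' : 'does not match';

print "without /a: \\D $without_a\n";
print "with /a:    \\D $with_a\n";
```

The same character therefore falls on opposite sides of \d and \D depending on whether /a is present.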

\d

When you specify the /a modifier, the character set \d includes only 10 characters: the digits from 0 to 9. If the /a modifier is not specified, \d matches every character that is a digit in Unicode.

Here is an example:

#!/usr/bin/perl

use utf8;
use open qw(:std :utf8);
use strict;
use warnings;

my $str = '٢4௪၂၃';

if ($str =~ /(\d+)/) {
    print $1;
}

This program prints what \d+ captured: the text ٢4௪၂၃. The /a modifier is not specified in this code, so \d matches every character that Unicode considers a digit. Unicode contains many such digits, used in different languages.

If we add the /a modifier, if ($str =~ /(\d+)/a) {, the program prints a single character, the digit 4, because in this case \d matches only the 10 ASCII digits.

You can list the characters that Perl considers Unicode digits (among the first 65536 code points) with this code:

#!/usr/bin/perl

use utf8;
use open qw(:std :utf8);
use strict;
use warnings;

foreach my $i (0..65535) {
    print chr($i) if chr($i) =~ /\d/;
}

\s

With the /a modifier, the character set \s includes 5 characters in Perl before 5.18 and 6 characters starting with Perl 5.18:

  • "\t", chr(9), "\N{CHARACTER TABULATION}"
  • "\n", chr(10), "\N{LINE FEED}"
  • "\x0B", chr(11), "\N{LINE TABULATION}" — starting with Perl 5.18
  • "\f", chr(12), "\N{FORM FEED}"
  • "\r", chr(13), "\N{CARRIAGE RETURN}"
  • ' ', chr(32), "\N{SPACE}"

If the /a modifier is not specified, \s includes more characters: everything that Unicode treats as whitespace.
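A short sketch of the difference, using two non-ASCII whitespace characters and the ordinary space. The sample characters and the %samples and %result names are assumptions made for this illustration:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Sample characters: two Unicode whitespace characters outside ASCII,
# plus the plain ASCII space for comparison.
my %samples = (
    'EM SPACE'       => "\x{2003}",
    'LINE SEPARATOR' => "\x{2028}",
    'SPACE'          => ' ',
);

my %result;
foreach my $name (sort keys %samples) {
    my $unicode = $samples{$name} =~ /\s/  ? 1 : 0;   # Unicode rules
    my $ascii   = $samples{$name} =~ /\s/a ? 1 : 0;   # ASCII-only rules
    $result{$name} = [$unicode, $ascii];
    print "$name: without /a = $unicode, with /a = $ascii\n";
}
```

Only the plain space should match in both columns; the two non-ASCII characters match \s only when /a is absent.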

\w

With the /a modifier, the character set \w includes 63 characters and works the same as [A-Za-z0-9_].

If /a is not used, the character set \w includes more than 50 thousand different Unicode characters.
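As a rough illustration, the sketch below captures the first run of word characters from a mixed string, with and without /a. The sample string and variable names are made up for this example; lengths are printed instead of the text itself to avoid any output-encoding setup:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# "abc_" followed by CYRILLIC SMALL LETTER ZHE, a CJK ideograph and the digit 9.
my $text = "abc_\x{436}\x{4E2D}9";

# Without /a every character in the string is a word character.
my ($unicode_word) = $text =~ /(\w+)/;

# With /a the match stops before the first non-ASCII character.
my ($ascii_word)   = $text =~ /(\w+)/a;

print length($unicode_word), "\n";   # 7
print length($ascii_word), "\n";     # 4
```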

The Perl Version

The /a modifier first appeared in Perl 5.14. If you try to use it in an earlier version, compilation fails and the program does not run. For example, the code 'abc' =~ /\w/a; in Perl 5.10 will produce an error:

Bareword found where operator expected at script.pl line 3, near "/\w/a"
    (Missing operator before a?)
syntax error at script.pl line 3, near "/\w/a"
Execution of script.pl aborted due to compilation errors.
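If the code has to run on a Perl older than 5.14, one possible workaround is to spell out the ASCII sets explicitly instead of relying on /a. This is only a sketch; the variable names are hypothetical:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Explicit ASCII classes do not need /a, so they should also compile
# on Perl versions older than 5.14.
my $ascii_digit = qr/[0-9]/;
my $ascii_word  = qr/[A-Za-z0-9_]/;

# ARABIC-INDIC DIGIT TWO followed by the ASCII digit 4.
my $str = "\x{662}4";

my ($digits) = $str =~ /($ascii_digit+)/;
print "$digits\n";   # 4

# A Cyrillic letter is not part of the explicit ASCII word class.
my $is_word = "\x{436}" =~ $ascii_word ? 1 : 0;
print "$is_word\n";  # 0
```

The explicit classes behave like \d and \w under /a, at the cost of being more verbose.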
