Whitespace characters

There are only four commonly used whitespace characters in ASCII:

  • space — ' ', chr(32), "\N{SPACE}"
  • tab — "\t", chr(9), "\N{CHARACTER TABULATION}"
  • new line — "\n", chr(10), "\N{LINE FEED}"
  • carriage return — "\r", chr(13), "\N{CARRIAGE RETURN}" (Windows text files end each line with "\r\n"; on Linux and macOS "\r" is almost never used)
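Because Windows files end lines with "\r\n", chomp on Linux or macOS leaves a stray "\r" at the end of each line read from such a file: chomp removes only the value of $/, which is "\n". Below is a minimal sketch that accepts both line endings. The helper name strip_eol is hypothetical.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: remove a trailing "\n" or "\r\n", if present.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

my $windows_line = "hello\r\n";

my $chomped = $windows_line;
chomp $chomped;    # removes only "\n", so "hello\r" is left

print length($chomped), "\n";                 # 6: the "\r" is still there
print length(strip_eol($windows_line)), "\n"; # 5: just "hello"
```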

ASCII also has two more whitespace characters that are very rarely used:

  • "\x0B", chr(11), "\N{LINE TABULATION}" (vertical tab; \s matches it only since Perl 5.18)
  • "\f", chr(12), "\N{FORM FEED}"
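The escape sequence, the chr() call and the \N{...} name in the lists above all produce the same one-character string. Below is a minimal sketch that checks this, assuming a reasonably recent perl. The table @whitespace is only for illustration. The line use charnames ':full' is needed only on perls older than 5.16, where \N{...} does not load charnames automatically.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use charnames ':full';

# Each entry: [escape sequence, code point, \N{...} form].
my @whitespace = (
    [ " ",    32, "\N{SPACE}" ],
    [ "\t",    9, "\N{CHARACTER TABULATION}" ],
    [ "\n",   10, "\N{LINE FEED}" ],
    [ "\r",   13, "\N{CARRIAGE RETURN}" ],
    [ "\x0B", 11, "\N{LINE TABULATION}" ],
    [ "\f",   12, "\N{FORM FEED}" ],
);

# Compare all three notations against chr() of the code point.
foreach my $w (@whitespace) {
    my ($escape, $code, $named) = @$w;
    my $ok = $escape eq chr($code) && $named eq chr($code);
    printf "%-3d %s\n", $code, $ok ? "same" : "DIFFERENT";
}
```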

Here is a script that prints this list. The /a modifier makes \s match ASCII whitespace only:

#!/usr/bin/perl

use utf8;
use open qw(:std :utf8);
use strict;
use warnings;

# /a restricts \s to ASCII, so only code points 0..127 can match
foreach my $i (0..65535) {
    print "chr($i)\n" if chr($i) =~ /\s/a;
}

But Unicode has many more whitespace characters. Here is a program that prints every character from 0 to 65535 that Perl's \s matches:

#!/usr/bin/perl

use utf8;
use open qw(:std :utf8);
use strict;
use warnings;

my $count = 0;

foreach my $i (0..65535) {
    if (chr($i) =~ /\s/) {
        print "chr($i)\n";
        $count++;
    }
}

print "\n";
print "count: $count\n";

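The output of the program is shown below. It depends on the Unicode version built into your perl. chr(6158) (U+180E MONGOLIAN VOWEL SEPARATOR) stopped being whitespace in Unicode 6.3, which Perl supports from version 5.20. Newer perls therefore omit that line and print count: 23.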

chr(9)
chr(10)
chr(11)
chr(12)
chr(13)
chr(32)
chr(5760)
chr(6158)
chr(8192)
chr(8193)
chr(8194)
chr(8195)
chr(8196)
chr(8197)
chr(8198)
chr(8199)
chr(8200)
chr(8201)
chr(8202)
chr(8232)
chr(8233)
chr(8239)
chr(8287)
chr(12288)

count: 24
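The list has no chr(133) (U+0085 NEXT LINE) or chr(160) (U+00A0 NO-BREAK SPACE), although Unicode counts both as whitespace. chr() returns them as non-UTF-8 strings. Without the /u modifier, Perl matches such characters in the range 128..255 with native (ASCII) rules, so \s does not match them. Below is a minimal sketch that adds /u to force Unicode rules. The helper name unicode_whitespace is hypothetical, and the exact count still depends on your perl's Unicode version.

```perl
#!/usr/bin/perl

use utf8;
use open qw(:std :utf8);
use strict;
use warnings;

# Hypothetical helper: code points 0..65535 that \s matches under
# Unicode rules (/u), including those in the 128..255 range.
sub unicode_whitespace {
    return grep { chr($_) =~ /\s/u } 0 .. 65535;
}

my @codes = unicode_whitespace();
print "chr($_)\n" for @codes;
print "\ncount: ", scalar(@codes), "\n";
```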
