You're reading from Perl 6 Deep Dive

Product type: Book

Published in: Sep 2017

Reading Level: Intermediate

Publisher: Packt

ISBN-13: 9781787282049

Edition: 1st Edition

Languages: Perl

Tools: Rakudo Perl

Concepts: Programming Language

Author (1)

Andrew Shitov

Regexes

Regular expressions are one of the most valuable features of Perl. In Perl 6, regular expressions were redesigned to make them more regular and powerful. The term also changed—regular expressions are more often called simply regexes now. In this chapter, we will go through all the elements of the syntax of regexes.

The following topics will be covered in this chapter:

Matching against regexes
Literals
Character classes
Quantifiers
Anchors
Alternation
Grouping
Capturing and named captures
Named regexes
The Match object
Assertions
Adverbs
Substitution

Matching against regexes

Regexes describe patterns of text. They provide us with a language in which we can express the structure of the text.

Consider an example. A phone number is a sequence of digits. The phrase "sequence of digits" can be written down as \d+. If we take into account the fact that phone numbers may be written with spaces and dashes, then we have to say that a phone number is a sequence of digits, delimited with spaces or dashes. This is already a more complex regex, which can be written differently, depending on how strict we are, for instance, if we allow two spaces together or if a dash can be followed by a space, or if a group of digits can consist of a single digit.
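The difference between "a sequence of digits" and a realistically formatted number can be tried out directly. The following is a minimal sketch, assuming a recent Rakudo; the variable names and sample numbers are made up for illustration, and the ^ and $ anchors it uses are explained later in this chapter:

```raku
# Hypothetical sample data, not taken from the book.
my $digits-only = '5551234';
my $with-spaces = '555 12-34';

# \d+ means "one or more digits"; ^ and $ force the pattern
# to span the whole string.
say so $digits-only ~~ / ^ \d+ $ /;   # True
say so $with-spaces ~~ / ^ \d+ $ /;   # False

# Without the anchors, \d+ simply finds the first run of digits.
say ~($with-spaces ~~ / \d+ /);       # 555
```

The second check fails because the space and the dash are not digits, which is exactly why a more tolerant pattern is needed.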

Let's be least strict and formalize it as (\d || \s || \-)+, that is, one or more digits (\d), spaces (\s), or dashes (\-) in any order. The double vertical bar stands for "...

Literals

The syntax of regexes is a small language within Perl 6. As there are many things to express, it uses some characters to convey the meaning. Letters, digits and underscores stand for themselves without any special meaning. These characters can be used as-is, as shown in the following example:

my $name = 'John';
say 'OK' if $name ~~ /John/; # OK

my $id = 534;
say 'OK' if $id ~~ /534/; # OK

If the string inside a regex contains other characters, for example, spaces, you should take care of them. One of the possibilities is to quote the whole string:

my $name = 'Smith Jr.';
say 'Junior' if $name ~~ /' Jr'/; # Junior

The literal string ' Jr' inside a regex contains a space that will have to be present in the variable $name.
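Quoting also protects characters that would otherwise have a special meaning. Here is a small sketch, assuming a recent Rakudo; the second test string and the variable names are invented for the example. Inside quotes the dot is an ordinary character, while outside quotes it matches any character:

```raku
my $name = 'Smith Jr.';

# Inside quotes, the space and the dot are both literal.
my $quoted-hit  = so $name       ~~ /' Jr.'/;
my $quoted-miss = so 'Smith Jrs' ~~ /' Jr.'/;

# Outside quotes, the dot stands for any single character.
my $unquoted    = so 'Smith Jrs' ~~ /' Jr' ./;

say $quoted-hit;    # True
say $quoted-miss;   # False
say $unquoted;      # True
```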

Another alternative is to use a special character, prefixed by a backslash. For...

Character classes

A character class in regexes is a special sequence that matches characters from some given set. For example, in the previous section, we already used a character class \s, which matches with an ASCII space as well as with some other whitespace characters, such as tabs. Let us explore character classes in regexes of Perl 6.
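As a quick illustration of \s, the following sketch (assuming a recent Rakudo; the test strings are arbitrary) checks a few separators between the letters a and b:

```raku
# Double-quoted strings are used so that \t and \n become real
# tab and newline characters.
my $space   = so "a b"  ~~ / a \s b /;
my $tab     = so "a\tb" ~~ / a \s b /;
my $newline = so "a\nb" ~~ / a \s b /;
my $dash    = so "a-b"  ~~ / a \s b /;

say $space;     # True
say $tab;       # True
say $newline;   # True
say $dash;      # False
```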

The . (dot) character

A very simple character, just a single dot, can match with any character in the string. This is often used when you do not care about some character between the two parts. For example, the following code will match with a string that has any two characters between a and d:

say 'OK' if 'abcd' ~~ / a . . d /; # OK
say 'OK' if 'aefd...

Creating repeated patterns with quantifiers

Quantifiers modify the previous atom and request a particular number of repetitions. An atom is a character, a character class, a string literal, or a group (we will talk about groups later in the Extracting substrings with capturing section of this chapter).

The + quantifier allows the previous atom to be repeated one or more times. For example, the regex /a+/ matches with a single character a, as well as with a string containing two characters aa, or three, or more—aaaaaa. It will not, however, match with a string that does not contain the a character at all.
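The behavior of + can be checked with a few one-liners. This is a minimal sketch, assuming a recent Rakudo; the string baaad and the variable $run are chosen only for the demonstration:

```raku
say so 'a'      ~~ / a+ /;   # True
say so 'aaaaaa' ~~ / a+ /;   # True
say so 'bdef'   ~~ / a+ /;   # False

# The quantifier is greedy: it takes the whole run of a characters.
my $run = 'baaad' ~~ / a+ /;
say ~$run;       # aaa
say $run.from;   # 1
```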

The * quantifier allows any number of repetitions, including zero. So, the /a*/ regex matches with strings such as bdef, abc, or baad. Of course, a single /a*/ may not be that useful; the * quantifier's more natural use case is between other substrings, such as...
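To see why a lone /a*/ is of limited use, consider the following sketch (assuming a recent Rakudo; the word list and the b ... d pattern are assumptions made for this example, not taken from the text above). The match succeeds even when it consumes nothing, so * becomes meaningful only together with its neighbors:

```raku
# Zero repetitions still count as a successful match.
my $empty = 'bdef' ~~ / a* /;
say so $empty;          # True
say $empty.Str.chars;   # 0

# Between other atoms, * means "an optional run of a characters".
my @hits = <bd bad baad bed>.grep(/ ^ b a* d $ /);
say @hits;              # [bd bad baad]
```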

Extracting substrings with capturing

Matching against regexes is not enough. The real power of regular expressions is not complete without the ability to extract the substrings that agreed with the regex pattern. Saving the parts of the string in special variables is called capturing.

Capturing groups

In Perl 6, capturing is achieved by placing a part of a regex in parentheses. Parentheses have a dual meaning in regexes. We have already seen the usage of parentheses for grouping alternatives in the phone number example.
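A minimal sketch of capturing, assuming a recent Rakudo, is shown here; the string width=100 and the variable names are hypothetical. Each pair of parentheses stores its part of the match in a numbered variable, and in Perl 6 the numbering starts from $0:

```raku
my $pair = 'width=100';
my ($name, $value);

if $pair ~~ / (\w+) '=' (\d+) / {
    # $0 and $1 are Match objects; the ~ prefix turns them into strings.
    ($name, $value) = ~$0, ~$1;
}

say "$name -> $value";   # width -> 100
```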

Let us continue with the example of extracting values of HTML attributes. Now we want to print the values. So, we need to create a regex and mark the borders of the data that we want to extract. Captured data is...

Using alternations in regexes

Let us look once again at our naïve regex for matching phone numbers:

rx/ \+? (\d || \s || \-)+ /

Vertical bars separate different variants within the group in parentheses. It can be either \d, or \s, or \-. In the context of regexes, this is called alternation. Different variants are, correspondingly, called alternatives.

In Perl 6, there are two forms of the alternation separator in regexes: single | and double || vertical bars. With a single vertical bar, the longest variant always wins. With the double bar, the first matched alternative wins.

In the phone number example, each alternative is exactly one symbol long. So, there is no difference between | and || there. In other cases, the choice of the operator may drastically change the result.
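The difference becomes visible as soon as one alternative is a prefix of another. The following sketch assumes a recent Rakudo, and the word foobar is picked only for illustration:

```raku
my $word = 'foobar';

# Single bar: the longest alternative wins, regardless of order.
my $longest = ~($word ~~ / foo | foobar /);

# Double bar: alternatives are tried left to right, the first hit wins.
my $first   = ~($word ~~ / foo || foobar /);

say $longest;   # foobar
say $first;     # foo
```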

For example, take the two regexes from the following example and match the forms of an adjective...

Positioning regexes with anchors

In many cases, a regex has to be applied to the string in such a way that its beginning coincides with the beginning of the string. For example, if a phone number contains the + character, it can only appear in the first position.

Perl 6 regexes have so-called anchors: special characters that anchor a regex to either the beginning or the end of the string or of a logical line.
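A short sketch of both string anchors, assuming a recent Rakudo and using invented sample strings, might look like this. The ^ anchor ties the pattern to the start of the string and $ ties it to the end:

```raku
# The plus sign is accepted only in the first position.
my $leading  = so '+3120' ~~ / ^ \+ /;
my $inner    = so '31+20' ~~ / ^ \+ /;
my $anywhere = so '31+20' ~~ / \+ /;

# A digit is required right before the end of the string.
my $trailing = so 'room 42'  ~~ / \d $ /;
my $middle   = so '42 rooms' ~~ / \d $ /;

say $leading;    # True
say $inner;      # False
say $anywhere;   # True
say $trailing;   # True
say $middle;     # False
```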

Matching at the start and at the end of lines or strings

Let us modify the phone number regex so that it forces the regex to match with the whole string containing a potential phone number:

/ ^ \+? <[\d\s\-]>+ $ /;

Here, ^ is the anchor that matches at the beginning of the string and does not consume any...

Looking forward and backward with assertions

Another way of manipulating the flow of a regex is to use assertions. During the match process, the pattern consumes characters of the source string. Assertions help to make some checks at the current position without eating characters.

There are two types of assertions in Perl 6 regexes—lookahead and lookbehind. Each of them can be negated. In the following table, all the possible combinations are listed:

	Positive assertion	Negative assertion
Lookahead	`<?before X>`	`<!before X>`
Lookbehind	`<?after X>`	`<!after X>`

Being placed inside a regex, the lookahead assertion <?before X> checks whether at this position the following characters are X. If it is so, then the assertion succeeds and the regex engine continues its work. Other assertions behave following the same logical considerations, for example...
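The following sketch, assuming a recent Rakudo, shows three of the four combinations from the table; the sample strings and variable names are made up. Note that the text checked by an assertion does not become part of the match:

```raku
# Positive lookahead: Perl must be followed by ' 6',
# but ' 6' itself is not consumed.
my $ahead = 'Perl 6' ~~ / Perl <?before ' 6'> /;
say ~$ahead;                                   # Perl

# The same check fails for another version; the negated form succeeds.
say so 'Perl 5' ~~ / Perl <?before ' 6'> /;    # False
say so 'Perl 5' ~~ / Perl <!before ' 6'> /;    # True

# Positive lookbehind: take only the digits that come right after USD.
my $behind = 'EUR200 USD100' ~~ / <?after USD> \d+ /;
say ~$behind;                                  # 100
```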

Modifying regexes with adverbs

Adverbs are regex modifiers. They are colon-prefixed letters or words that change the behavior of regexes.

Adverbs exist in two forms—short and long—and appear in front of a regex, for example:

say 'OK' if 'ABCD' ~~ m:i/ abcd /;

Notice that when an adverb is applied to the whole regex, as in this example, m or rx is needed. Alternatively, an adverb can be put inside the regex. In this case, it starts its action from the position where it appears. This is demonstrated in the examples in the next section about the :i adverb.
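Both placements can be compared side by side. This is a minimal sketch, assuming a recent Rakudo, with test strings chosen for the example:

```raku
# Short and long forms in front of the regex affect the whole pattern.
my $short = so 'ABCD' ~~ m:i/ abcd /;
my $long  = so 'ABCD' ~~ m:ignorecase/ abcd /;

# Inside the regex, :i affects only what follows it:
# ab must be lowercase, cd may be in any case.
my $tail-only = so 'abCD' ~~ / ab :i cd /;
my $head-fail = so 'ABcd' ~~ / ab :i cd /;

say $short;       # True
say $long;        # True
say $tail-only;   # True
say $head-fail;   # False
```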

The following table lists all the adverbs:

Short form	Long form	Description
`:i`	`:ignorecase`	Letters are matched case-insensitively
`:s`	`:sigspace`	Whitespace is significant
`:p(N)`	`:pos(N)`	Start at position N
`:g`	`:global`	Match globally
`:c`	`:continue`	Continue after the previous match
`:r`	`:ratchet`	Disable...

Substitution and altering strings with regexes

Matching strings with a regex often extracts some information from the given data. Another common task is to replace parts of the text with different characters. In Perl 6, the built-in s substitution operator does that.

It takes two arguments, a regex and a replacement. When a regex is applied to the source string and the pattern is matched, the part of the string that matches is replaced with the second argument.

Consider a simple example:

my $str = 'Its length is 10 mm';
$str ~~ s/<<mm>>/millimeters/;
say $str; # Its length is 10 millimeters

The regex here, /<<mm>>/, matches with the word mm. The second part tells Perl 6 to replace it with the full name of the measurement unit. The replacement happens in-place and the original string is modified.
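By default, only the first occurrence is replaced. The sketch below, assuming a recent Rakudo, combines s with the :g adverb to replace every occurrence. It also uses the subst method of strings, which goes beyond what this section describes: it returns a modified copy and leaves the original untouched. The sample string and variable names are invented for the example:

```raku
my $size = '10 mm by 20 mm';

# subst returns a new string; $size itself is not changed here.
my $once = $size.subst(/ <<mm>> /, 'millimeters');
my $all  = $size.subst(/ <<mm>> /, 'millimeters', :g);

say $once;   # 10 millimeters by 20 mm
say $all;    # 10 millimeters by 20 millimeters

# s with :g modifies the variable in place.
$size ~~ s:g/<<mm>>/millimeters/;
say $size;   # 10 millimeters by 20 millimeters
```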

Traditionally, s uses slashes as delimiters but different characters can...

Summary

In this chapter, we discussed regexes in Perl 6. They share many ideas with regular expressions in Perl 5 but also offer many fascinating new things. We examined the methods of constructing regexes and matching them against text, and learned how to extend the power of the regex engine by using character classes, both built-in and user-defined. We also looked at the way Perl 6 stores results in the Match object and how to make substitutions and replacements in strings using regexes.

In the next chapter, we will meet an even more powerful tool that tremendously extends regexes: grammars.

Author (1)

Andrew Shitov

Andrew Shitov has been a Perl enthusiast since the end of the 1990s, and is the organizer of over 30 Perl conferences in eight countries. He worked as a developer and CTO in leading web-development companies, such as Art. Lebedev Studio, Booking dotCom, and eBay, and he learned from the "Fathers of the Russian Internet", Artemy Lebedev and Anton Nossik. Andrew has been following the Perl 6 development since its beginning in 2000. He ran a blog dedicated to the language, published a series of articles in the Pragmatic Perl magazine, and gives talks about Perl 6 at various Perl events. In 2017, he published the Perl 6 at a Glance book by DeepText, which was the first book on Perl 6 published after the first stable release of the language specification.