ModSecurity 2.5


Product type Book
Published in Nov 2009
Publisher Packt
ISBN-13 9781847194749
Pages 280 pages
Edition 1st Edition


Example of a regular expression


To get a feeling for how regular expressions are used, let's start with a real-life example so that you can see how a regex works when put to use on a common task.

Identifying an email address

Suppose you wanted to extract all email addresses from an HTML document. You'd need a regular expression that would match email addresses but not all the other text in the document. An email address consists of a username, an @ character, and a domain name. The domain name in turn consists of a company or organization name, a dot, and a top-level domain name such as com, edu, or de.

Knowing this, here are the parts that we need to put together to create a regular expression:

  • Username

    This consists of alphanumeric characters (0-9, a-z, A-Z) as well as dots, plus signs, dashes, and underscores. Other characters are allowed by the RFC specification for email addresses, but these are very rarely used so I have not included them here.

  • @ character

    One of the mandatory characters in an email address, the @ character must be present, so it is a good way to help distinguish an email address from other text.

  • Domain Name

    For example, cnn.com. The domain name can also contain subdomains, so mail.cnn.com would be valid as well.

  • Top-Level Domain

    This is the final part of the email address, and is part of the domain name. The top-level domain usually indicates what country the domain is located in (though domains such as .com or .org are used in countries all around the world). The top-level domain is between two and four characters long (excluding the dot character that precedes it).

Putting all these parts together, we end up with the following regular expression:

\b[-\w.+]+@[\w.]+\.[a-zA-Z]{2,4}\b

The [-\w.+]+ part corresponds to the username, the @ character is matched literally against the @ in the email address, and the domain name part corresponds to [\w.]+\.[a-zA-Z]{2,4}. Unless you are already familiar with regular expressions, none of this will make sense to you right now, but by the time you've finished reading this appendix, you will know exactly what this regular expression does.
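To see the expression at work before we take it apart, here is a minimal sketch in Python, using the standard re module. The choice of Python and the HTML fragment are assumptions made purely for illustration; the addresses in it are invented.

```python
import re

# The pattern from the text, written as a raw string so that the
# backslashes reach the regex engine unchanged.
EMAIL_RE = re.compile(r"\b[-\w.+]+@[\w.]+\.[a-zA-Z]{2,4}\b")

# Hypothetical HTML fragment, made up for this example.
html = """
<p>Contact <a href="mailto:john.doe@example.com">John</a> or
write to support+web@mail.example.org for help.</p>
<p>Not an address: user@localhost, or the @ sign on its own.</p>
"""

# findall() returns every non-overlapping match, in document order.
addresses = EMAIL_RE.findall(html)
print(addresses)
# → ['john.doe@example.com', 'support+web@mail.example.org']
```

The surrounding markup and the text user@localhost are skipped, because the pattern requires a dot followed by two to four letters after the domain name. For the same reason, an address with a longer top-level domain, such as info@example.museum, would not be matched by this particular expression.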

Let's get started learning about regular expressions, and at the end of the appendix we'll come back to this example to see exactly how it works.
