Using regular expressions
In this recipe, we will use regular expressions to find email addresses and URLs in text. A regular expression is a sequence of characters that defines a search pattern; in Python, regular expressions are compiled and applied with the re module from the standard library. We will write two regular expressions, one for emails and one for URLs, and apply them to a job descriptions dataset.
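As a rough preview of the idea, the sketch below applies two deliberately simple patterns to a made-up sentence. The patterns, the names EMAIL_PATTERN and URL_PATTERN, and the sample text are all assumptions chosen for illustration; they are not necessarily the expressions developed in this recipe, and they will miss some valid addresses and URLs.

```python
import re

# Illustrative patterns only (assumptions, not the recipe's own expressions).
# Email: local part, "@", domain labels, then a dot and a 2+ letter suffix.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# URL: "http://" or "https://" followed by characters other than whitespace
# or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>]+")

# Made-up sample text standing in for a job description.
sample_text = (
    "Send your resume to jobs@example.com or visit "
    "https://www.example.com/careers for details."
)

# findall returns every non-overlapping match as a list of strings.
emails = EMAIL_PATTERN.findall(sample_text)
urls = URL_PATTERN.findall(sample_text)

print(emails)
print(urls)
```

Compiling the patterns once with re.compile is a convenience when the same expression is applied to many strings, such as every row of a dataset. Note that this simple URL pattern would also capture punctuation that directly follows a URL, so real text usually needs a stricter expression or some post-processing.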
Getting ready
Download the job descriptions dataset here: https://www.kaggle.com/andrewmvd/data-scientist-jobs. It is also available in the book’s GitHub repository at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/data/DataScientist.csv. Save it into the /data folder.
The notebook is located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/Chapter05/5.1_regex.ipynb.
How to do it…
We will read the data from the CSV file into a pandas DataFrame and will use the Python re package to create regular...