You're reading from Getting Started with Haskell Data Analysis

Product type: Book

Published in: Oct 2018

Reading level: Beginner

Publisher: Packt

ISBN-13: 9781789802863

Edition: 1st Edition

Languages: Haskell

Concepts: Data Analysis

Author: James Church

Regular Expressions

In this chapter, we are going to learn what regular expressions are. A regular expression represents a pattern that can be identified within some text data. In the context of data analysis, regular expressions have two important uses:

To validate fields to make sure that all values within a particular column adhere to a particular format
To search fields based on a particular pattern

Word processors and editing applications have a Find and Replace feature. You submit a bit of text to identify within a larger bit of text, and the desired replacement. The application will replace all of the found text with the desired text. Many of these applications now include regular expression support. Rather than submitting an exact sequence of characters that need to be found, we submit a pattern. This pattern defines...

Dots and pipes

In this section, we're going to cover two basic bits of regular expression syntax: dots and pipes. To begin, we are going to install the regular expression library for Haskell, and then introduce the dot and the pipe syntax. Open the Terminal and install the library by running cabal install regex-posix.

Once the library is installed, create a new notebook named RegexLearning. We need to import the Text.Regex.Posix library so that we can access the =~ operator, which matches a string against a regular expression. Let's define a couple of strings in order to get us started:
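As a minimal sketch of this setup (not the book's own listing), the following defines str1 as quoted in the text, a hypothetical second string str2, and two helper functions whose names are assumptions. It illustrates the dot and the pipe with the =~ operator from Text.Regex.Posix:

```haskell
import Text.Regex.Posix ((=~))

-- str1 is the string quoted in the text.
str1 :: String
str1 = "one fish two fish red fish blue fish"

-- str2 is a hypothetical second string, chosen only for illustration.
str2 :: String
str2 = "the quick brown fox"

-- Dot: 'f', then any one character, then "sh".
matchesDot :: String -> Bool
matchesDot s = s =~ "f.sh"

-- Pipe: the first occurrence of "red" or "blue", or "" when neither occurs.
firstColor :: String -> String
firstColor s = s =~ "red|blue"
```

The result type requested from =~ decides what comes back: asking for a Bool reports whether the pattern matched, while asking for a String returns the first matching text.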

As you can see, str1 is "one fish two fish red fish blue fish", the title...

Atom and Atom modifiers

In this section, we expand on our knowledge of regular expressions by discussing the atom. An atom is a single expression such as a character, a dot, an expression grouped with parentheses, or, as we will see in a later section, a character class. We will also introduce atom modifiers. The idea is that you can take any atom and then modify it by appending a modifier. Now, let's go back to our RegexLearning notebook and continue from where we left off in the last section.

Imagine that you have a string representing a date in year-month-day format, separated by dashes, and you wish to verify that this date is in the 1900s or the 2000s. Let's say that the date is 1969-07-20:
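One way to express this check, as a sketch that assumes the date arrives as a single-line string: the parenthesized atom (19|20) picks the century, and the {2} modifier repeats the dot atom twice. The function name inTargetCenturies is hypothetical, and the pattern is an illustration rather than the book's own.

```haskell
import Text.Regex.Posix ((=~))

-- True when the string starts with 19 or 20 and has the shape ????-??-??.
-- (19|20) is an atom built with parentheses; .{2} is the dot atom
-- modified to match exactly two characters.
inTargetCenturies :: String -> Bool
inTargetCenturies date = date =~ "^(19|20).{2}-.{2}-.{2}$"
```

Because a dot accepts any character, this pattern would also accept a string such as 19ab-cd-ef; it checks the century and the dash positions, nothing more.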

Well, we crafted...

Character classes

Character classes are a way of combining characters with common traits into a single classification, such as characters that represent numbers, letters, vowels, or hexadecimal digits. Once we get into the details, we will see how useful character classes are. In this section, we introduce the basics of character classes, then expand on them with character class ranges and character class negations, and finally write a full regular expression to handle matching dates.

So, our first introduction to character classes begins with vowels. Vowels are the letters A, E, I, O, U. Almost every word has a vowel in it. Let's see if we can write a character class that matches a vowel:
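A vowel class might look like the following sketch. The function name hasVowel is an assumption, and the class lists lowercase vowels only.

```haskell
import Text.Regex.Posix ((=~))

-- Square brackets open a character class; any one listed character matches.
hasVowel :: String -> Bool
hasVowel word = word =~ "[aeiou]"
```

With this definition, hasVowel "dog" is True because of the o, while hasVowel "rhythm" is False.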

So, here we have the word "dog" and, to begin a character class, we use square brackets. Inside the square brackets we have...
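The ranges, negations, and date matching mentioned at the start of this section can be sketched as follows. All three function names and patterns are assumptions made for illustration, and the anchors ^ and $ are meant for single-line fields.

```haskell
import Text.Regex.Posix ((=~))

-- Range: [0-9] stands for any one digit; + requires one or more of them.
isDigitsOnly :: String -> Bool
isDigitsOnly s = s =~ "^[0-9]+$"

-- Negation: a leading ^ inside the brackets matches any character NOT listed,
-- so this reports whether the string contains a non-hexadecimal character.
hasNonHex :: String -> Bool
hasNonHex s = s =~ "[^0-9a-fA-F]"

-- Date: four digits, a month from 01 to 12, and a day from 01 to 31.
-- It does not check month lengths, so 2018-02-31 is accepted.
looksLikeDate :: String -> Bool
looksLikeDate s =
  s =~ "^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$"
```

A regular expression can only check the shape of a date; deciding whether a given day exists in a given month needs ordinary code on top of the match.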

Regular expressions in CSV files

Regular expressions are useful across file formats such as CSV and SQLite3. In this section, we cover the CSV format. Let's examine a question using one of our past datasets: using our Baseball dataset, what is the average number of runs scored by away teams in the month of March? To answer it, we'll need our CSV file of data, which has the dates in the first column but is not organized by month.

To solve this, we're going to craft a regular expression to match a field in the CSV file, in this case the first column of dates. We're going to pair that information with another column, the runs scored by away teams. Then, we're going to filter that information to get...
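The match, pair, and filter steps can be sketched on in-memory rows. This is only an illustration: the two-column layout, the YYYY-MM-DD date format, the sample values, and every name below are assumptions, and the real Baseball file would first have to be parsed into rows.

```haskell
import Text.Regex.Posix ((=~))

-- Hypothetical rows of [date, away runs]; the real file has more columns
-- and may use a different date format.
sampleRows :: [[String]]
sampleRows =
  [ ["2014-03-30", "3"]
  , ["2014-03-31", "5"]
  , ["2014-04-01", "2"]
  , ["2014-04-02", "7"]
  ]

-- Matches dates whose month field is 03.
isMarch :: String -> Bool
isMarch date = date =~ "^[0-9]{4}-03-[0-9]{2}$"

-- Pair each date with its runs column and keep only the March rows.
-- Rows with fewer than two fields are skipped.
marchAwayRuns :: [[String]] -> [Double]
marchAwayRuns rows = [ read runs | (date : runs : _) <- rows, isMarch date ]

-- Mean of a list, or Nothing when there is nothing to average.
average :: [Double] -> Maybe Double
average [] = Nothing
average xs = Just (sum xs / fromIntegral (length xs))
```

Returning Maybe from average avoids a division by zero when no row matches the pattern, which is easy to trigger with a slightly wrong regular expression.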

SQLite3 and regular expressions

Working with regular expressions in our SQLite3 database is no different from working with a CSV file. In this section, we will demonstrate how to filter our data using regular expressions, using the timestamp data from an SQLite3 database in a similar manner to our last section. We're going to load the data from the SQLite3 database, sift through that data using a regular expression, and analyze the data gleaned from that regular expression. The problem that we will try to solve in this section is to determine how many earthquakes happen by hour in our 7-day database. Let's go and create a new Haskell notebook; we will name this notebook RegexLearning-SQLite3. Let's first import our libraries:

We won't be using any descriptive statistics in this section, so there's no need to load the descriptive statistics...
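Counting by hour can be sketched on a plain list of timestamp strings, leaving out the database query itself. The timestamp format (an ISO-style string with a T before the time), the sample values, and the names hourOf and countByHour are all assumptions.

```haskell
import Data.List (group, sort)
import Data.Maybe (mapMaybe)
import Text.Regex.Posix ((=~))

-- Hypothetical timestamps; the last entry shows a value that does not match.
sampleTimes :: [String]
sampleTimes =
  [ "2016-09-16T23:12:04.230Z"
  , "2016-09-16T23:45:10.000Z"
  , "2016-09-17T01:02:03.500Z"
  , "not a timestamp"
  ]

-- Extract the two-digit hour using a parenthesized subexpression.
-- The 4-tuple result is (before, match, after, submatches).
hourOf :: String -> Maybe String
hourOf ts =
  case (ts =~ "T([0-9]{2}):[0-9]{2}:[0-9]{2}" :: (String, String, String, [String])) of
    (_, _, _, [hour]) -> Just hour
    _                 -> Nothing

-- Count how many timestamps fall in each hour, sorted by hour.
countByHour :: [String] -> [(String, Int)]
countByHour times =
  [ (hour, length g) | g@(hour : _) <- group (sort (mapMaybe hourOf times)) ]
```

Values that do not match the pattern are dropped by mapMaybe rather than causing an error, so it is worth comparing the total of the counts against the number of rows loaded.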

Summary

In this chapter, we began by installing the regular expression library, and we talked a little bit about the regular expression syntax, such as how the dot matches any one character and the pipe allows us to match any expression to the left or the right of the pipe. We talked about atoms and atom modifiers. We also talked about character classes at length. We used regular expressions within a CSV file and an SQLite3 database. You should always thoroughly test your regular expressions, as they tend to be difficult to debug. With that, we will be discussing data visualization in the next chapter.
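One lightweight way to test a regular expression is a table of inputs paired with the expected outcome. The following is a sketch under that assumption; the pattern, the cases, and the names are hypothetical.

```haskell
import Text.Regex.Posix ((=~))

-- Pattern under test: a year-month-day shape made of digits and dashes.
datePattern :: String
datePattern = "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"

-- Each case pairs an input with whether the pattern should match it.
cases :: [(String, Bool)]
cases =
  [ ("1969-07-20", True)
  , ("1969-7-20", False)
  , ("x1969-07-20", False)
  , ("1969/07/20", False)
  ]

-- Returns the cases where the actual match result differs from the expectation.
failures :: String -> [(String, Bool)] -> [(String, Bool)]
failures pat table =
  [ c | c@(input, expected) <- table, (input =~ pat) /= expected ]
```

An empty result from failures datePattern cases means every case behaved as expected; any returned pairs point directly at the inputs to investigate.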


James Church lives in Clarksville, Tennessee, United States, where he enjoys teaching, programming, and playing board games with his wife, Michelle. He is an assistant professor of computer science at Austin Peay State University. He has consulted for various companies and a chemical laboratory for the purpose of performing data analysis work. James is the author of Learning Haskell Data Analysis.