You're reading from The Clojure Workshop

Product type: Book
Published in: Jan 2020
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781838825485
Edition: 1st Edition
Authors (5):
Joseph Fahey

Joseph Fahey has been a developer for nearly two decades. He got his start in the Digital Humanities in the early 2000s. Ever since then, he has been trying to hone his skills and expand his inventory of techniques. This led him to Common Lisp and then to Clojure when it was first introduced. As an independent developer, Joseph was able to quickly start using Clojure professionally. These days, Joseph gets to write Clojure for his day job at Empear AB.

Thomas Haratyk

Thomas Haratyk graduated from Lille University of Science and Technology and has been a professional programmer for nine years. After studying computer science and starting his career in France, he is now working as a consultant in London, helping start-ups develop their products and scale their platforms with Clojure, Ruby, and modern JavaScript.

Scott McCaughie

Scott McCaughie lives near Glasgow, Scotland, where he works as a senior Clojure developer for Previse, a Fintech startup aiming to solve the problem of slow payments in the B2B space. After graduating from Heriot-Watt University, he spent his first 6 years building out Risk and PnL systems for JP Morgan. A fortuitous offer of a role learning and writing Clojure came up and he jumped at the chance. 5 years of coding later, it's the best career decision he's made. In his spare time, Scott is an avid reader, enjoys behavioral psychology and financial independence podcasts, and keeps fit by commuting by bike, running, climbing, hill walking, and snowboarding. You get the picture!

Yehonathan Sharvit

Yehonathan Sharvit has been a software developer since 2001. He discovered functional programming in 2009. It has profoundly changed his view of programming and his coding style. He loves to share his discoveries and his expertise. He has been giving courses on Clojure and JavaScript since 2016. He holds a master's degree in Mathematics.

Konrad Szydlo

Konrad Szydlo is a psychology and computing graduate from Bournemouth University. He has worked with Clojure for the last 8 years. Since January 2016, he has worked as a software engineer and team leader at Retailic, responsible for building a website for the biggest loyalty program in Poland. Prior to this, he worked as a developer with Sky, developing e-commerce and sports applications, where he used Ruby, Java, and PHP. He is also listed in the Top 75 Datomic developers on GitHub.

4. Mapping and Filtering

Overview

In this chapter, we will begin our exploration of how to use sequential collections in Clojure by taking a look at two of the most useful patterns: mapping and filtering. We will work with the map and filter functions and handle sequential data without using a for loop. We will also use common patterns and idioms for Clojure collections and take advantage of lazy evaluation while avoiding the traps. We will load and process sequential datasets from Comma-Separated Values (CSV) files and extract and shape data from a large dataset.

By the end of this chapter, you will be able to parse datasets and perform various types of transformations to extract and summarize data.

Introduction

Dealing with collections of data is one of the most common and powerful parts of programming. Whether they are called lists, arrays, or vectors, sequential collections are at the heart of almost every program. Every programming language provides tools for creating, accessing, and modifying collections, and, often, what you've learned in one language will apply to the others. Clojure is different, however. We are accustomed to setting a variable and then controlling some other part of the system by changing the value of that variable.

This is what happens in a for loop in most procedural languages. Say that we have an iterator, i, that we increment by calling i++. Changing the value of the iterator controls the flow of the loop. By executing i = i + 3, we can make the loop skip two iterations. The value of i is like a remote control for the loop. If we increment the iterator by three, what happens if we are just one item away from the end of the array we are...

map and filter

The map and filter functions are a key part of a much larger group of functions for dealing with sequences. Of that group, map is certainly the one you will use the most, and filter is a close second. Their role is to transform sequences: because Clojure's collections are immutable, they never change their input but return a new sequence instead.

map accepts one or more sequences as input, filter accepts exactly one, and both return a sequence. Sequence in, sequence out:

Figure 4.1: A schematic diagram of map and filter working together

In the preceding diagram, we can see map and filter working together, where filter eliminates items from the original list while map changes them.
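As a minimal sketch of the pattern described above, the following snippet uses only core functions on a small literal vector. The var names (numbers, odds, and so on) are chosen here for illustration and do not come from the book.

```clojure
(def numbers [1 2 3 4 5])

;; filter keeps only the items for which the predicate returns a truthy value
(def odds (filter odd? numbers))                 ; (1 3 5)

;; map applies a function to every item and returns the results
(def times-ten (map #(* 10 %) numbers))          ; (10 20 30 40 50)

;; Working together: filter eliminates items, then map changes the survivors
(def odds-times-ten
  (map #(* 10 %) (filter odd? numbers)))         ; (10 30 50)

;; map also accepts several sequences and walks them in step
(def pairwise-sums (map + [1 2 3] [10 20 30]))   ; (11 22 33)

;; The input vector is untouched
(def still-the-same numbers)                     ; [1 2 3 4 5]
```

In each case the original vector is left as it was, and the result is a new sequence.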

The first question to ask when solving a problem involving collections is: "Do I want to obtain a list of values, or a single value?" If the answer is a list, then map, filter, or similar functions are what you need. If you need some other kind of value, the solution is probably a reduction of some kind, which we will discuss in the next chapter. But even then, as you break the problem down...

Using Lazy Sequences

Before we move on, it's important to take a closer look at how lazy sequences work in Clojure. When using map and filter, lazy evaluation is often an important consideration. In the examples we've looked at so far, we have used a literal vector as input: [1 2 3 4 5]. Instead of typing out each number, we could use the range function and write (range 1 6). If we type this in the REPL, we get basically the same thing:

user> (range 1 6)
(1 2 3 4 5)

So, is this just a shortcut to avoid typing out lots of integers? Well, it is, but range has another interesting characteristic: it's lazy.

Before we go further, let's revisit laziness briefly. If (range 100) is a lazy sequence, that means that its elements are not calculated until something actually asks for them. Say we define a lazy sequence from 0 to 99:

user> (def our-seq (range 100))

Note

The REPL causes lazy sequences to be evaluated. This can be confusing sometimes...
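One way to observe laziness without being misled by the REPL's printing is to count how often the mapping function is called. The sketch below is an illustration only: the names calls and lazy-squares are invented for this example, and the atom is there purely as a measuring device.

```clojure
;; Counter used only to observe when the mapping function actually runs
(def calls (atom 0))

;; Defining the sequence does not compute any of its elements
(def lazy-squares
  (map (fn [n]
         (swap! calls inc)   ; side effect, for observation only
         (* n n))
       (range 100)))

;; Nothing has been realized yet
(def calls-before @calls)

;; Asking for the first three elements forces part of the sequence
(def first-three (doall (take 3 lazy-squares)))

;; At least 3 calls have happened, but not all 100. Clojure may realize
;; elements in batches (chunks), so the count can be higher than 3.
(def calls-after @calls)
```

The fact that calls-after can exceed the number of elements requested is one of the traps of lazy evaluation: code with side effects inside map may run earlier, later, or more often than a naive reading suggests.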

Importing a Dataset from a CSV File

Now that we've seen some basic patterns for manipulating data, it's time to be more ambitious! We are going to start working with a dataset that we will return to in many of the following chapters as we build up our Clojure knowledge: ATP World Tour tennis data, a CSV file that includes, among other things, information about professional tennis matches going back to 1871. Besides learning about new concepts and techniques, we will see that Clojure can be an interesting choice for exploring and manipulating large datasets. And, conveniently, many publicly available datasets are distributed as CSV files.

Note

This dataset was created and is maintained at https://packt.live/2Fq30kk, and is available under the Creative Commons 4.0 International License. The files that we'll be using here are also available at https://packt.live/37DCkZn.

In the rest of this chapter, we will import tennis match data from a CSV file and use our mapping and filtering...
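To give a rough idea of what importing rows involves, here is a self-contained sketch that reads a tiny in-memory CSV string instead of the real file. It makes several assumptions: the sample content and column names are invented, and the hand-written parse-simple-csv helper stands in for a real CSV library. It splits on commas only, so it would not cope with quoted fields.

```clojure
(require '[clojure.string :as str])
(import '(java.io BufferedReader StringReader))

;; Invented sample content; the header names are assumptions,
;; not the actual columns of the tennis dataset.
(def sample-csv
  "winner_name,loser_name,tourney_year\nPlayer A,Player B,2001\nPlayer C,Player A,2002\n")

(defn parse-simple-csv
  "Naive parser: turns each line into a map keyed by the header row.
  Splits on commas only, so quoted fields are not supported."
  [^BufferedReader rdr]
  (let [[header & rows] (map #(str/split % #",") (line-seq rdr))
        ks (map keyword header)]
    (map #(zipmap ks %) rows)))

(def matches
  (with-open [r (BufferedReader. (StringReader. sample-csv))]
    ;; doall forces the lazy sequence while the reader is still open;
    ;; without it, the rows would be read after with-open closes r
    (doall (parse-simple-csv r))))
```

Each row becomes a map such as {:winner_name "Player A", :loser_name "Player B", :tourney_year "2001"}. Note that every value is still a string at this point.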

Exercise 4.13: Querying the Data with filter

If we think of this CSV data as a database, then writing queries is a question of writing and combining predicates. In this exercise, we will use filter to narrow our dataset down to the exact information we want. Imagine that the journalists on your team are working on a new project dedicated to famous tennis rivalries. As a first step, they've asked you to produce a list of all the tennis matches won by Roger Federer. Let's get started:

  1. Make sure that your project is set up the same way as it was in the previous exercises.
  2. Create a function called federer-wins that provides the CSV processing steps we've already used. Add the calls to select-keys and doall, which will be applied to the data once it has been narrowed down:
    (defn federer-wins [csv]
      (with-open [r (io/reader csv)]
        (->> (csv/read-csv r)
             ...
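The querying idea in this exercise can be sketched independently of the CSV file. The example below runs filter, select-keys, and doall over a few invented in-memory rows. The keys (:winner_name, :loser_name, and so on) are assumed column names, the match data is made up for illustration, and federer-won? and federer-wins-sample are hypothetical names that are not part of the exercise's own code.

```clojure
;; Invented rows for illustration only; they are not real match results,
;; and the keys are assumed column names.
(def sample-matches
  [{:winner_name "Roger Federer" :loser_name "Player A"
    :tourney_year "2005" :surface "Grass"}
   {:winner_name "Player B" :loser_name "Roger Federer"
    :tourney_year "2006" :surface "Clay"}
   {:winner_name "Roger Federer" :loser_name "Player B"
    :tourney_year "2007" :surface "Hard"}])

;; A predicate plays the role of the query's WHERE clause
(defn federer-won? [match]
  (= "Roger Federer" (:winner_name match)))

(def federer-wins-sample
  (->> sample-matches
       (filter federer-won?)                      ; narrow the rows down
       (map #(select-keys % [:winner_name         ; keep only some columns
                             :loser_name
                             :tourney_year]))
       doall))                                    ; realize the lazy result
```

With in-memory data, doall is not strictly required. It matters when the rows come from a reader opened by with-open, because the lazy result must be realized before the reader is closed.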

Summary

In this chapter, we have looked at how to use two of Clojure's most important and useful functions for handling sequential data. From a practical point of view, you have seen how to use map and filter, as well as some patterns and idioms for accomplishing common tasks and avoiding some common problems. You are starting to build your mental toolkit for working with collections.

Working with map and filter means we are working with lazy sequences, and so this chapter explored some of the ins and outs of lazy evaluation, which is one of Clojure's fundamental building blocks.

The techniques for reading and parsing files, extracting, querying, and manipulating data will also be useful right away as we continue to build on these data-handling techniques in the next chapter.
