You're reading from SQL for Data Analytics

Product type: Book

Published in: Aug 2019

Reading Level: Intermediate

Publisher: Packt

ISBN-13: 9781789807356

Edition: 1st Edition

Languages

SQL

Tools

PostgreSQL

Concepts

Data Analysis

Authors (3):

Upom Malik

Matt Goldwasser

Benjamin Johnston


Appendix

About

This section is included to help readers perform the activities in the book. It includes the detailed steps that readers need to follow to achieve the objectives of each activity.

1. Understanding and Describing Data

Activity 1: Classifying a New Dataset

Solution

The unit of observation is a car purchase.
Date and Sales Amount are quantitative, while Make is qualitative.
While there could be many ways to convert Make into quantitative data, one commonly accepted method would be to map each of the Make types to a number. For instance, Ford could map to 1, Honda could map to 2, Mazda could map to 3, Toyota could map to 4, Mercedes could map to 5, and Chevy could map to 6.
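The mapping described above can be sketched in a few lines of Python. This is only an illustration: the `MAKE_CODES` dictionary and the `encode_makes` helper are hypothetical names, and the sample values are made up rather than taken from the activity's dataset.

```python
# Map each qualitative Make value to the integer code suggested in the solution.
MAKE_CODES = {"Ford": 1, "Honda": 2, "Mazda": 3, "Toyota": 4, "Mercedes": 5, "Chevy": 6}


def encode_makes(makes):
    """Return the integer code for each make; raises KeyError for an unknown make."""
    return [MAKE_CODES[make] for make in makes]


# Hypothetical sample of the Make column (not the activity's actual data).
sample_makes = ["Ford", "Toyota", "Chevy", "Honda"]
encoded = encode_makes(sample_makes)
print(encoded)
```

Keep in mind that these codes are arbitrary labels: the fact that Chevy maps to 6 and Ford to 1 does not imply any ranking between the two makes.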

Activity 2: Exploring Dealership Sales Data

Solution

Open Microsoft Excel to a blank workbook.
Go to the Data tab and click on From Text.
Find the path to the dealerships.csv file and click on OK.
Choose the Delimited option in the Text Import Wizard dialog box, and make sure to start the import at row 1. Now, click on Next.
Select the delimiter for your file. As this file is only one column, it has no delimiters, although CSVs traditionally...

2. The Basics of SQL for Analytics

Activity 3: Querying the customers Table Using Basic Keywords in a SELECT Query

Solution

Open your favorite SQL client and connect to the sqlda database. Examine the schema for the customers table from the schema dropdown. Note the names of the columns, just as we did for the salespeople table in Exercise 6, Querying Salespeople.
Execute the following query to fetch customers in the state of Florida in alphabetical order:
```
SELECT email
FROM customers
WHERE state='FL'
ORDER BY email
```
The following is the output of the preceding code:
Figure 2.13: Emails of customers from Florida in alphabetical order
Execute the following query to pull all the first names, last names, and email addresses for ZoomZoom customers in New York City in the state of New York. The customers should be ordered alphabetically by last name and then by first name:
```
SELECT first_name, last_name, email
FROM customers
WHERE city='New York City...
```

3. SQL for Data Preparation

Activity 5: Building a Sales Model Using SQL Techniques

Solution

Open your favorite SQL client and connect to the sqlda database.

Follow the steps described in the scenario and write a query for them. There are many ways to write this query; one possible approach is:

SELECT
    c.*,
    p.*,
    COALESCE(s.dealership_id, -1),
    CASE WHEN p.base_msrp - s.sales_amount > 500 THEN 1 ELSE 0 END AS high_savings
FROM sales s
INNER JOIN customers c ON c.customer_id = s.customer_id
INNER JOIN products p ON p.product_id = s.product_id
LEFT JOIN dealerships d ON s.dealership_id = d.dealership_id;

The following is the output of the preceding code:

Figure 3.21: Building a sales model query

Thus, from the output generated, we have the data to build a new model that will help the data science team predict which customers are the best prospects for remarketing.
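To see how the COALESCE and CASE WHEN expressions in this query behave, the following minimal sketch runs the same two expressions against a tiny in-memory SQLite database through Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, and the table contents are hypothetical rows chosen to exercise each branch; they are not the sqlda data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER, base_msrp REAL);
CREATE TABLE sales (customer_id INTEGER, product_id INTEGER,
                    dealership_id INTEGER, sales_amount REAL);
INSERT INTO products VALUES (1, 1000.0), (2, 400.0);
INSERT INTO sales VALUES
    (10, 1, 5, 400.0),     -- 600 below MSRP, sold at dealership 5
    (11, 1, NULL, 900.0),  -- 100 below MSRP, no dealership
    (12, 2, NULL, 400.0);  -- sold at MSRP, no dealership
""")

rows = conn.execute("""
SELECT s.customer_id,
       COALESCE(s.dealership_id, -1) AS dealership_id,
       CASE WHEN p.base_msrp - s.sales_amount > 500 THEN 1 ELSE 0 END AS high_savings
FROM sales s
INNER JOIN products p ON p.product_id = s.product_id
ORDER BY s.customer_id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Sales without a dealership come back with -1 in place of NULL, and only the sale priced more than 500 below the base MSRP is flagged as high savings.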

4. Aggregate Functions for Data Analysis

Activity 6: Analyzing Sales Data Using Aggregate Functions

Solution

Open your favorite SQL client and connect to the sqlda database.
Calculate the number of unit sales the company has achieved by using the COUNT function:
```
SELECT COUNT(*)
FROM sales;
```
You should get 37,711 sales.
Determine the total sales amount in dollars for each state; we can use the SUM aggregate function here:
```
SELECT c.state, SUM(sales_amount) as total_sales_amount
FROM sales s
INNER JOIN customers c ON c.customer_id=s.customer_id
GROUP BY 1
ORDER BY 1;
```
You will get the following output:
Figure 4.23: Total sales in dollars by US state
Determine the top five dealerships in terms of most units sold, using the GROUP BY clause and setting LIMIT to 5:
```
SELECT s.dealership_id, COUNT(*)
FROM sales s
WHERE channel='dealership'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5
```
You should get the following output:
Figure 4.24: Top five dealerships by units sold
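The GROUP BY, ORDER BY ... DESC, and LIMIT pattern used above can be tried out on a toy table. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL; the rows are hypothetical, and the query names its columns explicitly instead of using the positional references (1 and 2) from the solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (dealership_id INTEGER, channel TEXT)")

# Hypothetical rows: dealership 1 has 3 sales, 2 has 1, 3 has 2, plus one internet sale.
toy_sales = [
    (1, "dealership"), (1, "dealership"), (1, "dealership"),
    (2, "dealership"),
    (3, "dealership"), (3, "dealership"),
    (None, "internet"),
]
conn.executemany("INSERT INTO sales VALUES (?, ?)", toy_sales)

# Top two dealerships by units sold (the activity asks for the top five).
top_two = conn.execute("""
SELECT dealership_id, COUNT(*) AS units_sold
FROM sales
WHERE channel = 'dealership'
GROUP BY dealership_id
ORDER BY units_sold DESC
LIMIT 2
""").fetchall()
conn.close()

print(top_two)
```

The WHERE clause filters out the internet sale before grouping, so it never contributes to any dealership's count.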
Calculate...

5. Window Functions for Data Analysis

Activity 7: Analyzing Sales Using Window Frames and Window Functions

Solution

Open your favorite SQL client and connect to the sqlda database.

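Calculate the total sales amount in 2018 using the SUM function. Note that the following query groups by sales_transaction_date::DATE, so it returns one total per day rather than one per month: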

SELECT sales_transaction_date::DATE,
    SUM(sales_amount) AS total_sales_amount
FROM sales
WHERE sales_transaction_date >= '2018-01-01'
    AND sales_transaction_date < '2019-01-01'
GROUP BY 1
ORDER BY 1;

The following is the output of the preceding code:

Figure 5.15: Total sales amount by month

Now, calculate the rolling 30-day average for the daily number of sales deals, using a window frame:

WITH daily_deals as (
SELECT sales_transaction_date::DATE,
COUNT(*) as total_deals
FROM sales
GROUP BY 1
),
moving_average_calculation_30 AS (
SELECT sales_transaction_date, total_deals,
AVG(total_deals) OVER (ORDER BY sales_transaction_date ROWS BETWEEN 30 PRECEDING and CURRENT ROW) AS deals_moving_average...
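One detail of the window frame in the query above is worth checking: ROWS BETWEEN 30 PRECEDING AND CURRENT ROW covers the current row plus up to 30 earlier rows, that is, up to 31 rows in total. The following pure-Python sketch mimics that frame on made-up data so the difference between 30 and 29 preceding rows is visible. The `trailing_average` helper and the `daily_deals` values are hypothetical and only illustrate the frame arithmetic.

```python
def trailing_average(values, preceding):
    """Average each value with up to `preceding` earlier values,
    mirroring ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW."""
    averages = []
    for i in range(len(values)):
        start = max(0, i - preceding)   # the frame is clipped at the first row
        window = values[start:i + 1]    # earlier rows plus the current row
        averages.append(sum(window) / len(window))
    return averages


daily_deals = list(range(1, 41))  # hypothetical daily deal counts: 1, 2, ..., 40

avg_31_rows = trailing_average(daily_deals, 30)  # frame of up to 31 rows
avg_30_rows = trailing_average(daily_deals, 29)  # frame of up to 30 rows
print(avg_31_rows[-1], avg_30_rows[-1])
```

Whether a 31-row or a 30-row frame is the right reading of "30-day" depends on the analysis. Also, a ROWS frame counts rows rather than calendar days, so any date with no sales is simply skipped.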

6. Importing and Exporting Data

Activity 8: Using an External Dataset to Discover Sales Trends

Solution

The dataset can be downloaded from GitHub using the link provided. Once you are on the web page, use the Save Page As… option in your browser's menu to save the file:
Figure 6.24: Saving the public transportation .csv file
The simplest way to transfer the data in a CSV file to pandas is to create a new Jupyter notebook. At the command line, type jupyter notebook (if you do not have a notebook server running already). In the browser window that pops up, create a new Python 3 notebook. In the first cell, you can type in the standard import statements and the connection information (replacing your_X with the appropriate parameter for your database connection):
```
from sqlalchemy import create_engine
import pandas as pd
%matplotlib inline
cnxn_string = ("postgresql+psycopg2://{username}:{pswd}"
          ...
```
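The connection string in the cell above is cut off in this preview, so the following is a separate sketch and not the book's code. It assumes the general SQLAlchemy URL form `dialect+driver://username:password@host:port/database`; the `build_cnxn_string` helper and the credentials are hypothetical. It also escapes the username and password, which matters when a password contains characters such as @ or /.

```python
from urllib.parse import quote_plus


def build_cnxn_string(username, pswd, host, port, database):
    """Build a SQLAlchemy-style PostgreSQL URL, escaping special characters."""
    template = "postgresql+psycopg2://{username}:{pswd}@{host}:{port}/{database}"
    return template.format(
        username=quote_plus(username),
        pswd=quote_plus(pswd),
        host=host,
        port=port,
        database=database,
    )


# Hypothetical credentials, for illustration only.
cnxn_string = build_cnxn_string("analyst", "p@ss/word", "localhost", 5432, "sqlda")
print(cnxn_string)
```

The resulting string is what you would pass to `create_engine` in the notebook.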

7. Analytics Using Complex Data Types

Activity 9: Sales Search and Analysis

Solution

First, create the materialized view on the customer_sales table:

CREATE MATERIALIZED VIEW customer_search AS (
    SELECT 
        customer_json -> 'customer_id' AS customer_id,
        customer_json,
        to_tsvector('english', customer_json) AS search_vector
    FROM customer_sales
);

Create the GIN index on the view:

CREATE INDEX customer_search_gin_idx ON customer_search USING GIN(search_vector);

We can now answer the request by querying the new searchable view:

SELECT
    customer_id,
    customer_json
FROM customer_search 
WHERE search_vector @@ plainto_tsquery('english', 'Danny Bat');

This results in eight matching rows:

Figure 7.29: Resulting...

8. Performant SQL

Activity 10: Query Planning

Solution

Open PostgreSQL and connect to the sqlda database:
```
C:\> psql sqlda
```
Use the EXPLAIN command to return the query plan for selecting all available records within the customers table:
```
sqlda=# EXPLAIN SELECT * FROM customers;
```
This query will produce the following output from the planner:
Figure 8.75: Plan for all records within the customers table
The setup cost is 0, the total query cost is 1536, the number of rows is 50000, and the width of each row is 140. The costs are expressed in the planner's arbitrary cost units, the row count is the planner's estimate of the rows returned, and the width is the estimated average row size in bytes.
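If you want to compare plans programmatically, the four figures can be extracted from a plan line with a regular expression. The sketch below is illustrative only: `plan_line` is an assumed line written in PostgreSQL's usual EXPLAIN layout using the numbers quoted above (the figure itself is not reproduced here), and `parse_plan_line` is a hypothetical helper.

```python
import re

# Assumed plan line in the usual EXPLAIN layout, using the figures quoted in the text.
plan_line = "Seq Scan on customers  (cost=0.00..1536.00 rows=50000 width=140)"

PLAN_PATTERN = re.compile(
    r"\(cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+) "
    r"rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
)


def parse_plan_line(line):
    """Extract the startup cost, total cost, row estimate, and width from a plan line."""
    match = PLAN_PATTERN.search(line)
    if match is None:
        raise ValueError("no cost estimate found in plan line")
    return {
        "startup_cost": float(match.group("startup")),
        "total_cost": float(match.group("total")),
        "rows": int(match.group("rows")),
        "width": int(match.group("width")),
    }


estimate = parse_plan_line(plan_line)
print(estimate)
```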
Repeat the query from step 2 of this activity, this time limiting the number of returned records to 15:
```
sqlda=# EXPLAIN SELECT * FROM customers LIMIT 15;
```
This query will produce the following output from the planner:
Figure 8.76: Plan for all records within the customers table with the limit as 15
Two steps are involved in the query, and the limiting step costs 0.46 units...

9. Using SQL to Uncover the Truth – a Case Study

Activity 18: Quantifying the Sales Drop

Solution

Load the sqlda database:
```
$ psql sqlda
```
Compute the daily cumulative sum of sales using the OVER and ORDER BY clauses. Insert the results into a new table called bat_sales_growth:
```
sqlda=# SELECT *, sum(count) OVER (ORDER BY sales_transaction_date) INTO bat_sales_growth FROM bat_sales_daily;
```
The following table shows the daily cumulative sum of sales:
Figure 9.48: Daily sales count
Compute a 7-day lag of the sum column and insert all the columns of bat_sales_growth, together with the new lag column, into a new table, bat_sales_daily_delay. This lag column indicates what the cumulative sales were 1 week before the given record:
```
sqlda=# SELECT *, lag(sum, 7) OVER (ORDER BY sales_transaction_date) INTO bat_sales_daily_delay FROM bat_sales_growth;
```
Inspect the first 15 rows of bat_sales_daily_delay:
```
sqlda=# SELECT * FROM bat_sales_daily_delay LIMIT 15;
```
The following is the output of...
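The cumulative sum and the 7-day lag computed in the steps above can be mimicked in plain Python to check what the two window functions produce. This is a sketch on made-up numbers: `daily_counts` is hypothetical, the `lag` helper is an illustrative stand-in for SQL's lag(column, offset), and it assumes one row per date, ordered by date.

```python
from itertools import accumulate


def lag(values, offset):
    """Shift values down by `offset` rows, padding the start with None,
    as SQL's lag(column, offset) does when no default is supplied."""
    padded = [None] * offset + list(values)
    return padded[:len(values)]


daily_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical daily sales counts

cumulative = list(accumulate(daily_counts))  # running total, like sum(count) OVER (ORDER BY ...)
lagged = lag(cumulative, 7)                  # running total from 7 rows earlier

for count, total, previous in zip(daily_counts, cumulative, lagged):
    print(count, total, previous)
```

The first seven rows have no value in the lag column because there is no record 7 rows earlier, which matches the NULL values you should expect at the top of bat_sales_daily_delay.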


Authors (3)

Upom Malik

Upom Malik is a data science and analytics leader who has worked in the technology industry for over 8 years. He has a master's degree in chemical engineering from Cornell University and a bachelor's degree in biochemistry from Duke University. As a data scientist, Upom has overseen efforts across machine learning, experimentation, and analytics at various companies across the United States. He uses SQL and other tools to solve interesting challenges in finance, energy, and consumer technology. Outside of work, he likes to read, hike the trails of the Northeastern United States, and savor ramen bowls from around the world.

Matt Goldwasser

Matt Goldwasser is the Head of Applied Data Science at the T. Rowe Price NYC Technology Development Center. Prior to his current role, Matt was a data science manager at OnDeck, and prior to that, he was an analyst at Millennium Management. Matt holds a bachelor of science in mechanical and aerospace engineering from Cornell University.

Benjamin Johnston

Benjamin Johnston is a senior data scientist for one of the world's leading data-driven MedTech companies and is involved in the development of innovative digital solutions throughout the entire product development pathway, from problem definition to solution research and development, through to final deployment. He is currently completing his Ph.D. in machine learning, specializing in image processing and deep convolutional neural networks. He has more than 10 years of experience in medical device design and development, working in a variety of technical roles, and holds first-class honors bachelor's degrees in both engineering and medical science from the University of Sydney, Australia.
