Chapter 3: Loading and Extracting Data into and out of Snowflake

For any modern data warehouse system, getting data into the system and extracting data out of it are key activities. This chapter provides a set of recipes that guide you through the various nuances of loading data into Snowflake. It covers techniques for loading bulk data from on-premises systems and from cloud storage, and explains the steps required to load streaming data into Snowflake using Snowpipe.

The following recipes are included in this chapter:

  • Configuring Snowflake access to private S3 buckets
  • Loading delimited bulk data into Snowflake from cloud storage
  • Loading delimited bulk data into Snowflake from your local machine
  • Loading Parquet files into Snowflake
  • Making sense of JSON semi-structured data and transforming to a relational view
  • Processing newline-delimited JSON (or NDJSON) into a Snowflake table
  • Processing near real-time data into a Snowflake table using Snowpipe
  • Extracting data from Snowflake

Technical requirements

This chapter requires access to a modern internet browser (for instance, Chrome, Internet Explorer, or Firefox) and access to the internet to connect to your Snowflake instance in the cloud. Since we will be loading data from on-premises systems using internal stages, we will need to have the SnowSQL client installed.

The code for this chapter can be found at the following GitHub URL:

https://github.com/PacktPublishing/Snowflake-Cookbook/tree/master/Chapter03

Configuring Snowflake access to private S3 buckets

This recipe walks you through configuring access to private or restricted S3 buckets. The access configuration is a necessary step before you can load data from cloud storage, specifically from cloud storage buckets that are not public.

Getting ready

By now, you must have already created an S3 bucket that you intend to use to load data from the cloud to Snowflake. You should also have the privileges required to create a new Identity and Access Management (IAM) user and configure security settings in AWS. This recipe does not cover AWS configurations as it is assumed that you are well versed in managing AWS accounts and S3 storage. You should also know which AWS region your Snowflake instance is located in. You will require access to the ACCOUNTADMIN role in Snowflake as well.

How to do it…

Let's go through the steps required to set up a private or restricted S3 bucket for use with Snowflake:

  1. Log in to...
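One common way to grant Snowflake access to a private bucket is through a storage integration object, which delegates authentication to an AWS IAM role. The following is a minimal sketch of that approach (the book's exact steps may differ); the integration name, role ARN, and bucket path are placeholders you would replace with your own values:

    -- Run as ACCOUNTADMIN; the integration name, role ARN, and bucket path are placeholders.
    CREATE STORAGE INTEGRATION s3_private_int
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'S3'
      ENABLED = TRUE
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_access_role'
      STORAGE_ALLOWED_LOCATIONS = ('s3://my-private-bucket/load/');

    -- DESC shows the IAM user and external ID that Snowflake generates;
    -- these values go into the trust policy of the AWS role above.
    DESC INTEGRATION s3_private_int;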

Loading delimited bulk data into Snowflake from cloud storage

This recipe demonstrates several important concepts related to loading data stored in cloud storage (for example, an Amazon S3 bucket or Azure Blob Storage) into your Snowflake instance. To load data from cloud storage, we will make use of Snowflake's external stage concept.

Getting ready

As we will be loading a file from cloud storage to Snowflake, you should have a cloud storage bucket created with files present in the bucket. To save time, you can download a previously created sample file from https://github.com/PacktPublishing/Snowflake-Cookbook/blob/master/Chapter03/r2/cc_info.csv.

Before proceeding with this recipe, follow the steps in the Configuring Snowflake access to private S3 buckets recipe to set up access when using a private or restricted bucket.

How to do it…

In order to load a file from cloud storage, we will create an external stage pointing to the cloud storage and use...
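As a rough sketch of what this involves, the statements below create a CSV file format, an external stage over the bucket, and a COPY INTO command to load the cc_info.csv sample file. The stage URL, the storage integration, and the target table name (credit_card_info) are assumptions for illustration:

    -- File format for comma-delimited data with a header row.
    CREATE FILE FORMAT csv_format
      TYPE = CSV
      FIELD_DELIMITER = ','
      SKIP_HEADER = 1;

    -- External stage over the bucket; the URL and integration are placeholders.
    CREATE STAGE cc_stage
      STORAGE_INTEGRATION = s3_private_int
      URL = 's3://my-private-bucket/load/'
      FILE_FORMAT = csv_format;

    -- Load the sample file into an (assumed) existing target table.
    COPY INTO credit_card_info
      FROM @cc_stage/cc_info.csv;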

Loading delimited bulk data into Snowflake from your local machine

This recipe demonstrates several important concepts related to loading data from an on-premises system into your Snowflake instance. We will make use of internal stages in Snowflake to load the data. The source system could be a host machine running on your network or your laptop, with the files to be loaded stored locally.

Getting ready

Since we will be loading a file from your local system into Snowflake, we will need to first get such a file ready on the local system. To save time, you can download a previously created sample file from https://github.com/PacktPublishing/Snowflake-Cookbook/blob/master/Chapter03/r3/customers.csv.

Now, since we will be loading data from your local system, we will need to use the SnowSQL client to upload the file to Snowflake. Please make sure that SnowSQL is installed on your system and that you can successfully connect to your Snowflake instance.

How to do it…
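As a rough sketch, the following statements, run from a SnowSQL session, upload the customers.csv sample file into the table's internal stage with PUT and then load it with COPY INTO. The local path and the target table name (customers) are assumptions for illustration:

    -- Run from a SnowSQL session; the local path and table name are placeholders.
    -- PUT uploads (and by default gzip-compresses) the file into the table's internal stage.
    PUT file:///tmp/customers.csv @%customers;

    -- Load the staged file into the table.
    COPY INTO customers
      FROM @%customers
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);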

Loading Parquet files into Snowflake

This recipe demonstrates how to load Parquet-format data from files present on cloud storage. We will make use of an external stage created on top of an AWS S3 bucket and will load the Parquet-format data into a new table.

Getting ready

Since we will be loading a Parquet file from cloud storage into Snowflake, we will need to first have such a file available in your cloud storage bucket. To save time, you can download a previously created sample file from https://github.com/PacktPublishing/Snowflake-Cookbook/blob/master/Chapter03/r4 and upload it to your bucket.

How to do it…

We will start by creating a database, followed by a table that will hold the Parquet data. We will then create a file format for the Parquet file. The steps are as follows:

  1. Let's first create a database where we will create our table and load it with some sample data:
    CREATE DATABASE C3_R4;
  2. Create a new table called TRANSACTIONS. We will load the sample data into this...
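The remaining steps typically involve a Parquet file format, a stage over the bucket, and a COPY INTO that projects the Parquet fields into the table's columns. The sketch below uses a placeholder stage URL and integration, and the column names are illustrative rather than the book's actual schema:

    -- Parquet file format and a stage over the bucket (URL and integration are placeholders).
    CREATE FILE FORMAT parquet_format TYPE = PARQUET;

    CREATE STAGE parquet_stage
      STORAGE_INTEGRATION = s3_private_int
      URL = 's3://my-private-bucket/parquet/'
      FILE_FORMAT = parquet_format;

    -- Parquet rows arrive as a single VARIANT ($1); project the fields while loading.
    -- The column names below are illustrative.
    COPY INTO TRANSACTIONS
      FROM (
        SELECT $1:Transaction_Date::DATE,
               $1:Customer_ID::NUMBER,
               $1:Amount::NUMBER(10,2)
        FROM @parquet_stage
      );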

Making sense of JSON semi-structured data and transforming to a relational view

This recipe walks you through the process of loading JSON data into a Snowflake table and transforming it into a relational form that can then be further processed.

Getting ready

Since our objective is to demonstrate the processing of data in JSON format, we will not describe the process of creating stages and loading data from cloud storage, as that has been covered in previous recipes. For simplicity, we have made available a sample JSON file in a public cloud bucket that we will read and process. The JSON sample data is shown in the screenshot that follows. The data consists of two fields called data_set and extract_date and an array of objects called credit_cards. We are going to convert this data into columns and rows so that it can be processed in a relational manner:

Figure 3.14 – Sample JSON data

The JSON sample file can be found at https://github.com/PacktPublishing...
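To give a flavour of the transformation described above, the following sketch assumes the JSON document has been loaded into a single VARIANT column; LATERAL FLATTEN then turns the credit_cards array into rows, and the scalar fields become columns. The table name and the fields inside each array element (such as card_number) are illustrative:

    -- Assume the JSON document has been loaded into a one-column VARIANT table.
    CREATE TABLE json_raw (v VARIANT);

    -- FLATTEN produces one row per element of the credit_cards array;
    -- the card_number field is illustrative.
    SELECT v:data_set::STRING            AS data_set,
           v:extract_date::DATE          AS extract_date,
           cc.value:card_number::STRING  AS card_number
    FROM json_raw,
         LATERAL FLATTEN(input => v:credit_cards) cc;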

Processing newline-delimited JSON (or NDJSON) into a Snowflake table

This recipe walks you through the process of loading NDJSON data and transforming it into a relational form to then be loaded into a table. NDJSON is a JSON format in which each row is valid JSON in itself and therefore can be processed independently of the complete document. For more details about NDJSON, please see http://ndjson.org/.

Getting ready

Since our objective is to demonstrate the processing of NDJSON, we will not describe the process of creating stages and loading data from cloud storage, as that has already been covered in other recipes. For simplicity, we have made available a sample NDJSON file in a public cloud bucket that we will read and process. The NDJSON sample file can be found at https://github.com/PacktPublishing/Snowflake-Cookbook/blob/master/Chapter03/r6/ndjson_sample.json if you would like to download it and use it in your own cloud storage buckets.

How to do it…

To demonstrate...
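As a brief sketch: because each line of an NDJSON file is a complete JSON object, a plain JSON file format loads one object per row into a VARIANT column, which can then be projected into relational columns. The stage, table, and field names below are illustrative:

    -- Each NDJSON line is an independent JSON object, so a plain JSON file
    -- format loads one object per row. Stage and table names are placeholders.
    CREATE FILE FORMAT ndjson_format TYPE = JSON;

    CREATE TABLE ndjson_raw (v VARIANT);

    COPY INTO ndjson_raw
      FROM @my_stage/ndjson_sample.json
      FILE_FORMAT = (FORMAT_NAME = 'ndjson_format');

    -- Project attributes into columns (the field names are illustrative).
    SELECT v:name::STRING AS name, v:amount::NUMBER AS amount
    FROM ndjson_raw;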

Processing near real-time data into a Snowflake table using Snowpipe

This recipe guides you through the process of setting up a Snowpipe and enabling the streaming of data from a cloud storage bucket.

Getting ready

Since we will be performing configuration with AWS to configure events on the S3 bucket, you should have the required privileges on the bucket. If you do not have access, you will need to ask your AWS administrator to configure the S3 bucket for you. The sample files, if you would like to download them for this recipe, can be found at https://github.com/PacktPublishing/Snowflake-Cookbook/tree/master/Chapter03/r7.

How to do it…

The steps for this recipe are as follows:

  1. Let's first create a database where we will create our target table, Snowpipe, and the stage objects:
    CREATE DATABASE SP_EX;
  2. Create the target table where we will load the data through Snowpipe:
    CREATE TABLE TRANSACTIONS
    (
      Transaction_Date DATE,
      Customer_ID...
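The subsequent steps typically create a stage over the monitored bucket and a pipe with AUTO_INGEST enabled, whose notification channel is then wired to the bucket's S3 event notifications. The sketch below uses placeholder names for the stage, integration, and bucket, and an illustrative file format:

    -- Stage over the monitored bucket; the URL and integration are placeholders.
    CREATE STAGE sp_stage
      URL = 's3://my-private-bucket/stream/'
      STORAGE_INTEGRATION = s3_private_int;

    -- The pipe runs its COPY whenever S3 notifies Snowflake of a new file.
    CREATE PIPE transactions_pipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO TRANSACTIONS
      FROM @sp_stage
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- SHOW PIPES exposes the notification_channel (an SQS ARN) to configure
    -- in the S3 bucket's event notifications.
    SHOW PIPES;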

Extracting data from Snowflake

This recipe walks you through extracting data from a Snowflake table. We will extract the data into an internal stage and later demonstrate how to do the same via an external stage, which will result in data being written to an S3 bucket.

Getting ready

You should have a table already created and populated, from which we will extract data. This table can be any table that you have previously loaded as part of this chapter or any other table available in your environment.

Ensure that you have followed the steps in the Configuring Snowflake access to private S3 buckets recipe and have enabled read and write access for Snowflake to your bucket. If write access is not present, Snowflake will not be able to write the extract into your bucket.

We will use SnowSQL to get the file from the internal stage to the local directory, so make sure that SnowSQL is installed on your system and you can successfully connect to your Snowflake instance. We don...
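As a rough sketch of both variants, the statements below unload a table into its internal table stage (downloaded afterwards with GET from a SnowSQL session) and then into an external stage backed by an S3 bucket. The table and stage names are placeholders:

    -- Unload the table into its internal table stage as gzip-compressed CSV.
    -- The table name stands in for any table you loaded earlier in this chapter.
    COPY INTO @%TRANSACTIONS/extract/
      FROM TRANSACTIONS
      FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP);

    -- From a SnowSQL session, download the extracted files to a local directory.
    GET @%TRANSACTIONS/extract/ file:///tmp/extract/;

    -- Alternatively, unload straight to an external stage backed by an S3 bucket
    -- (the stage is a placeholder and requires write access to the bucket).
    COPY INTO @my_s3_stage/extract/
      FROM TRANSACTIONS;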
