Chapter 3: Loading and Extracting Data into and out of Snowflake

For any modern data warehouse system, getting data into the system and extracting data out of it are key activities. This chapter provides a set of recipes that guide you through the various nuances of loading data into Snowflake. It covers techniques for loading bulk data from on-premises systems and from cloud storage, and explains the steps required to load streaming data into Snowflake using Snowpipe.

The following recipes are included in this chapter:

  • Configuring Snowflake access to private S3 buckets
  • Loading delimited bulk data into Snowflake from cloud storage
  • Loading delimited bulk data into Snowflake from your local machine
  • Loading Parquet files into Snowflake
  • Making sense of JSON semi-structured data and transforming to a relational view
  • Processing newline-delimited JSON (or NDJSON) into a Snowflake table
  • Processing near real-time data into a Snowflake table using Snowpipe
  • Extracting data from Snowflake

Technical requirements

This chapter requires access to a modern internet browser (for instance, Chrome, Internet Explorer, or Firefox) and access to the internet to connect to your Snowflake instance in the cloud. Since we will be loading data from on-premises systems using internal stages, we will need to have the SnowSQL client installed.

The code for this chapter can be found at the following GitHub URL:

https://github.com/PacktPublishing/Snowflake-Cookbook/tree/master/Chapter03

Configuring Snowflake access to private S3 buckets

This recipe walks you through configuring access to private or restricted S3 buckets. The access configuration is a necessary step before you can load data from cloud storage, specifically from cloud storage buckets that are not public.

Getting ready

By now, you must have already created an S3 bucket that you intend to use to load data from the cloud to Snowflake. You should also have the privileges required to create a new Identity and Access Management (IAM) user and configure security settings in AWS. This recipe does not cover AWS configurations as it is assumed that you are well versed in managing AWS accounts and S3 storage. You should also know which AWS region your Snowflake instance is located in. You will require access to the ACCOUNTADMIN role in Snowflake as well.

How to do it…

Let's go through the steps required to set up a private or restricted S3 bucket for use with Snowflake:

  1. Log in to...
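One common way to grant Snowflake access to a private bucket is through a storage integration object, which delegates authentication to an AWS IAM role. The following is a minimal sketch of that approach (the book's exact steps may differ); the integration name, role ARN, and bucket path are placeholders you would replace with your own values:

    -- Run as ACCOUNTADMIN; the integration name, role ARN, and bucket path are placeholders.
    CREATE STORAGE INTEGRATION s3_private_int
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'S3'
      ENABLED = TRUE
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_access_role'
      STORAGE_ALLOWED_LOCATIONS = ('s3://my-private-bucket/load/');

    -- DESC shows the IAM user and external ID that Snowflake generates;
    -- these values go into the trust policy of the AWS role above.
    DESC INTEGRATION s3_private_int;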

Loading delimited bulk data into Snowflake from cloud storage

This recipe demonstrates several important concepts related to loading data stored in cloud storage (for example, an Amazon S3 bucket or Azure Blob Storage) into your Snowflake instance. To load data from cloud storage, we will make use of Snowflake's external stage concept.

Getting ready

As we will be loading a file from cloud storage to Snowflake, you should have a cloud storage bucket created with files present in the bucket. To save time, you can download a previously created sample file from https://github.com/PacktPublishing/Snowflake-Cookbook/blob/master/Chapter03/r2/cc_info.csv.

Before proceeding with this recipe, follow the steps in the Configuring Snowflake access to private S3 buckets recipe to set up access when using a private or restricted bucket.

How to do it…

In order to load a file from cloud storage, we will create an external stage pointing to the cloud storage and use...
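As a rough sketch of what this involves, the statements below create a CSV file format, an external stage over the bucket, and a COPY INTO command to load the cc_info.csv sample file. The stage URL, the storage integration, and the target table name (credit_card_info) are assumptions for illustration:

    -- File format for comma-delimited data with a header row.
    CREATE FILE FORMAT csv_format
      TYPE = CSV
      FIELD_DELIMITER = ','
      SKIP_HEADER = 1;

    -- External stage over the bucket; the URL and integration are placeholders.
    CREATE STAGE cc_stage
      STORAGE_INTEGRATION = s3_private_int
      URL = 's3://my-private-bucket/load/'
      FILE_FORMAT = csv_format;

    -- Load the sample file into an (assumed) existing target table.
    COPY INTO credit_card_info
      FROM @cc_stage/cc_info.csv;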

Loading delimited bulk data into Snowflake from your local machine

This recipe demonstrates several important concepts related to loading data from an on-premises system into your Snowflake instance. We will make use of internal stages in Snowflake to load the data. The source system could be a host machine running on your network or your laptop, with the files to be loaded stored locally.

Getting ready

Since we will be loading a file from your local system into Snowflake, we will need to first get such a file ready on the local system. To save time, you can download a previously created sample file from https://github.com/PacktPublishing/Snowflake-Cookbook/blob/master/Chapter03/r3/customers.csv.

Now, since we will be loading data from your local system, we will need to use the SnowSQL client to upload the file to Snowflake. Please make sure that SnowSQL is installed on your system and that you can successfully connect to your Snowflake instance.

How to do it…
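As a rough sketch, the following statements, run from a SnowSQL session, upload the customers.csv sample file into the table's internal stage with PUT and then load it with COPY INTO. The local path and the target table name (customers) are assumptions for illustration:

    -- Run from a SnowSQL session; the local path and table name are placeholders.
    -- PUT uploads (and by default gzip-compresses) the file into the table's internal stage.
    PUT file:///tmp/customers.csv @%customers;

    -- Load the staged file into the table.
    COPY INTO customers
      FROM @%customers
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);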

Loading Parquet files into Snowflake

This recipe demonstrates how to load Parquet-format data from files present on cloud storage. We will make use of an external stage created on top of an AWS S3 bucket and will load the Parquet-format data into a new table.

Getting ready

Since we will be loading a Parquet file from cloud storage into Snowflake, we will need to first have such a file available in your cloud storage bucket. To save time, you can download a previously created sample file from https://github.com/PacktPublishing/Snowflake-Cookbook/blob/master/Chapter03/r4 and upload it to your bucket.

How to do it…

We will start by creating a database, followed by a table that will hold the Parquet data. We will then create a file format for the Parquet file. The steps are as follows:

  1. Let's first create a database where we will create our table and load it with some sample data:
    CREATE DATABASE C3_R4;
  2. Create a new table called TRANSACTIONS. We will load the sample data into this...
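The remaining steps typically involve a Parquet file format, a stage over the bucket, and a COPY INTO that projects the Parquet fields into the table's columns. The sketch below uses a placeholder stage URL and integration, and the column names are illustrative rather than the book's actual schema:

    -- Parquet file format and a stage over the bucket (URL and integration are placeholders).
    CREATE FILE FORMAT parquet_format TYPE = PARQUET;

    CREATE STAGE parquet_stage
      STORAGE_INTEGRATION = s3_private_int
      URL = 's3://my-private-bucket/parquet/'
      FILE_FORMAT = parquet_format;

    -- Parquet rows arrive as a single VARIANT ($1); project the fields while loading.
    -- The column names below are illustrative.
    COPY INTO TRANSACTIONS
      FROM (
        SELECT $1:Transaction_Date::DATE,
               $1:Customer_ID::NUMBER,
               $1:Amount::NUMBER(10,2)
        FROM @parquet_stage
      );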

Making sense of JSON semi-structured data and transforming to a relational view

This recipe walks you through the process of loading JSON data into a Snowflake table and transforming it into a relational form that can then be further processed.

Getting ready

Since our objective is to demonstrate the processing of data in JSON format, we will not describe the process of creating stages and loading data from cloud storage, as that has been covered in previous recipes. For simplicity, we have made available a sample JSON file in a public cloud bucket that we will read and process. The JSON sample data is shown in the screenshot that follows. The data consists of two fields called data_set and extract_date and an array of objects called credit_cards. We are going to convert this data into columns and rows so that it can be processed in a relational manner:

Figure 3.14 – Sample JSON data

The JSON sample file can be found at https://github.com/PacktPublishing...
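To give a flavour of the transformation described above, the following sketch assumes the JSON document has been loaded into a single VARIANT column; LATERAL FLATTEN then turns the credit_cards array into rows, and the scalar fields become columns. The table name and the fields inside each array element (such as card_number) are illustrative:

    -- Assume the JSON document has been loaded into a one-column VARIANT table.
    CREATE TABLE json_raw (v VARIANT);

    -- FLATTEN produces one row per element of the credit_cards array;
    -- the card_number field is illustrative.
    SELECT v:data_set::STRING            AS data_set,
           v:extract_date::DATE          AS extract_date,
           cc.value:card_number::STRING  AS card_number
    FROM json_raw,
         LATERAL FLATTEN(input => v:credit_cards) cc;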

Processing newline-delimited JSON (or NDJSON) into a Snowflake table

This recipe walks you through the process of loading NDJSON data and transforming it into a relational form to then be loaded into a table. NDJSON is a JSON format in which each row is valid JSON in itself and therefore can be processed independently of the complete document. For more details about NDJSON, please see http://ndjson.org/.

Getting ready

Since our objective is to demonstrate the processing of NDJSON, we will not describe the process of creating stages and loading data from cloud storage, as that has already been covered in other recipes. For simplicity, we have made available a sample NDJSON file in a public cloud bucket that we will read and process. The NDJSON sample file can be found at https://github.com/PacktPublishing/Snowflake-Cookbook/blob/master/Chapter03/r6/ndjson_sample.json if you would like to download it and use it in your own cloud storage buckets.

How to do it…

To demonstrate...
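As a brief sketch: because each line of an NDJSON file is a complete JSON object, a plain JSON file format loads one object per row into a VARIANT column, which can then be projected into relational columns. The stage, table, and field names below are illustrative:

    -- Each NDJSON line is an independent JSON object, so a plain JSON file
    -- format loads one object per row. Stage and table names are placeholders.
    CREATE FILE FORMAT ndjson_format TYPE = JSON;

    CREATE TABLE ndjson_raw (v VARIANT);

    COPY INTO ndjson_raw
      FROM @my_stage/ndjson_sample.json
      FILE_FORMAT = (FORMAT_NAME = 'ndjson_format');

    -- Project attributes into columns (the field names are illustrative).
    SELECT v:name::STRING AS name, v:amount::NUMBER AS amount
    FROM ndjson_raw;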

Processing near real-time data into a Snowflake table using Snowpipe

This recipe guides you through the process of setting up a Snowpipe and enabling the streaming of data from a cloud storage bucket.

Getting ready

Since we will be performing configuration with AWS to configure events on the S3 bucket, you should have the required privileges on the bucket. If you do not have access, you will need to ask your AWS administrator to configure the S3 bucket for you. The sample files, if you would like to download them for this recipe, can be found at https://github.com/PacktPublishing/Snowflake-Cookbook/tree/master/Chapter03/r7.

How to do it…

The steps for this recipe are as follows:

  1. Let's first create a database where we will create our target table, Snowpipe, and the stage objects:
    CREATE DATABASE SP_EX;
  2. Create the target table where we will load the data through Snowpipe:
    CREATE TABLE TRANSACTIONS
    (
      Transaction_Date DATE,
      Customer_ID...
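The subsequent steps typically create a stage over the monitored bucket and a pipe with AUTO_INGEST enabled, whose notification channel is then wired to the bucket's S3 event notifications. The sketch below uses placeholder names for the stage, integration, and bucket, and an illustrative file format:

    -- Stage over the monitored bucket; the URL and integration are placeholders.
    CREATE STAGE sp_stage
      URL = 's3://my-private-bucket/stream/'
      STORAGE_INTEGRATION = s3_private_int;

    -- The pipe runs its COPY whenever S3 notifies Snowflake of a new file.
    CREATE PIPE transactions_pipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO TRANSACTIONS
      FROM @sp_stage
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- SHOW PIPES exposes the notification_channel (an SQS ARN) to configure
    -- in the S3 bucket's event notifications.
    SHOW PIPES;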

Extracting data from Snowflake

This recipe walks you through extracting data from a Snowflake table. We will extract the data into an internal stage and later demonstrate how to do the same via an external stage, which will result in data being written to an S3 bucket.

Getting ready

You should have a table already created and populated, from which we will extract data. This table can be any table that you have previously loaded as part of this chapter or any other table available in your environment.

Ensure that you have followed the steps in the Configuring Snowflake access to private S3 buckets recipe and have enabled read and write access for Snowflake to your bucket. If write access is not present, Snowflake will not be able to write the extract into your bucket.

We will use SnowSQL to get the file from the internal stage to the local directory, so make sure that SnowSQL is installed on your system and you can successfully connect to your Snowflake instance. We don...
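As a rough sketch of both variants, the statements below unload a table into its internal table stage (downloaded afterwards with GET from a SnowSQL session) and then into an external stage backed by an S3 bucket. The table and stage names are placeholders:

    -- Unload the table into its internal table stage as gzip-compressed CSV.
    -- The table name stands in for any table you loaded earlier in this chapter.
    COPY INTO @%TRANSACTIONS/extract/
      FROM TRANSACTIONS
      FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP);

    -- From a SnowSQL session, download the extracted files to a local directory.
    GET @%TRANSACTIONS/extract/ file:///tmp/extract/;

    -- Alternatively, unload straight to an external stage backed by an S3 bucket
    -- (the stage is a placeholder and requires write access to the bucket).
    COPY INTO @my_s3_stage/extract/
      FROM TRANSACTIONS;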
