You're reading from Cloud Scale Analytics with Azure Data Services

Product type Book

Published in Jul 2021

Publisher Packt

ISBN-13 9781800562936

Pages 520 pages

Edition 1st Edition

Concepts

Data Streaming

Author (1):

Patrik Borosch

Loading data

The database offers many options for parallel processing, and you will want to take advantage of them when you load data, too. Remember the purpose of the control and compute nodes? When loading data into your database, you want a technique that uses the compute nodes as much as possible.

Using the COPY statement

The COPY statement helps you do exactly that. It talks directly to the compute nodes and can therefore use the full parallelism that the database offers. It is part of the T-SQL dialect of the Synapse Analytics database and provides many options to control how data is loaded.
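
As a rough illustration of what such a load can look like, here is a minimal sketch of a COPY statement that reads CSV files from a storage account into a staging table. The table name dbo.StagingSales, the storage URL placeholders, and the choice of a managed identity for authentication are assumptions made for this example only; adjust them, and the file options, to your own environment.

```sql
-- Minimal sketch: load all CSV files in a folder into a staging table.
-- dbo.StagingSales and the storage URL are hypothetical placeholders.
COPY INTO dbo.StagingSales
FROM 'https://<storage-account>.blob.core.windows.net/<container>/sales/*.csv'
WITH (
    FILE_TYPE = 'CSV',                              -- source files are delimited text
    CREDENTIAL = (IDENTITY = 'Managed Identity'),   -- assumes the workspace identity can read the storage
    FIELDTERMINATOR = ',',                          -- column delimiter
    ROWTERMINATOR = '0x0A',                         -- line feed as row delimiter
    FIRSTROW = 2                                    -- skip the header row
);
```

Pointing the statement at a folder with several files, rather than at one large file, gives the compute nodes more units of work to read in parallel.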

If you load through the control node instead of using the COPY statement, you create a bottleneck. The load is then single-threaded: every row that needs to be written to the database first flows through the control node and is only then spread to the distributions using...