Authors

Christian Cote

Christian Cote is an IT professional with more than 15 years of experience working on data warehouse, Big Data, and business intelligence projects. Christian developed expertise in data warehousing and data lakes over the years and has designed many ETL/BI processes using a range of tools on multiple platforms. He has presented at several conferences and code camps. He currently co-leads the SQL Server PASS chapter. He is also a Microsoft Data Platform Most Valuable Professional (MVP).

Dejan Sarka

Dejan Sarka, MCT and Microsoft Data Platform MVP, is an independent trainer and consultant who focuses on the development of database and business intelligence applications. He is the founder of the Slovenian SQL Server and .NET Users Group.

Matija Lah

Matija Lah has more than 18 years of experience working with Microsoft SQL Server, mostly architecting data-centric solutions in the legal domain. His contributions to the SQL Server community have led to him being awarded the MVP award (Data Platform) between 2007 and 2017/2018. He spends most of his time on projects involving advanced information management and natural language processing, but often finds time to speak at events related to Microsoft SQL Server, where he loves to share his experience with the SQL Server platform.

Chapter 6. SSIS Performance and Scalability

This chapter covers the following recipes:

  • Using SQL Server Management Studio to execute an SSIS package
  • Using T-SQL to execute an SSIS package
  • Using the DTExec command-line utility to execute an SSIS package
  • Scheduling an SSIS package execution
  • Using the cascading lookup pattern
  • Using the lookup cache
  • Using lookup expressions
  • Determining the maximum number of worker threads in a data flow
  • Using the master package concept
  • Requesting an execution tree in SSDT
  • Establishing a performance monitor session
  • Configuring a performance monitor data collector set

Introduction


This chapter discusses the various methods of SSIS package execution, how to monitor the performance of running SSIS packages, and how to plan the utilization of resources for a given SSIS package. You will also learn how to use different techniques for acquiring reference data (also referred to as data lookups), and understand their impact on SSIS execution performance.

One of the design objectives of the SSIS execution engine is to maximize the use of resources on the system hosting SSIS package executions. In part, this is reflected in its capabilities for parallel execution of various operations: for instance, using multiple threads to perform data movements and transformations in the data flow, parallelizing the execution of operations in the control flow, or even scaling out the execution of packages to multiple hosting servers. Some of the techniques that you can use to improve resource utilization for SSIS executions are also discussed in this chapter.

SSIS execution...

Using SQL Server Management Studio to execute an SSIS package


In this recipe, you are going to use SQL Server Management Studio (SSMS) to prepare, and invoke, the execution of an SSIS package deployed to the SSISDB catalog.

Getting ready

Even if you have successfully completed the exercises in Chapter 2, What Is New in SSIS 2016, follow these steps to create an SSIS environment, and configure the CustomLogging project:

  1. In SSMS, open the Chapter06_Configuration.sql script located in the C:\SSIS2016Cookbook\Chapter06\Scripts folder.
  2. Carefully review the script, and then execute it. The script will create the Chapter06 environment with a single environment variable, and associate it with the CustomLogging project. The cmgr_TestCustomLogging_CS variable will allow you to configure the connection manager used by the project.
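If you are curious what such a configuration script amounts to, the SSISDB catalog exposes a stored procedure for each step. The following is a minimal sketch, not the actual contents of Chapter06_Configuration.sql; the folder, project, environment, and variable names come from this recipe and the package path used later in this chapter, while the variable's value is a placeholder:

USE SSISDB;
GO

-- Create the Chapter06 environment in the folder hosting the project.
EXEC catalog.create_environment
  @folder_name = N'CustomLogging',
  @environment_name = N'Chapter06';

-- Add the environment variable that will configure the connection manager
-- (the connection string below is an assumed placeholder value).
EXEC catalog.create_environment_variable
  @folder_name = N'CustomLogging',
  @environment_name = N'Chapter06',
  @variable_name = N'cmgr_TestCustomLogging_CS',
  @data_type = N'String',
  @sensitive = 0,
  @value = N'Data Source=localhost;Initial Catalog=TestCustomLogging;Integrated Security=SSPI;';

-- Associate the environment with the CustomLogging project.
DECLARE @reference_id BIGINT;
EXEC catalog.create_environment_reference
  @environment_name = N'Chapter06',
  @reference_id = @reference_id OUTPUT,
  @project_name = N'CustomLogging',
  @folder_name = N'CustomLogging',
  @reference_type = 'R';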

How to do it...

  1. Start SSMS, unless it is already running, and make sure that the Object Explorer is connected to the local SQL Server instance on your machine.
  2. In the Object...

Using T-SQL to execute an SSIS package


In this recipe, you are going to perform all three steps of SSIS package execution by using three special stored procedures in the SSISDB database.

This procedure can be used only on packages deployed to the SSISDB Catalog.

How to do it...

  1. In SSMS, connect to the SSISDB database; that is, the user database hosting the SSISDB catalog. You can use the following command:
USE SSISDB;

Note

For your convenience, the T-SQL code needed for this chapter is provided in the Chapter06.sql script, located in the C:\SSIS2016Cookbook\Chapter06\Scripts folder.

  2. Use the following query to retrieve the identifier of the environment reference, and assign the value to a variable:
DECLARE @reference_id INT;
      
SET @reference_id = (
  SELECT environment_references.reference_id
  FROM catalog.folders
    INNER JOIN catalog.projects
    ON projects.folder_id = folders.folder_id
    INNER JOIN catalog.environment_references
    ON environment_references.project_id = projects.project_id...
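Once the environment reference identifier has been assigned, the execution itself takes the three special stored procedures this recipe refers to: one creates the execution, one configures it, and one starts it. A minimal sketch of those three steps, reusing the @reference_id variable declared above (the logging level value is an assumption):

DECLARE @execution_id BIGINT;

-- Step 1: create the execution, binding it to the environment reference.
EXEC catalog.create_execution
  @folder_name = N'CustomLogging',
  @project_name = N'CustomLogging',
  @package_name = N'CustomLogging.dtsx',
  @reference_id = @reference_id,
  @execution_id = @execution_id OUTPUT;

-- Step 2: set execution parameters; object_type 50 denotes a server
-- option, and LOGGING_LEVEL 1 corresponds to the Basic logging level.
DECLARE @logging_level SMALLINT = 1;
EXEC catalog.set_execution_parameter_value
  @execution_id = @execution_id,
  @object_type = 50,
  @parameter_name = N'LOGGING_LEVEL',
  @parameter_value = @logging_level;

-- Step 3: start the execution (asynchronously, by default).
EXEC catalog.start_execution @execution_id = @execution_id;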

Using the DTExec command-line utility to execute an SSIS package


In this recipe, you are going to execute an SSIS package using the DTExec command-line utility. This utility supports not only packages deployed to the SSISDB catalog, but also packages managed by the legacy SSIS Service (stored in the msdb system database, or in the managed SSIS package store), and even packages stored in the filesystem.

How to do it...

  1. Using Windows Explorer, locate the Chapter06_Execution_DTExec.bat command file in the C:\SSIS2016Cookbook\Chapter06\Scripts\ folder.
  2. Right-click the file, and select Edit from the shortcut menu to open the file in Notepad.
  3. Inspect the DTExec command line:
    DTExec /Server localhost /ISServer "\SSISDB\CustomLogging\CustomLogging\CustomLogging.dtsx" /Env 1 /Par $ServerOption::LOGGING_LEVEL(Int32);1

The /Server argument provides the name of the SSIS Server, the /ISServer argument instructs the utility to load the SSIS package from the SSISDB Catalog (the complete path to the package...
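Packages run through the /ISServer option are logged in the SSISDB catalog just like any other catalog execution, so you can verify the outcome with a query against the catalog.executions view (status 2 means running, 4 failed, and 7 succeeded):

-- Inspect the most recent executions of the CustomLogging package.
SELECT TOP (5)
  execution_id,
  package_name,
  status,
  start_time,
  end_time
FROM SSISDB.catalog.executions
WHERE package_name = N'CustomLogging.dtsx'
ORDER BY execution_id DESC;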

Scheduling an SSIS package execution


In this recipe, you are going to create an SQL Server Agent job with a single step using an SSIS package, configured with a specific SSIS environment. You are going to assign a schedule to the job so that it can be executed automatically.

SQL Server Agent is a special SQL Server feature, hosted on the SQL Server instance, which supports the automation of a variety of operations and processes. One of them is the execution of SSIS packages; these can be configured as one or more steps of an SQL Server Agent job.

Note

SQL Server Agent and SQL Server Agent jobs

SQL Server Agent is available in the SQL Server 2016 Enterprise, Standard, and Web editions; it is not available in the Express or Express with Advanced Services editions.

An SQL Server Agent job is a collection of one or more operations that represent a complete unit of work to be performed automatically, for example, on a schedule. Multiple steps of an SQL Server Agent job are executed in sequence.

In addition...
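While this recipe walks through the New Job dialog in SSMS, an SQL Server Agent job can also be created entirely in T-SQL with the system stored procedures in the msdb database. The following is a minimal sketch rather than this recipe's exact configuration; the job name and schedule are assumptions, and the single T-SQL step simply starts the package through the SSISDB catalog procedures:

USE msdb;
GO

-- Create the job with a single T-SQL step that starts the package.
EXEC dbo.sp_add_job
  @job_name = N'Chapter06 - Execute CustomLogging';

EXEC dbo.sp_add_jobstep
  @job_name = N'Chapter06 - Execute CustomLogging',
  @step_name = N'Start CustomLogging.dtsx',
  @subsystem = N'TSQL',
  @database_name = N'SSISDB',
  @command = N'DECLARE @execution_id BIGINT;
EXEC catalog.create_execution
  @folder_name = N''CustomLogging'',
  @project_name = N''CustomLogging'',
  @package_name = N''CustomLogging.dtsx'',
  @execution_id = @execution_id OUTPUT;
EXEC catalog.start_execution @execution_id = @execution_id;';

-- Schedule the job to run daily at 11:00 PM.
EXEC dbo.sp_add_jobschedule
  @job_name = N'Chapter06 - Execute CustomLogging',
  @name = N'Daily at 23:00',
  @freq_type = 4,        -- daily
  @freq_interval = 1,    -- every day
  @active_start_time = 230000;

-- Target the local server so the Agent can run the job.
EXEC dbo.sp_add_jobserver
  @job_name = N'Chapter06 - Execute CustomLogging';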

Using the cascading lookup pattern


Typically, the structure and the semantics of a data flow source correspond to the data model used in the source data store; this structure, or the semantics used to represent the data, might not be aligned with those of the destination system.

For instance, the client entity in the source system might be represented by a single set, but the data warehouse might have to distinguish between a client who is a person and a client that represents a company. To correctly interpret the source data, you need appropriate logic in the data flow to differentiate between source rows representing persons and source rows representing companies before loading the data into the destination data store.
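In the cascading lookup pattern, such differentiation is implemented as a chain of Lookup transformations: the first Lookup matches one subset of the reference data, and its no-match output feeds a second Lookup that matches the rest. Purely as a hypothetical illustration, the reference queries of two cascaded Lookups might look like this (the table and column names are invented for this example):

-- First Lookup: reference set of clients who are persons.
SELECT ClientKey, ClientCode
FROM dbo.DimClient
WHERE ClientType = 'Person';

-- Second Lookup, fed by the first Lookup's no-match output:
-- reference set of clients that are companies.
SELECT ClientKey, ClientCode
FROM dbo.DimClient
WHERE ClientType = 'Company';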

How to do it...

  1. In SSDT, open the AdventureWorksETL.sln solution located in the C:\SSIS2016Cookbook\Chapter06\Starter\AdventureWorksETL\ folder.
  2. Make sure that the CascadingLookup.dtsx SSIS package is open, locate...

Using the lookup cache


The Lookup transformation can use two different connection types: an OLE DB connection, which requires the reference data to be stored in a data store that can be accessed by the OLE DB data provider, or a Cache connection, which requires the data to be available in an SSIS cache object.

In essence, there are three different modes of operation (depending on how the reference data is made available):

  • In full cache mode, the reference data needs to be loaded completely into memory (cached) before the transformation can be used. Data is either loaded automatically (when an OLE DB connection is used to retrieve the lookup set), or needs to be loaded before the data flow in which the lookup set is needed starts executing (when the Cache connection is used to access the reference set); the difference between the modes is illustrated in the sketch after this list.
  • With partial cache, the reference data is loaded into memory at run time, while the pipeline rows are being processed, and the execution engine determines automatically (based on the reference...
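The difference between the first two modes is visible in the queries the engine sends to the reference data store. As a hypothetical illustration (the table and column names are invented):

-- Full cache: a single query, executed before the data flow starts,
-- loads the entire reference set into memory.
SELECT ProductKey, ProductAlternateKey
FROM dbo.DimProduct;

-- Partial cache: at run time, the engine probes the reference store with
-- a parameterized query for keys that are not yet in the cache.
SELECT ProductKey, ProductAlternateKey
FROM dbo.DimProduct
WHERE ProductAlternateKey = ?;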

Using lookup expressions


To take advantage of the full cache mode in Lookup transformations, but only retrieve a subset of reference rows, you can supply the lookup query at run time. Certain properties of the data flow task can be modified at run time using expressions; the query used in a Lookup transformation is one such property.

In this recipe, you are going to dynamically determine the query restrictions and prepare the reference query in each iteration of the Foreach loop container processing the input files.
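Note that a full-cache lookup query cannot be parameterized at execution time, so the restriction has to be embedded in the query text itself before the data flow starts. As a hypothetical example, the query produced by such an expression for one iteration of the loop might look like this (table, columns, and date range are invented for illustration):

-- The date literals below would be concatenated into the query text by a
-- property expression on the data flow task, re-evaluated in every
-- iteration of the Foreach loop before the cache is loaded.
SELECT CustomerKey, CustomerAlternateKey
FROM dbo.DimCustomer
WHERE ModifiedDate >= '20170101'
  AND ModifiedDate <  '20170201';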

How to do it...

  1. Make sure that the AdventureWorksETL.sln solution is open in SSDT, and that the LookupExpression.dtsx package is active in the control flow designer. The solution is located in the C:\SSIS2016Cookbook\Chapter06\Starter\AdventureWorksETL\ folder.
  2. Create three new package variables using the following information; the Variables window can be opened by selecting Variables in the SSIS menu when an SSIS package is active in the control flow designer:

Determining the maximum number of worker threads in a data flow


Generally, multiple operations can be performed concurrently in SSIS, as long as sufficient resources are available in the environment hosting the execution. Parallelism can be achieved at several different levels, depending on the nature of the operations and the availability of resources.

Inside a data flow task, the data movements and transformations can be performed on one or more worker threads. The execution engine attempts to parallelize as many of the operations of a particular data flow as possible, in line with the nature of the transformations and restricted by the available resources.

For instance, provided that enough worker threads are available for a particular transformation, and enough system memory can be allocated for the pipeline buffers, more than one instance of the same transformation can run concurrently. By setting the EngineThreads data flow property, you can restrict the number...

Using the master package concept


By using the master package concept, it is possible to parallelize the execution of multiple child packages as determined in the control flow of the master package. Typically, this approach is used to parallelize packages that use separate data sources and data destinations (for instance, processing separate dimension tables in data warehousing scenarios), or to parallelize packages with CPU-intensive operations to run concurrently with packages with I/O-intensive operations.

In the control flow, you use precedence constraints and containers to determine which operations can be performed in parallel, and which of them must be performed in sequence. Through the MaxConcurrentExecutables package property, you can determine the maximum number of tasks to be performed simultaneously.

For instance, in data warehousing scenarios, fact tables are processed after the associated dimension tables have been processed successfully. Prior to fact processing, most dimension...

Requesting an execution tree in SSDT


Every time the execution of an SSIS package is started, the SSIS execution engine first prepares the execution plan. This plan contains the package metadata used by the execution engine to determine the range of resources that are going to be needed to perform the operations defined by the package.

The Execution Tree of a given SSIS package, representing its execution plan, can be prepared on demand at design time. The functionality is available when the SSIS package is being edited in SSDT. The purpose of the Execution Tree is to provide you with the same information the execution engine uses to determine resource usage. By examining the Execution Trees, you can understand the expected behavior of the package in terms of resource usage, and become familiar with resource requirements before the SSIS package is deployed, or used, in the destination environment.

In this recipe, you are going to configure an SSIS package to capture two special events during...

Monitoring SSIS performance


When SSIS packages are executed, the execution engine emits a variety of events and messages that can be captured in the operating system hosting the execution. Typically, Windows Performance Monitor, a tool built into the Windows operating system, can be configured to capture the SSIS performance counters during SSIS executions.

The following performance counters are available in SQL Server 2016:

  • BLOB bytes read: The number of bytes of binary large object (BLOB) data that the data flow engine has read from all sources.
  • BLOB bytes written: The number of bytes of BLOB data that the data flow engine has written to all destinations.
  • BLOB files in use: The number of BLOB files that the data flow engine currently is using for spooling.
  • Buffer memory: The amount of memory that is in use. This may include both physical and virtual memory. When this number is larger than the amount of physical memory, the Buffers...
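In addition to the operating system performance counters, the SSISDB catalog exposes similar measurements for running executions through the catalog.dm_execution_performance_counters table-valued function, which you can query in T-SQL while a package is executing:

-- Returns counters such as Buffer memory, Buffers in use, and BLOB bytes
-- read; passing NULL returns the counters of all running executions.
SELECT execution_id, counter_name, counter_value
FROM SSISDB.catalog.dm_execution_performance_counters(NULL);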

Establishing a performance monitor session


In this recipe, you are going to prepare an ad-hoc performance monitoring session by using the Performance Monitor, a component of the Windows operating system.

How to do it...

  1. In the Windows Start menu, locate the Performance Monitor and open it.
  2. In the tree on the left, expand the Monitoring Tools node, and select the Performance Monitor node.
  3. In the command ribbon on the right side of the window, click the Add icon, marked by the green plus sign, to open the Add Counters dialog.
  4. In the Available counters list, locate the SQL Server SSIS Pipeline 13.0 group, and expand it by clicking the tiny downward arrow to the right of the group name.
  5. With the help of the Ctrl key on your keyboard, and the mouse, select the following counters:
    • Buffer memory
    • Buffers in use
    • Buffers spooled
    • Flat buffer memory
    • Flat buffers in use
    • Private buffer memory
    • Private buffers in use
  6. Click Add >> below the list. The counters should now be listed in the Added counters list on...

Configuring a performance monitor data collector set


In this recipe, you are going to prepare a Performance Monitor data collector set using the same settings that you used in the previous recipe. This time the performance monitoring operation is going to run in the background, and you will be able to inspect the performance data after it completes.

How to do it...

  1. Open the Performance Monitor.
  2. In the tree on the left side of the window, navigate to the Performance Monitor node, as described in the previous recipe.
  3. When the graph is visible, right-click the Performance Monitor node, expand the New sub-menu in the shortcut menu, and select Data Collector Set.

  4. The Data Collector Set wizard starts. On the first page, enter Chapter06 in the Name text box, and then click Next.
  5. On page two, leave the default value of the Root directory property unchanged, and click Next.
  6. On page three, leave the <Default> value of the Run as property unchanged, and make sure that the Save and close...