Authors

Christian Cote

Christian Cote is an IT professional with more than 15 years of experience working on data warehouse, Big Data, and business intelligence projects. Christian developed expertise in data warehousing and data lakes over the years and has designed many ETL/BI processes using a range of tools on multiple platforms. He has presented at several conferences and code camps. He currently co-leads the SQL Server PASS chapter. He is also a Microsoft Data Platform Most Valuable Professional (MVP).

Dejan Sarka

Dejan Sarka, MCT and Microsoft Data Platform MVP, is an independent trainer and consultant who focuses on the development of database and business intelligence applications. He is the founder of the Slovenian SQL Server and .NET Users Group.

Matija Lah

Matija Lah has more than 18 years of experience working with Microsoft SQL Server, mostly architecting data-centric solutions in the legal domain. His contributions to the SQL Server community have led to him being awarded the MVP award (Data Platform) between 2007 and 2017/2018. He spends most of his time on projects involving advanced information management and natural language processing, but often finds time to speak at events related to Microsoft SQL Server, where he loves to share his experience with the SQL Server platform.

Chapter 6. SSIS Performance and Scalability

This chapter covers the following recipes:

  • Using SQL Server Management Studio to execute an SSIS package
  • Using T-SQL to execute an SSIS package
  • Using the DTExec command-line utility to execute an SSIS package
  • Scheduling an SSIS package execution
  • Using the cascading lookup pattern
  • Using the lookup cache
  • Using lookup expressions
  • Determining the maximum number of worker threads in a data flow
  • Using the master package concept
  • Requesting an execution tree in SSDT
  • Establishing a performance monitor session
  • Configuring a performance monitor data collector set

Introduction


This chapter discusses the various methods of SSIS package execution, how to monitor the performance of running SSIS packages, and how to plan the utilization of resources for a given SSIS package. You will also learn how to use different techniques for acquiring reference data (also referred to as data lookups), and understand their impact on SSIS execution performance.

One of the design objectives of the SSIS execution engine is to maximize the use of resources on the system hosting SSIS package executions. In part, this is reflected in its capabilities for parallel execution of various operations: for instance, using multiple threads to perform data movements and transformations in the data flow, parallelizing the execution of operations in the control flow, or even scaling out the execution of packages to multiple hosting servers. Some of the techniques that you can use to improve resource utilization for SSIS executions are also discussed in this chapter.

SSIS execution...

Using SQL Server Management Studio to execute an SSIS package


In this recipe, you are going to use SQL Server Management Studio (SSMS) to prepare, and invoke, the execution of an SSIS package deployed to the SSISDB catalog.

Getting ready

Even if you have successfully completed the exercises in Chapter 2, What Is New in SSIS 2016, follow these steps to create an SSIS environment, and configure the CustomLogging project:

  1. In SSMS, open the Chapter06_Configuration.sql script located in the C:\SSIS2016Cookbook\Chapter06\Scripts folder.
  2. Carefully review the script, and then execute it. The script will create the Chapter06 environment with a single environment variable, and associate it with the CustomLogging project. The cmgr_TestCustomLogging_CS variable will allow you to configure the connection manager used by the project.
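If you are curious what such a configuration script amounts to, the SSISDB catalog exposes a stored procedure for each step. The following is a minimal sketch, not the actual contents of Chapter06_Configuration.sql; the folder, project, environment, and variable names come from this recipe and the package path used later in this chapter, while the variable's value is a placeholder:

USE SSISDB;
GO

-- Create the Chapter06 environment in the folder hosting the project.
EXEC catalog.create_environment
  @folder_name = N'CustomLogging',
  @environment_name = N'Chapter06';

-- Add the environment variable that will configure the connection manager
-- (the connection string below is an assumed placeholder value).
EXEC catalog.create_environment_variable
  @folder_name = N'CustomLogging',
  @environment_name = N'Chapter06',
  @variable_name = N'cmgr_TestCustomLogging_CS',
  @data_type = N'String',
  @sensitive = 0,
  @value = N'Data Source=localhost;Initial Catalog=TestCustomLogging;Integrated Security=SSPI;';

-- Associate the environment with the CustomLogging project.
DECLARE @reference_id BIGINT;
EXEC catalog.create_environment_reference
  @environment_name = N'Chapter06',
  @reference_id = @reference_id OUTPUT,
  @project_name = N'CustomLogging',
  @folder_name = N'CustomLogging',
  @reference_type = 'R';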

How to do it...

  1. Start SSMS, unless it is already running, and make sure that the Object Explorer is connected to the local SQL Server instance on your machine.
  2. In the Object...

Using T-SQL to execute an SSIS package


In this recipe, you are going to perform all three steps of SSIS package execution by using three special stored procedures in the SSISDB database.

This procedure can be used only on packages deployed to the SSISDB Catalog.

How to do it...

  1. In SSMS, connect to the SSISDB database; that is, the user database hosting the SSISDB catalog. You can use the following command:
USE SSISDB;

Note

For your convenience, the T-SQL code needed for this chapter is provided in the Chapter06.sql script, located in the C:\SSIS2016Cookbook\Chapter06\Scripts folder.

  2. Use the following query to retrieve the identifier of the environment reference, and assign the value to a variable:
DECLARE @reference_id INT;
      
SET @reference_id = (
  SELECT environment_references.reference_id
  FROM catalog.folders
    INNER JOIN catalog.projects
    ON projects.folder_id = folders.folder_id
    INNER JOIN catalog.environment_references
    ON environment_references.project_id = projects.project_id...
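Once the environment reference identifier has been assigned, the execution itself takes the three special stored procedures this recipe refers to: one creates the execution, one configures it, and one starts it. A minimal sketch of those three steps, reusing the @reference_id variable declared above (the logging level value is an assumption):

DECLARE @execution_id BIGINT;

-- Step 1: create the execution, binding it to the environment reference.
EXEC catalog.create_execution
  @folder_name = N'CustomLogging',
  @project_name = N'CustomLogging',
  @package_name = N'CustomLogging.dtsx',
  @reference_id = @reference_id,
  @execution_id = @execution_id OUTPUT;

-- Step 2: set execution parameters; object_type 50 denotes a server
-- option, and LOGGING_LEVEL 1 corresponds to the Basic logging level.
DECLARE @logging_level SMALLINT = 1;
EXEC catalog.set_execution_parameter_value
  @execution_id = @execution_id,
  @object_type = 50,
  @parameter_name = N'LOGGING_LEVEL',
  @parameter_value = @logging_level;

-- Step 3: start the execution (asynchronously, by default).
EXEC catalog.start_execution @execution_id = @execution_id;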

Using the DTExec command-line utility to execute an SSIS package


In this recipe, you are going to execute an SSIS package using the DTExec command-line utility. This utility supports not only packages deployed to the SSISDB catalog, but also packages managed by the legacy SSIS Service (stored in the msdb system database, or in the managed SSIS package store), and even packages stored in the filesystem.

How to do it...

  1. Using Windows Explorer, locate the Chapter06_Execution_DTExec.bat command file in the C:\SSIS2016Cookbook\Chapter06\Scripts\ folder.
  2. Right-click the file, and select Edit from the shortcut menu to open the file in Notepad.
  3. Inspect the DTExec command line:
    DTExec /Server localhost /ISServer "\SSISDB\CustomLogging\CustomLogging\CustomLogging.dtsx" /Env 1 /Par $ServerOption::LOGGING_LEVEL(Int32);1

The /Server argument provides the name of the SSIS Server, the /ISServer argument instructs the utility to load the SSIS package from the SSISDB Catalog (the complete path to the package...
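Packages run through the /ISServer option are logged in the SSISDB catalog just like any other catalog execution, so you can verify the outcome with a query against the catalog.executions view (status 2 means running, 4 failed, and 7 succeeded):

-- Inspect the most recent executions of the CustomLogging package.
SELECT TOP (5)
  execution_id,
  package_name,
  status,
  start_time,
  end_time
FROM SSISDB.catalog.executions
WHERE package_name = N'CustomLogging.dtsx'
ORDER BY execution_id DESC;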

Scheduling an SSIS package execution


In this recipe, you are going to create an SQL Server Agent job with a single step using an SSIS package, configured with a specific SSIS environment. You are going to assign a schedule to the job so that it can be executed automatically.

SQL Server Agent is a special SQL Server feature, hosted on the SQL Server instance, which supports the automation of a variety of operations and processes. One of them is the execution of SSIS packages; these can be configured as one or more steps of an SQL Server Agent job.

Note

SQL Server Agent and SQL Server Agent jobs

SQL Server Agent is available in the SQL Server 2016 Enterprise, Standard, and Web editions; it is not available in the Express or Express with Advanced Services editions.

An SQL Server Agent job is a collection of one or more operations that represent a complete unit of work to be performed automatically, for example, on a schedule. Multiple steps of an SQL Server Agent job are executed in sequence.

In addition...
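While this recipe walks through the New Job dialog in SSMS, an SQL Server Agent job can also be created entirely in T-SQL with the system stored procedures in the msdb database. The following is a minimal sketch rather than this recipe's exact configuration; the job name and schedule are assumptions, and the single T-SQL step simply starts the package through the SSISDB catalog procedures:

USE msdb;
GO

-- Create the job with a single T-SQL step that starts the package.
EXEC dbo.sp_add_job
  @job_name = N'Chapter06 - Execute CustomLogging';

EXEC dbo.sp_add_jobstep
  @job_name = N'Chapter06 - Execute CustomLogging',
  @step_name = N'Start CustomLogging.dtsx',
  @subsystem = N'TSQL',
  @database_name = N'SSISDB',
  @command = N'DECLARE @execution_id BIGINT;
EXEC catalog.create_execution
  @folder_name = N''CustomLogging'',
  @project_name = N''CustomLogging'',
  @package_name = N''CustomLogging.dtsx'',
  @execution_id = @execution_id OUTPUT;
EXEC catalog.start_execution @execution_id = @execution_id;';

-- Schedule the job to run daily at 11:00 PM.
EXEC dbo.sp_add_jobschedule
  @job_name = N'Chapter06 - Execute CustomLogging',
  @name = N'Daily at 23:00',
  @freq_type = 4,        -- daily
  @freq_interval = 1,    -- every day
  @active_start_time = 230000;

-- Target the local server so the Agent can run the job.
EXEC dbo.sp_add_jobserver
  @job_name = N'Chapter06 - Execute CustomLogging';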

Using the cascading lookup pattern


Typically, the structure and the semantics of a data flow source correspond to the data model used in the source data store; this structure, or the semantics used to represent the data, might not be aligned with those of the destination system.

For instance, the client entity in the source system might be represented by a single set, but the data warehouse might have to distinguish between a client who is a person and a client that represents a company. To correctly interpret the source data, you need appropriate logic in the data flow to differentiate between source rows representing persons and source rows representing companies before loading the data into the destination data store.
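In the cascading lookup pattern, such differentiation is implemented as a chain of Lookup transformations: the first Lookup matches one subset of the reference data, and its no-match output feeds a second Lookup that matches the rest. Purely as a hypothetical illustration, the reference queries of two cascaded Lookups might look like this (the table and column names are invented for this example):

-- First Lookup: reference set of clients who are persons.
SELECT ClientKey, ClientCode
FROM dbo.DimClient
WHERE ClientType = 'Person';

-- Second Lookup, fed by the first Lookup's no-match output:
-- reference set of clients that are companies.
SELECT ClientKey, ClientCode
FROM dbo.DimClient
WHERE ClientType = 'Company';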

How to do it...

  1. In SSDT, open the AdventureWorksETL.sln solution located in the C:\SSIS2016Cookbook\Chapter06\Starter\AdventureWorksETL\ folder.
  2. Make sure that the CascadingLookup.dtsx SSIS package is open, locate...

Using the lookup cache


The Lookup transformation can use two different connection types: an OLE DB connection, which requires the reference data to be stored in a data store that can be accessed by the OLE DB data provider, or a Cache connection, which requires the data to be available in an SSIS cache object.

In essence, there are three different modes of operation (depending on how the reference data is made available):

  • In full cache mode, the reference data needs to be loaded completely into memory (cached) before the transformation can be used. Data is either loaded automatically (when an OLE DB connection is used to retrieve the lookup set), or needs to be loaded before the data flow in which the lookup set is needed starts executing (when the Cache connection is used to access the reference set); the difference between the modes is illustrated in the sketch after this list.
  • With partial cache, the reference data is loaded into memory at run time, while the pipeline rows are being processed, and the execution engine determines automatically (based on the reference...
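The difference between the first two modes is visible in the queries the engine sends to the reference data store. As a hypothetical illustration (the table and column names are invented):

-- Full cache: a single query, executed before the data flow starts,
-- loads the entire reference set into memory.
SELECT ProductKey, ProductAlternateKey
FROM dbo.DimProduct;

-- Partial cache: at run time, the engine probes the reference store with
-- a parameterized query for keys that are not yet in the cache.
SELECT ProductKey, ProductAlternateKey
FROM dbo.DimProduct
WHERE ProductAlternateKey = ?;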

Using lookup expressions


To take advantage of the full cache mode in Lookup transformations, but only retrieve a subset of reference rows, you can supply the lookup query at run time. Certain properties of the data flow task can be modified at run time using expressions; the query used in a Lookup transformation is one such property.

In this recipe, you are going to dynamically determine the query restrictions and prepare the reference query in each iteration of the Foreach loop container processing the input files.
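Note that a full-cache lookup query cannot be parameterized at execution time, so the restriction has to be embedded in the query text itself before the data flow starts. As a hypothetical example, the query produced by such an expression for one iteration of the loop might look like this (table, columns, and date range are invented for illustration):

-- The date literals below would be concatenated into the query text by a
-- property expression on the data flow task, re-evaluated in every
-- iteration of the Foreach loop before the cache is loaded.
SELECT CustomerKey, CustomerAlternateKey
FROM dbo.DimCustomer
WHERE ModifiedDate >= '20170101'
  AND ModifiedDate <  '20170201';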

How to do it...

  1. Make sure that the AdventureWorksETL.sln solution is open in SSDT, and that the LookupExpression.dtsx package is active in the control flow designer. The solution is located in the C:\SSIS2016Cookbook\Chapter06\Starter\AdventureWorksETL\ folder.
  2. Create three new package variables using the following information; the Variables window can be opened by selecting Variables in the SSIS menu when an SSIS package is active in the control flow designer:

Determining the maximum number of worker threads in a data flow


Generally, multiple operations can be performed concurrently in SSIS, as long as sufficient resources are available in the environment hosting the execution. Parallelism can be achieved at several different levels, depending on the nature of the operations and the availability of resources.

Inside a data flow task, the data movements and transformations can be performed on one or more worker threads. The execution engine attempts to parallelize as many of the operations of a particular data flow as possible, in line with the nature of the transformations and restricted by the available resources.

For instance, provided that enough worker threads are available for a particular transformation, and enough system memory can be allocated for the pipeline buffers, more than one instance of the same transformation can run concurrently. By setting the EngineThreads data flow property, you can restrict the number...

Using the master package concept


By using the master package concept, it is possible to parallelize the execution of multiple child packages as determined in the control flow of the master package. Typically, this approach is used to parallelize packages that use separate data sources and data destinations (for instance, processing separate dimension tables in data warehousing scenarios), or to parallelize packages with CPU-intensive operations to run concurrently with packages with I/O-intensive operations.

In the control flow, you use precedence constraints and containers to determine which operations can be performed in parallel, and which of them must be performed in sequence. Through the MaxConcurrentExecutables package property, you can determine the maximum number of tasks to be performed simultaneously.

For instance, in data warehousing scenarios, fact tables are processed after the associated dimension tables have been processed successfully. Prior to fact processing, most dimension...

Requesting an execution tree in SSDT


Every time the execution of an SSIS package is started, the SSIS execution engine first prepares the execution plan. This plan contains the package metadata used by the execution engine to determine the range of resources that are going to be needed to perform the operations defined by the package.

The Execution Tree of a given SSIS package, representing its execution plan, can be prepared on demand at design time. The functionality is available when the SSIS package is being edited in SSDT. The purpose of the Execution Tree is to provide you with the same information the execution engine uses to determine resource usage. By examining the Execution Trees, you can understand the expected behavior of the package in terms of resource usage, and become familiar with resource requirements before the SSIS package is deployed, or used, in the destination environment.

In this recipe, you are going to configure an SSIS package to capture two special events during...

Monitoring SSIS performance


When SSIS packages are executed, the execution engine emits a variety of events and messages that can be captured in the operating system hosting the execution. Typically, Windows Performance Monitor, a tool built into the Windows operating system, can be configured to capture the SSIS performance counters during SSIS executions.

The following performance counters are available in SQL Server 2016:

  • BLOB bytes read: The number of bytes of binary large object (BLOB) data that the data flow engine has read from all sources.
  • BLOB bytes written: The number of bytes of BLOB data that the data flow engine has written to all destinations.
  • BLOB files in use: The number of BLOB files that the data flow engine currently is using for spooling.
  • Buffer memory: The amount of memory that is in use. This may include both physical and virtual memory. When this number is larger than the amount of physical memory, the Buffers...
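In addition to the operating system performance counters, the SSISDB catalog exposes similar measurements for running executions through the catalog.dm_execution_performance_counters table-valued function, which you can query in T-SQL while a package is executing:

-- Returns counters such as Buffer memory, Buffers in use, and BLOB bytes
-- read; passing NULL returns the counters of all running executions.
SELECT execution_id, counter_name, counter_value
FROM SSISDB.catalog.dm_execution_performance_counters(NULL);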

Establishing a performance monitor session


In this recipe, you are going to prepare an ad-hoc performance monitoring session by using the Performance Monitor, a component of the Windows operating system.

How to do it...

  1. In the Windows Start menu, locate the Performance Monitor and open it.
  2. In the tree on the left, expand the Monitoring Tools node, and select the Performance Monitor node.
  3. In the command ribbon on the right side of the window, click the Add icon, marked by the green plus sign, to open the Add Counters dialog.
  4. In the Available counters list, locate the SQL Server SSIS Pipeline 13.0 group, and expand it by clicking the tiny downward arrow to the right of the group name.
  5. With the help of the Ctrl key on your keyboard, and the mouse, select the following counters:
    • Buffer memory
    • Buffers in use
    • Buffers spooled
    • Flat buffer memory
    • Flat buffers in use
    • Private buffer memory
    • Private buffers in use
  6. Click Add >> below the list. The counters should now be listed in the Added counters list on...

Configuring a performance monitor data collector set


In this recipe, you are going to prepare a Performance Monitor data collector set using the same settings that you used in the previous recipe. This time the performance monitoring operation is going to run in the background, and you will be able to inspect the performance data after it completes.

How to do it...

  1. Open the Performance Monitor.
  2. In the tree on the left side of the window, navigate to the Performance Monitor node, as described in the previous recipe.
  3. When the graph is visible, right-click the Performance Monitor node, expand the New sub-menu in the shortcut menu, and select Data Collector Set.

  4. The Data Collector Set wizard starts. On the first page, enter Chapter06 in the Name text box, and then click Next.
  5. On page two, leave the default value of the Root directory property unchanged, and click Next.
  6. On page three, leave the <Default> value of the Run as property unchanged, and make sure that the Save and close...