by Reza Rad and Pedro Perfeito
In this chapter, we will cover the following topics:
Import and Export Wizard: First experience with SQL Server Integration Services (SSIS)
Getting started with SSDT
Creating the first SSIS package
Getting familiar with Data Flow Task
SSIS 2012 versus previous versions in Developer Experience
As technology evolves, it is always necessary to integrate data between different systems. The integration component is increasingly gaining importance, especially the component responsible for data quality as well as the cleaning rules applied between source and destination databases. Different vendors have their own integration tools and components, and Microsoft with its SSIS tool is recognized as one of the leaders in this field.
SSIS can be used to perform a broad range of data integration tasks, and the most common scenarios are applied to Data Warehousing. The known term associated with Data Warehousing is the Extract Transform and Load (ETL) that is responsible for the extraction of data from several sources, their cleansing, customization, and loading into a central repository (for example, to a Data Warehouse, Data Mart, Hub, and so on). SSIS is also used in other scenarios, for example data migration and data consolidation. Data Migration is the one-time movement of data between databases and computer systems, and is needed when changes occur or when we upgrade our systems. Data Consolidation combines and integrates data from disparate systems and assumes high importance in a business environment with increasing acquisitions and mergers. The following diagram adapted from TDWI (www.tdwi.org) helps clarify the different scenarios where SSIS could be used:
New business challenges are driving organizations to adopt data integration projects. Some of these challenges are:
Increasing demand for real-time information reporting and analysis
Large volumes of data spread along the entire organization
The need to comply with regulations, which often require to continuously track all changes to data and not just the net result of those changes
Although SSIS is an amazing tool for data integration, the same work can be done manually in almost all cases. As you can imagine, performing data integration tasks manually could be hard to maintain in terms of code, hard to scale properly, and would require more time to implement. From our perspective, since we have SSIS, there is no real reason to do it manually. The cost of ownership is not a problem either, because SSIS is included with SQL Server licenses that most organizations have already acquired.
In this chapter you will learn how to work with SSIS, how to create packages for data transfer, and you'll perform some simple operations with SSIS Package. At the end, we will highlight several improvements which are included in this new version.
The Import and Export Wizard will be our first stop at SSIS. This wizard provides a simple ETL and is easy to use for basic data transfer operations. With this wizard you can choose a source, a destination, and map columns with few constraints on data transfer options. We will take a brief look at this wizard in our first experience with SSIS.
Install SQL Server 2012. SQL Server 2012 comes with three editions: Standard, Business Intelligence, and Enterprise. The Business Intelligence edition covers all requirements for this book that you'll need to install. With this edition you will have all SQL Server Integration Services features.
For many recipes in this book, you need to have the
AdventureWorksLT2012 sample databases installed. Information about installing these databases can be found in the book introduction.
To install sample databases, first download the database files from http://msftdbprodsamples.codeplex.com and then open SSMS to execute this statement (download
AdventureWorks2012 Data File and
"" CREATE DATABASE AdventureWorks2012 ON (FILENAME = '<drive>:\<file path>\AdventureWorks2012_Data.mdf') FOR ATTACH_REBUILD_LOG ;"" CREATE DATABASE AdventureWorksLT2012 ON (FILENAME = '<drive>:\<file path>\AdventureWorksLT2012_Data.mdf') FOR ATTACH_REBUILD_LOG ;
Note that you should replace the path of the file here with the path of the file downloaded on your machine.
Create a new database in SQL Server. Open SQL Server Management Studio (SSMS) from Start Menu | Microsoft SQL Server 2012 | SQL Server Management Studio. In the SSMS, connect to local computer instance and create a new database. Name this database as
Note that you should run SSMS as administrator, to do this just right click on SQL Server Management Studio from path above and right click on it and choose Run as Administrator.
In the Run window, enter
Open the wizard from the following address: Start Menu | All Programs | Microsoft SQL Server 2012 | Import and Export Data.
In SSMS, right-click on any database and then under Tasks, select Import Data or Export Data.
At the first step in the Import and Export Wizard, a welcome page will appear. Click on Next to enter the Choose a Data Source step, in this step you should choose where the Data Source comes from. The Data Source can be any source; from an Oracle database or SQL Server to any other database, flat files (such as
.csv) or even Excel files, the range of source and destinations is based on data providers installed on the machine. For this sample, leave the Data Source option as its default option which is SQL Server Native Client 11.0.
We want to export two tables from the
AdventureWorks2012database to another database. Therefore, leave the Server name as (local) or a single dot ( . ) or if you have a named instance you could use
.\<Instance-Name>, and in the Authentication section leave the Authentication Type as Windows Authentication. This option will use your Windows account for connecting to the database, so obviously the Windows account should have read access to the underlying database.
In the Database drop-down box, select AdventureWorks2012 from the list. Then click on Next and go to the next step.
The next step is required to choose a Destination, therefore, provide the connection details of the data's destination (types of destinations can differ from databases to flat files). For this sample leave the destination as default value which is SQL Server Native Client 11.0.
Set the Server Name to (local) or dot ( . ) to connect default instance of current machine. Set the Authentication as Windows Authentication. Select R1.1 in the Database drop-down list . Then click on Next to follow the wizard's steps.
In the Specify Table Copy or Query step you can choose between selecting a table or view name for a source database or writing a query to fetch data from a source. For this example, choose the Copy data from one or more tables or views option .
Next, a list of tables and views from the source database will appear. For this example, select HumanResource.Department, Person.Address, and Production.Product. Then click on Next.
The Save and Run Package step is next; it provides the ability to save all the settings and configurations that you've set for an SSIS Package. We are going to save this package to see what the SSIS Package looks like. There are a lot of concepts and options associated with saving SSIS Packages which will be discussed in upcoming chapters, so don't worry about some terminologies here, all of them will be explored later. Check the Save SSIS Package option and click on the Next button .
Now a summary of all settings that you've done appears here; after reviewing the summary click on Finish.
After clicking on the Finish button , the Import and Export Wizard will show up and we can see all the messages generated during the package's execution. The number of rows copied are displayed, or any other information such as the number of rows transferred, validated, and any other actions.
Close the Wizard and open SSMS to check the destination database, you can see transferred tables there with data.
In this recipe, we created the first SSIS Package with the Import and Export Wizard, this simple scenario exports some tables from the
AdventureWorks database to an empty database. In the last few steps, we saved the whole data transfer scenario to an SSIS Package on a file system that we'll be able to open with SQL Server Data Tools (SSDT) in later recipes.
With the Import and Export Wizard you can import or export data from a source to a destination, this is the most simplistic ETL scenario. In the Select a Data Source step you perform the Extract part of ETL and fetch data from SQL server database (data source). The second step, which was the Destination Select, was configured during the Load part of ETL. Load indicates where data should be exported; we export data to a SQL Server database. In this first example scenario we don't have any specific Transform stage. We will see this part of the ETL later, during the Data Flow chapters of this book.
After matching the columns and metadata, data will be transferred and a summary of all logs will show what happened during execution.
We saved this package to the file system; we can also save packages to an SQL Server. The difference between the different storage options for SSIS Packages with their pros and cons will be explored later in Deployment chapters.
As you've seen so far, the Import and Export Wizard is a simple way to transfer data that covers our most basic requirements. But in real-world scenarios, you need some additional features, which we'll now discuss.
When you click on Edit Mappings, the Column Mappings window will appear. As you see, there are some options here for mapping Source and Destination columns.
When the destination table doesn't exist in the destination database, the Create Destination Table step will flag it. This means that the missing table will be created in the destination database; you can click on the Edit SQL button to see what the exact create table statement is; you can change the script as you want here.
When the destination table already exists in the destination database, the Delete Rows and Append Rows options in Destination Table will be selectable. You can select between deleting rows in a destination table before data transfer or appending new rows to existing records with these options.
The Drop and re-create destination table option
will be selectable when the destination table already exists. Another important option is Enable identity insert
, which should be checked if you load data into an
IDENTITY column. The last part of the Column Mappings window is the Mappings section, which shows Source and Destination columns
, as well as some additional column information. By default, all columns with the same name in Source and Destination will be mapped automatically. However, if the column names are different you should select columns by selecting the correct column name in the drop-down box. If you want to remove a column from data transfer you can simply choose the <ignore> option
In real-world scenarios, you need to configure transfer settings for all tables at once. Select multiple rows in the Select Source Tables or Views Wizard step by holding the Ctrl key and clicking on every row that you need, and then click on the Edit Mappings button. The Transfer Settings dialog box will open; all configurations that you set here will be applied to all selected tables.
In the Destination schema name you can choose a schema name from the destination database and use this schema for all selected tables. You can also type a schema name there; if that schema doesn't exist in the destination database it will be created and all tables will be created under this new schema. The Drop and recreate new destination tables option will be applied to all new tables, and the Delete rows in existing destination tables option will be applied to all existing tables. Enable identity insert is also applicable to all tables that have Identity columns.
Data types which are automatically mapped through the Column Mappings window of the Import and Export Wizard are defined in XML files based on source and destination type. A list of all these XML files is available here:
<system drive>:\Program Files\Microsoft SQL Server\110\DTS\MappingFiles
If you need the ability to provide a custom query to read data from the source table(s), you can choose to write a query in order to specify the data to be transferred, and in the next step write the query to fulfill your requirements. You can also open a query from a file.
This version is based on Visual Studio 2010, and the whole structure that supports the process of developing such projects has been significantly improved. Working with SSDT is not only easier for advanced users who require more flexibility, but also for beginners who can enjoy some new and interesting user interfaces to help them take their first steps with SSDT. Previous versions of SSIS used Business Intelligence Development Studio (BIDS) as their development environment.
Open SQL Server Data Tools (SSDT) through the shortcut placed under Microsoft SQL Server 2012 or Open Microsoft Visual Studio 2010 under the Microsoft Visual Studio 2010 Start menu folders.
Once SSDT is open, a start page will be seen by default. The Start Page window contains useful information about the SSDT environment such as recently opened projects, links to create or open an existing project, and is also a useful area with several resources and the latest news to help stay up to date about several Microsoft platforms such as Windows, Web, Cloud, and so on.
Now that SSDT is already opened, let's create a new SSIS project from the Start Page window in order to understand the basic steps as well as the remaining windows placed in the SSIS project example.
Name the project as
R02_Getting Started with SSDT. Name the solution as
Ch01_Getting Start with SQL Server Integration Servicesin
C:\SSISand click on OK. An empty SSIS project will be created using the Project Deployment Model approach (default) with an empty package included.
In the Add Copy of Existing Package dialog box, set Package location to File System and choose the package path from the file that you saved in the previous recipe from this address:
The new package will be added under the SSIS
Packagesfolder, double-click on the package name in Solution Explorer to open it in Package Designer.
Double-click on Preparation SQL Task 1 and the Execute SQL Task Editor dialog will open. Verify the SQL Statement property with a click on the ellipsis button in front of SQL Statement, and then close the editor.
The next recipe will explain the process of creating a new SSIS Package in more detail, and for that reason this recipe will focus on how we could get more value from SSDT to make the development and maintenance easier and faster.
Now that the SSDT is open with an empty package, let's describe some of the windows that you should be familiar with, as shown in the next screenshot:
Control Flow is the most important tab; it's where a developer "explains" to SSIS what the package will do. The remaining tabs such as Data Flow (see recipe), Parameters (see Chapter 11, Event Handling and Logging), Event Handlers (see Chapter 10, Debugging, Troubleshooting, and Migrating Packages to 2012), the Package Explorer and Progress bar (available just at runtime) are also important and will be described in later recipes.
The Solution Explorer section contains projects and their files.
Each project consists of Project Parameters, Connection Managers, SSIS Packages, and the Miscellaneous folder.
Project Parameters are parameters which are public for all packages in the project. We will discuss parameters in later chapters.
The Connection Managers folder in the Solution Explorer consists of shared connection managers which are shared between all packages in a project.
All SSIS Packages will be listed under the SSIS Packages folder.
The Miscellaneous folder can consist of any other files that are relevant to projects and packages, files such as documentation files, screenshots, and so on.
Properties panel ( 3 )
In this panel, it's possible to read and edit the properties of each selected object in the Design area or Solution Explorer.
SSIS Toolbox ( 4 )
In the SSIS Package, there are tasks and components which will be available depending on the tab selected in the Package design area. When the tab selected is Control Flow, the SSIS Toolbox will be grouped into four areas. The groups of tasks are the Favorites, Common, Containers, and Other Tasks. With these tasks, it's possible to control and inform SSIS about what should be done during execution. An interesting tip is that you can add tasks to the Favorites area anytime you like by right-clicking on each task and selecting Move to Favorites.
Note that the SSIS Toolbox is completely different on the Data Flow tab; we will talk about it in later recipes.
Connection Managers ( 5 )
Connection Managers are connections from the SSIS Package's components to source or destination data providers. There are different types of connection managers, some of them which are much in use are OLE DB Connection manager, Flat File Connection Manager and so on.
Each connection manager can be used in one or more components in the SSIS Package to work with underlying data provider. Some data providers require the installation of special drivers to have connections to their data source.
Each connection manager which is relevant to the current package will be listed in the Connection Manager's pane. Some connections are bold, these are referenced from a shared project's connection manager.
We will discuss more about connections in the next recipes.
Variables Pane ( 6 )
Each task in SSIS Package can send information to other tasks and it is possible by resorting to Variables. Package variables, their data types, their scope, and other properties exist in this pane, which will be described in greater detail in later chapters.
After understanding the SSDT environment, you will be able to make your first SSIS Package. Depending on the Data Integration project's complexity, it's always recommended to think carefully while selecting tasks and components to apply, as well as the order in which to execute them.
In this recipe, the first package created will read the number of records in an Adventure Works Microsoft Sample table and store it inside the SSIS Package.
You can reuse the recipe created in the previous section (adding a new package to it), or start from scratch as explained in the following steps:
Open SQL Server Data Tools (SSDT).
Click on New Project… and a Windows dialog will appear.
Click on the Business Intelligence Projects tab and under the Installed Templates section, select Integration Services Project.
Provide a name and location for the SSIS project and an empty structure as well as a package will be created.
In the Solution Explorer, select the empty package.dtsx and rename it to:
Now that the SSIS project is created, ensure that the empty package created by default is open and follow these steps:
Create an OLEDB Connection to the Adventure Works Microsoft sample database at the package level. (As already mentioned, if you create this connection in the Solution Explorer window, the connection will be created at the project level and will be automatically included inside all the existent packages).
In the Configure OLE DB Connection Manager Editor, select New... to create a new connection. If the connection already exists in the Data Connections list then select it here.
Drag-and-drop Execute SQL Task from SSIS Toolbox and place into the control flow design surface.
Double-click to edit Execute SQL Task or right-click and click on the Edit option.
Set the Result set to Single row.
Add the SQL Statement:
SELECT COUNT(*) AS NR_ROWS FROM SalesLT.Customerto get the number of records from the
Drag-and-drop Script Task from SSIS Toolbox and place into the Package Designer area and edit it by double-clicking on it.
Create a new variable to store the value provided by the previous task and add a message box inside the script (the script task is explained more clearly in Chapter 11, Event Handling and Logging) as follows:
MsgBox (Dts.Variables(0).Value, MsgBoxStyle.Information)
Run the package by pressing F5.
This recipe created a basic SSIS Package that included the most used tasks among SSIS projects: the Execute SQL Task, to communicate with a database through SQL queries and the Script Task that allows us to create some custom "work" when other tasks fall short. Naturally, it is too early to go into details with these two tasks, but this way it's possible to have an initial look at these common tasks.
While the Control Flow tab under the Package Designer is where the main workflow of the package is manipulated, the Data Flow tab introduced in this recipe is the place where data is transformed and moved between a source and destination. The Data Flow tab under the Package Designer is used to create or edit such transformations and movements for each Data Flow task which is placed in the Control Flow tab.
It's possible to have several Data Flows in the Control Flow, but the order of execution for those Data Flows should be planned carefully. To be familiar with the Data Flow task, this recipe introduces a simple but very common scenario of incorporating the content of an Excel file into a database, a SQL Server database for example.
Open SQL Server Data Tools (SSDT).
Create a new project and provide a name and location for it.
Open the empty
package.dtsxcreated by default and rename it to
Imagine a case where SSIS needs to periodically integrate data produced by an external system, and this data needs to be prepared to fit the destination requirements such as schema and data values exactly. To accomplish this task, we need to read data from an Excel worksheet, perform data conversions, verify whether data already exists at the destination, and finally insert into an SQL table at the destination.
Drag-and-drop a Data Flow task from the SSIS Toolbox to Control Flow.
Open Data Flow for editing by double-clicking under the task or just right-click and select Edit.
Make sure that the Data Flow tab is selected in the Package Designer. At this step Data Flow is naturally empty.
Because we need to read some data from an Excel file, simply drag-and-drop the Excel Source component (under the Data Sources group) from SSIS Toolbox and place it into the Package Designer area. Note that SSIS Toolbox has different content listed; it has components in spite of tasks that exist when the Control Flow is selected in the Package Designer.
Set the Excel file's path to
C:\SSIS\Ch01_Getting Start with SSIS\FILES\R04_NewCustomers.xlsxand click on OK.
In the Excel Source Editor, set the name of the Excel sheet to
Select the Columns tab and choose the columns Firstname and Lastname, which will be used along the data flow.
Click on OK to finish editing Excel Source.
Drag-and-drop Data Conversion from SSIS Toolbox to the Package Designer.
Select the column's FirstName and LastName, and change the Length for each to 50 characters.
Maintain all the properties with default values and create a connection to an SQL destination database (
In the list to select table or view, select the destination table 'SalesLT'.'Customer'.
Map the source converted columns Copy of FirstName and Copy of LastName to the destination SQL columns.
Click on OK to finish editing the lookup.
Drag-and-drop the OLE DB Destination component from SSIS Toolbox into the Package Designer in order to load data into SQL.
Link the output No Match Rows from Lookup to insert into destination only those records that don't yet exist at the destination.
Edit the destination component and set the SQL connection and also the destination table; maintain all the default values for the remaining controls.
Run the package by pressing F5.
This Data Flow reads some customer data (first name and last name) from an Excel file, applies some common transformations and inserts the data into an SQL table named
SalesLT.Customer. Some transformations are usually applied to make source data fit the destination's requirements. In this example, data is converted and we verified whether the current data in the incoming rows already exists at the destination (the purpose is to avoid data duplication).
Detailed descriptions about each source (Transformation and Destination) used in this example will be explored in later chapters. In this recipe, we just need an overview of Data Flow.
This book aims at the new version of SSIS, which is SSIS 2012. There are a bunch of changes in SSIS 2012, this recipe covers some of the differences between SSIS 2012 and previous versions. The SSIS 2012 changes aren't limited to SSDT and design changes alone, there are also many changes while interacting with outside packages and package deployment, as well as new tasks and transformations; all of which we will explore in different recipes of this book in appropriate case scenarios. For this recipe, differences in SSDT as compared to previous versions are highlighted.
As this recipe compares the two versions, having SSIS 2012 and SSIS 2008 installed can be useful. SSIS 2012 is required but 2008 is optional (will help in comparison).
Create a New SSIS Project.
The first difference you'll notice in SSIS 2012 is that the SSDT is Visual Studio 2010 with a lot of improvements in the Editor. Earlier versions of SSIS work with previous versions of Visual Studio. SSIS 2008 worked with Visual Studio 2008, and SSIS 2005 worked with Visual Studio 2005.
When a new Package is created, the SSIS Toolbox is shown in the SSDT. As you would notice in the SSIS Toolbox, there are some categories that differ from the previous SSIS 2008 version. The previous version had only three sections: Control Flow Items, Maintenance Plan Tasks, and General. In SSDT 2010, sections are organized as: Favorites, Common, Containers, and Other Tasks. Favorites contain tasks that can be moved in here by right-clicking on items and then clicking on Move to Favorites. Common consists of a list of tasks that is among the most useful tasks. Containers have their own section in this version of SSDT, and all other tasks are placed in the Other Tasks section.
A quick description of each task or container is available under the SSIS Toolbox, as you can see in the following screenshot:
There is a new tab in the SSDT 2012 Package Designer named Parameters; we will discuss this Parameters tab in later chapters.
Right-click on the blank Control Flow area and select Getting Started; you will see the new Getting Started pane on the right-hand side at the bottom of SSDT, which shows some help and links as seen in the following screenshot:
Drag-and-drop a Script Task from toolbox to control flow, and double-click on Script Task. In the Script Task Editor, in the Script Language property , you can select between Visual C# 2010 and Visual Basic 2010 both of which are under the .NET 4.0 Framework. We will explore all details about scripting SSIS in later chapters.
Go to the Data Flow tab, you will see this sentence: No Data flow task has been added to this package. Click here to add a new data flow task. Just click on the link and a new empty data flow will be created.
Notice that there are two new options Source and Destination in the SSIS Toolbox which can be found under the Favorites sections Source Assistant and Destination Assistant. With these assistants you can simply select a data source or destination as you want with a single assistant component.
We had already reviewed some changes to SSDT Design in the Control Flow and Data Flow tab areas. There are lots of other changes such as better annotations, grouping components in data flow, and many other improvements in the UI which we will explore with the many examples in this book in later recipes.
On the other hand, there are some major changes like new tasks and transformations, like Expression Task, CDC Control Task, Data Flow Components, and DQS Cleansing transformation which we will explore in appropriate chapters.
There are major changes in the deployment of packages and working with packages from outside and package execution, which will be explored in package deployment chapters later on.