Data Processing using Derived Column and Aggregate Data Transformations


Details of Data Processing:

This package accomplishes the following:

  1. Connect to SQL Server 2005 and access the MyNorthwind database (a copy of Northwind).
  2. A Data Reader Source extracts data using the following query:
    Select orderID, quantity, UnitPrice from [Order details]
  3. A Derived Column data transformation adds a column to the data flow that combines data from two of the existing columns.
  4. A Conditional Split transformation divides the data flow into two flows based on the value in the derived column. The rows that satisfy the condition are captured in a Recordset destination; the rest of the flow (bad data) is routed to the next processor, the Aggregate transformation.
  5. The Aggregate transformation groups the incoming data and computes the group average of the derived column for each group.
  6. The grouped data then goes into a second Recordset destination.

Although the final data ends up in Recordset destinations, it could also be persisted to other destinations such as DataReader, OLE DB, or the file system.
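For reference, the two record sets the package ends up with correspond roughly to the following T-SQL run against the Order Details table. These queries are only an illustration of the logic described above (the 120 threshold comes from the split condition used later, and the LineTotal alias is just for illustration); they are not part of the package itself.

-- Rows that satisfy the split condition (the Case1 output)
Select orderID, quantity, UnitPrice, quantity*UnitPrice as LineTotal
from [Order details]
where quantity*UnitPrice >= 120

-- Group averages of the remaining rows (the aggregated bad_data output)
Select orderID, Avg(quantity*UnitPrice) as Avrg
from [Order details]
where quantity*UnitPrice < 120
Group by orderID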

Implementation in the Visual Studio 2005 Designer

Before going into the details of the data processing, let us take a look at the building blocks from the VS 2005 Toolbox that are used to implement it and the interconnections between them. In addition to the data flow components, three Data Viewers are inserted to pause and inspect the data before it goes any further.


Data Extraction

The data is extracted from the MyNorthwind database on SQL Server 2005. While configuring the Data Reader, you make use of a Connection Manager, whose details are shown in its editor in the next figure. The connection uses the SqlClient data provider. The SQL Server is a default installation named [.], and since it is set for SQL Server authentication, a user name and password are required. The database is chosen from the drop-down as shown. Once the connectivity test succeeds, the Data Reader can extract data from the database.
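Assuming a default local instance and SQL Server authentication as described above, the connection string built by the SqlClient provider would look something like this (the user name and password are placeholders):

Data Source=.;Initial Catalog=MyNorthwind;User ID=<user name>;Password=<password>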


The Data Reader is added to the design surface by double-clicking the control in the Toolbox. The Data Reader uses the connection manager to access the SQL Server, as you can see in its Connection Managers tab in the next figure.


The Data Reader's Properties window shows some of the other configuration details made in the other tabs of the above editor. In particular, it shows the query used to extract the data. The query pulls three columns from the Order Details table in the MyNorthwind database and provides them to the next component in the data processing implementation.

Derived Column Transformation

Once the Data Reader is configured, you can double-click the Derived Column Data Flow Transformation in the Toolbox to add it to the designer. Since the data from the Data Reader is used as the input, all that is needed is to drag the green dangling line from the Data Reader to the Derived Column component. This makes the output of the Data Reader the input of the Derived Column. There is a more rigorous way of configuring this "Path", but the method indicated suffices for now. The dangling line becomes a permanent connection; the path carries important information and also provides an interface called a "Data Viewer" for monitoring what passes through.

Let us take a look at this "Path" between the Data Reader and the Derived Column, shown in the next figure. It clearly shows several properties of the data flowing out of the DataReader source: data type, precision, scale, etc. Clicking the Data Viewers list element on the left opens its window on the right, where you can add a Data Viewer. A Data Viewer acts as a hold-off switch for the data. In this case it is represented by a 'Grid' at run time and will show the three columns and a switch control, as we will see later.


The next figure, the Derived Column Transformation Editor, shows how the Derived Column is configured. On the left you see the variables [both System and User] and the columns [the three coming from the Data Reader]. You need to give the derived column a name; here the default is taken. The derived column can be a new column (as in this tutorial) or it may replace any of the three existing columns, as shown. The Expression is where the designer inserts the logic. Drag quantity from the top pane and drop it in the Expression field, then click the Operators node on the right, bring over the multiplication operator, and follow it by dragging UnitPrice from the left. This is all that is needed to set up the expression quantity*UnitPrice.
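Put together, the single row configured in the Derived Column Transformation Editor looks roughly like this (the column name matches the metadata listing shown later; the data type is inferred by the designer):

Derived Column Name:  Derived Column 1
Derived Column:       <add as new column>
Expression:           quantity * UnitPrice
Data Type:            currency [DT_CY]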


The next processing step is carried out by the Conditional Split transformation. Adding it is done exactly as in the previous case: double-click the Conditional Split Data Flow Transformation in the Toolbox. Now drag the green line from the Derived Column to the Conditional Split. Take a look at the metadata coming from the Derived Column by right-clicking the "Path" line, choosing Edit, and reviewing the metadata as before. You will see that an additional column has been added, Derived Column 1, with the data type currency. An abbreviated listing is shown here.

 

Name               Data Type    Source Component
orderID            DT_I4        Data Reader Source
quantity           DT_I2        Data Reader Source
UnitPrice          DT_NUMERIC   Data Reader Source
Derived Column 1   DT_CY        Derived Column

Conditional Split Data Flow Transformation

This next figure shows how this component is configured. The split condition is set up so that when the derived column value in a row is greater than or equal to 120, that row goes to the Case1 output, and the rows that do not satisfy the condition go to the default output. You can, of course, set up multiple splits. The default output is given the name 'bad_data'.
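As a sketch, the condition entered for the Case1 output in the Conditional Split Transformation Editor would be an SSIS expression along these lines (the column name follows the metadata listing above):

Output      Condition
Case1       [Derived Column 1] >= 120
bad_data    (default output, no condition)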


The "Case1" data flow is diverted to a Recordset Destination, which you can add from the Toolbox. This is an in-memory ADO recordset, and all it needs is a reference to an Object variable which you can set up in Visual Studio. Of course, you need to establish the flow by extending the green "Path" line from the Conditional Split to the Recordset Destination, as shown in the implementation. As in the previous case, a Data Viewer has been added to the "Path" from the Conditional Split transformation to the Aggregate transformation.
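A minimal sketch of that wiring, assuming a package-scoped Object variable whose name here is purely illustrative:

Variable:               User::Case1Rows   (Data type: Object)
Recordset Destination:  VariableName property pointing at the Case1Rows variable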

Aggregate Data Transformation

The data arriving at this component consists of multiple rows with the same OrderID. These rows can be grouped together to obtain the group averages, which is the same as executing the following query on the Order Details table:

Select OrderID, Avg(quantity*UnitPrice) as Avrg
from [Order details]
where quantity*UnitPrice<120
Group by OrderID

The above query is the basis for properly setting up the Aggregate Data Transformation as shown in the next figure.
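In the Aggregate Transformation Editor this amounts, roughly, to the following column settings (the output alias Avrg simply mirrors the query above):

Input Column        Operation    Output Alias
OrderID             Group by     OrderID
Derived Column 1    Average      Avrg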


Finally, the data is diverted to another Recordset Destination, which requires setting up another Object variable in the scope of the Data Flow. Again, another Data Viewer is added to monitor the data flow at run time, as shown.

Building and Executing the Package

The design of the package is now complete. From the main menu you can build the project, and then execute the package by right-clicking the package file in the Solution Explorer and choosing Execute. The package runs, and you will notice that the first Data Viewer from the Data Reader loads 2155 rows, out of which the Conditional Split picks the 1732 rows that satisfy the split condition. The remaining 423 rows that did not satisfy the split condition are grouped into 326 distinct rows with their group averages.
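If you want to cross-check these counts against the database itself, a query along these lines should reproduce the figures reported by the data viewers (it is only a verification aid, not part of the package):

Select
  Sum(case when quantity*UnitPrice >= 120 then 1 else 0 end) as Case1Rows,
  Sum(case when quantity*UnitPrice < 120 then 1 else 0 end) as BadDataRows,
  Count(distinct case when quantity*UnitPrice < 120 then orderID end) as GroupedOrders
from [Order details]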

Summary

This article considered a package that computes group averages for those orders in the Order Details table of the Northwind database (MyNorthwind) whose line total (quantity*UnitPrice) did not reach a specified value, using some of the built-in data transformations in the Visual Studio 2005 designer.
