Your message has been sent.
This article has been saved to your account.
Go to my account
This article has been emailed to your Kindle.
Send this article
Complete the form below to send this article, Data Processing using Derived Column and Aggregate Data Transformations, to a friend (or to yourself). We will never share your details (or your friend's) with anyone. For more information, read our Privacy Policy.
SQL Server Integration Services, using its designer interface, or BIDS, has made it possible to process data using many of the data flow related items in its Toolbox without writing a line of code. This is a very attractive option for someone who is a beginner with some knowledge of databases, and for end users who may want to look up some detail buried in the data. This tutorial by Dr. Jayaram Krishnaswamy shows how the designed package extracts data from an SQL Server 2005 and processes it using several of its built-in transformations.
Details of Data Processing:
This package accomplishes the following:
- Connect to the SQL Server 2005 and access the MyNorthwind (a copy of the Northwind) database.
- A Data Reader Source will extract data using the following query:
Select orderID, quantity, UnitPrice from [Order details]
- A Derived Column data transformation will add a column to the data flow which contains a combination of data from two other columns.
- A Conditional Split transformation divides the data flow into two flows based on a certain value in the derived column. The rows that satisfies the condition are captured in a recordset destination. The rest of the flow (bad data) gets routed to the next processor, the Aggregate Transformation.
- The Aggregate transformation orders the data coming into it showing group averages of the derived column.
- This grouped data then gets into a recordset destination.
Although the final data ends up in recordset destinations, they can also be persisted to other destinations such as Data Reader, OLEDB and File system.
Implementation in the Visual Studio 2005 Designer
Before going into the details of the data processing let us take a look at the building blocks from the VS 2005 Toolbox that can be used to implement it and the interconnections between the building blocks. In addition to the data flow controls three data viewers are also inserted to stop and inspect the data before it goes any further.

Data Extraction
The data is extracted from MyNorthwind database on the SQL Server 2005. While configuring the Data Reader, you make use of a Connection Manager. The details of this Connection Manager is shown in its editor as in the next figure. The connection uses the SqlClient Data Provider. The SQL Server is a default installation with the name [.] and as it is set for SQL Server authentication username and password are required. The database connection is chosen from the drop-down as shown. The connectivity test can be performed and it now allows the data reader to extract data from the database.

The Data Reader is added to the design surface by double clicking the control in the Toolbox. The Data Reader uses the connection manager to access the SQL Server as you see in the next figure in its Connection Manager's tab.

The Data Reader's Property shows some of the other configuration details made by the other tabs in the above editor. In particular it shows the query used in extracting the data. It extracts three columns from the Order Details table in the MyNorthwind database and provides this to the next component in the data processing implementation.

Derived Column Transformation
Once the Data Reader is configured, you can double click the Derived Column Data Flow Transformation in the Toolbox to add it to the designer. Since the data from the Data Reader is used, all that is needed is to drag the green dangling line from data reader to the Derived Column component. This implies that the output from the Data Reader is used as the input to the Derived Column. Of course there is a more rigorous way of configuring this "Path" but the method indicated will suffice for now. The dangling line becomes a permanent join now and the path has important information as well as provides an interface called a 'Data Viewer" to monitor what is passing through.
Let us take a look at this "Path" between the Data reader and the Derived Column shown in the next figure. It clearly shows the several properties of the data flowing out of the DataReader source; Data type, Precision, Scale etc. By clicking on the Data Viewers list element in the left you open up its window on the right and a add a data viewer. A Data Viewer is a hold off switch for the data. In this case it is represented by a 'Grid' at run time and will show the three columns and a switch control as we will see later.

The next figure, Derived Column Transformation Editor shows how the Derived Column is configured. In the left you see the Variables [both System and User] and the columns [the three coming from Data Reader]. You need to give a name to the Derived Column, here the default is taken. The Derived Column can be a new column (as in this tutorial) or it may replace any of the three columns as shown. The Expression is where the designer needs to insert the logic. You can drag quantity from the top pane and drop it here. Then click on the Operators node in the right and bring the multiplication operator to the right followed by dragging UnitPrice from the left. This is all that is needed to set up this expression quantity*UnitPrice >=120.

The next processing is carried out by the Conditional Split Data Flow. The process of adding this is exactly as in the previous case; double click the Conditional Split Data Flow Transformation in the toolbox. Now drag the green line from the Derived Column to the Conditional split. Now take a look at the "Meta Data" coming from Derived Column by right clicking the "Path" line and picking up the Edit and reviewing Meta data as before. You will see an additional column being added, the Derived Column1 with its data type-Currency. An abbreviated listing shown here.
|
Name |
Data Type |
Source Component |
|
orderID |
DT_I4' |
Data Reader Source |
|
quantity |
DT_I2 |
Data Reader Source |
|
UnitPrice |
DT_NUMERIC |
Data Reader Source |
|
Derived Column 1 |
DT_CY' |
Derived Column |
Conditional Split Data Flow Transformation
This next figure shows how this component is configured. The condition for splitting the data flow is to set up the condition that when the value of a row in the data coming from Derived column is greater than or equal to 120 the data in that row goes to Case1 and those do not satisfy goes to 'bad-data'. You can of course set up multiple splits. The default output is given a name, 'bad_data'.

The data flow ''Case1" is diverted to a Recordset Destination which you can add from the Toolbox. This an "In-memory'' ADO recordset and all it needs is a reference to an Object variable which you can set up in Visual Studio. Of course you need to establish the flow by extending the green "Path" line from the Conditional Split to the Recordset Destination as shown in the implementation. Similar to in the previous case a data Viewer has been added to the "Path" from Conditional Split Data Transformation to the Aggregate Transformation.
Aggregate Data Transformation
The data arriving at this control consists of rows which has multiple rows with same OrderID and they can be grouped together and get an idea of what the group average which is the same as executing the following query on the Order Details table.
Select OrderID, Avg(quantity*UnitPrice) as Avrg
from [Order details]
where quantity*UnitPrice<120
Group by OrderID
The above query is the basis for properly setting up the Aggregate Data Transformation as shown in the next figure.

Finally the data is diverted to another Recordset Destination which will require setting up another Object variable in the scope of the data Flow. Again another Data Viewer is added to monitor the data flow at run time as shown.
Building and Executing the Package
The design of the package is completed. From the main menu you may build the project and execute the package by right clicking the Package file in the Solution Explorer and choose to Execute. The Package runs and you will notice that the first data viewer from the Data Reader loads up 2155 rows out of which the Conditional Split picks 1732 rows that satisfy the split condition. The 432 rows that did not satisfy the split condition are grouped into 326 distinct rows with their group averages.
Summary
This article considered a package that reveals group averages of some orders in the Order Details table in the Northwind Database (MyNorthwind) whose total (quantity*price) for any row did not amount to a specified value using some of the built in data transformations in the Visual Studio 2005 editor.
About the Author :
Jayaram Krishnaswamy
Jayaram Krishnaswamy studied at the Indian Institute of Science in Bangalore India and Madras University in India and taught at the Indian Institute of Technology in Madras. He went to Japan on a Japanese Ministry of Education Research scholarship to complete his PhD in Electrical Engineering from Nagoya University. He was a Post-Doctoral Fellow at Sydney University in Australia; a Government of India Senior Scientific Officer at the Indian Institute of Science in Bangalore and Indian Institute of Technology at Kanpur; a Visiting Scientist at the Eindhoven Institute of Technology in the Netherlands; a visiting Professor of Physics at the Federal University in Brazil; an Associate Research Scientist at a government laboratory in São Jose dos Campos in São Paulo, Brazil; a visiting scientist at the National Research Council in Ottawa, Canada before coming to USA in 1985. He has also taught and worked at the Colorado State University in Fort Collins and North Carolina State University in Raleigh, North Carolina. He worked with Northrop Grumman Corporation on a number of projects related to high energy electron accelerators and Free Electron Lasers. These projects were undertaken at the Brookhaven National Laboratory in Long Island and in the Physics Department at Princeton University. He has over 80 publications in refereed and non-refereed publications and 8 issued patents. He is fluent in Japanese and Portuguese and lives in Honolulu, Hawaii, USA.
He has been working in IT-related fields since 1997. He was once a Microsoft Certified Trainer in Networking and a Siebel Certified developer. He has worked with several IT related companies, such as the Butler International in their Siebel practice, with several IBM sub-contractors and smaller companies. Presently he is active in writing technical articles in the IT field to many online sites such as http://CodeProject.com, http://APSFree.com, http://DevShed.com, http://DevArticles.com, http://OfficeUsers.org, http://ASPAlliance.com, Egghead Café, http://SSWUG.org, Packt Article Network, http://databasedev.co.uk, http://cimaware.com, and many others. Between 2006 and 2010 he wrote more than 400 articles mostly related to database and web related technologies covering Microsoft, Oracle, Sybase, ColdFusion, Sun, and other vendor products.
He has written four books all published by Packt related to Microsoft Database and Application Development: SQL Server Integration Services Using Visual Studio 2005, Learning SQL Server Reporting Services 2008, Microsoft SQL Azure; Enterprise Application Development, and Microsoft Visual Studio Lightswitch Business Application Development. He regularly writes for his four blogs on Blogger; http://Hodentek.blogspot.com, http://HodentekHelp.blogspot.com, http://HodentekMobile.blogspot.com, and http://HodentekMSSS.blogspot.com. He received the 2011 Microsoft Community Contributor award.
Books From Packt



Post new comment