Data Processing using Derived Column and Aggregate Data Transformations

Exclusive offer: get 80% off this eBook here
Beginners Guide to SQL Server Integration Services Using Visual Studio 2005

Beginners Guide to SQL Server Integration Services Using Visual Studio 2005 — Save 80%

An ideal book and eBook for trainers who want to teach an introductory course in SQL Server Integration Services or, to those who want to study and learn SSIS in a little over two weeks.

₨739.00    ₨147.80
by Jayaram Krishnaswamy | June 2008 | .NET Microsoft

SQL Server Integration Services, using its designer interface, or BIDS, has made it possible to process data using many of the data flow related items in its Toolbox without writing a line of code. This is a very attractive option for someone who is a beginner with some knowledge of databases, and for end users who may want to look up some detail buried in the data. This tutorial by Dr. Jayaram Krishnaswamy shows how the designed package extracts data from an SQL Server 2005 and processes it using several of its built-in transformations.

Details of Data Processing:

This package accomplishes the following:

  1. Connect to the SQL Server 2005 and access the MyNorthwind (a copy of the Northwind) database.
  2. A Data Reader Source will extract data using the following query:
    Select orderID, quantity, UnitPrice from [Order details]
  3. A Derived Column data transformation will add a column to the data flow which contains a combination of data from two other columns.
  4. A Conditional Split transformation divides the data flow into two flows based on a certain value in the derived column. The rows that satisfies the condition are captured in a recordset destination. The rest of the flow (bad data) gets routed to the next processor, the Aggregate Transformation.
  5. The Aggregate transformation orders the data coming into it showing group averages of the derived column.
  6. This grouped data then gets into a recordset destination.

Although the final data ends up in recordset destinations, they can also be persisted to other destinations such as Data Reader, OLEDB and File system.

Implementation in the Visual Studio 2005 Designer

Before going into the details of the data processing let us take a look at the building blocks from the VS 2005 Toolbox that can be used to implement it and the interconnections between the building blocks. In addition to the data flow controls three data viewers are also inserted to stop and inspect the data before it goes any further.

Data Processing using Derived Column and Aggregate Data Transformations

Data Extraction

The data is extracted from MyNorthwind database on the SQL Server 2005. While configuring the Data Reader, you make use of a Connection Manager. The details of this Connection Manager is shown in its editor as in the next figure. The connection uses the SqlClient Data Provider. The SQL Server is a default installation with the name [.] and as it is set for SQL Server authentication username and password are required. The database connection is chosen from the drop-down as shown. The connectivity test can be performed and it now allows the data reader to extract data from the database.

Data Processing using Derived Column and Aggregate Data Transformations

The Data Reader is added to the design surface by double clicking the control in the Toolbox. The Data Reader uses the connection manager to access the SQL Server as you see in the next figure in its Connection Manager's tab.

Data Processing using Derived Column and Aggregate Data Transformations

The Data Reader's Property shows some of the other configuration details made by the other tabs in the above editor. In particular it shows the query used in extracting the data. It extracts three columns from the Order Details table in the MyNorthwind database and provides this to the next component in the data processing implementation.

Data Processing using Derived Column and Aggregate Data Transformations

 


 

Beginners Guide to SQL Server Integration Services Using Visual Studio 2005 An ideal book and eBook for trainers who want to teach an introductory course in SQL Server Integration Services or, to those who want to study and learn SSIS in a little over two weeks.
Published: December 2007
eBook Price: ₨739.00
Book Price: ₨1,232.00
See more
Select your format and quantity:

Derived Column Transformation

Once the Data Reader is configured, you can double click the Derived Column Data Flow Transformation in the Toolbox to add it to the designer. Since the data from the Data Reader is used, all that is needed is to drag the green dangling line from data reader to the Derived Column component. This implies that the output from the Data Reader is used as the input to the Derived Column. Of course there is a more rigorous way of configuring this "Path" but the method indicated will suffice for now. The dangling line becomes a permanent join now and the path has important information as well as provides an interface called a 'Data Viewer" to monitor what is passing through.

Let us take a look at this "Path" between the Data reader and the Derived Column shown in the next figure. It clearly shows the several properties of the data flowing out of the DataReader source; Data type, Precision, Scale etc. By clicking on the Data Viewers list element in the left you open up its window on the right and a add a data viewer. A Data Viewer is a hold off switch for the data. In this case it is represented by a 'Grid' at run time and will show the three columns and a switch control as we will see later.

Data Processing using Derived Column and Aggregate Data Transformations

The next figure, Derived Column Transformation Editor shows how the Derived Column is configured. In the left you see the Variables [both System and User] and the columns [the three coming from Data Reader]. You need to give a name to the Derived Column, here the default is taken. The Derived Column can be a new column (as in this tutorial) or it may replace any of the three columns as shown. The Expression is where the designer needs to insert the logic. You can drag quantity from the top pane and drop it here. Then click on the Operators node in the right and bring the multiplication operator to the right followed by dragging UnitPrice from the left. This is all that is needed to set up this expression quantity*UnitPrice >=120.

Data Processing using Derived Column and Aggregate Data Transformations

The next processing is carried out by the Conditional Split Data Flow. The process of adding this is exactly as in the previous case; double click the Conditional Split Data Flow Transformation in the toolbox. Now drag the green line from the Derived Column to the Conditional split. Now take a look at the "Meta Data" coming from Derived Column by right clicking the "Path" line and picking up the Edit and reviewing Meta data as before. You will see an additional column being added, the Derived Column1 with its data type-Currency. An abbreviated listing shown here.

 

Name

Data Type

Source Component

orderID

DT_I4'

Data Reader Source

quantity

DT_I2

Data Reader Source

UnitPrice

DT_NUMERIC

Data Reader Source

Derived Column 1

DT_CY'

Derived Column

Beginners Guide to SQL Server Integration Services Using Visual Studio 2005 An ideal book and eBook for trainers who want to teach an introductory course in SQL Server Integration Services or, to those who want to study and learn SSIS in a little over two weeks.
Published: December 2007
eBook Price: ₨739.00
Book Price: ₨1,232.00
See more
Select your format and quantity:

Conditional Split Data Flow Transformation

This next figure shows how this component is configured. The condition for splitting the data flow is to set up the condition that when the value of a row in the data coming from Derived column is greater than or equal to 120 the data in that row goes to Case1 and those do not satisfy goes to 'bad-data'. You can of course set up multiple splits. The default output is given a name, 'bad_data'.

Data Processing using Derived Column and Aggregate Data Transformations

The data flow ''Case1" is diverted to a Recordset Destination which you can add from the Toolbox. This an "In-memory'' ADO recordset and all it needs is a reference to an Object variable which you can set up in Visual Studio. Of course you need to establish the flow by extending the green "Path" line from the Conditional Split to the Recordset Destination as shown in the implementation. Similar to in the previous case a data Viewer has been added to the "Path" from Conditional Split Data Transformation to the Aggregate Transformation.

Aggregate Data Transformation

The data arriving at this control consists of rows which has multiple rows with same OrderID and they can be grouped together and get an idea of what the group average which is the same as executing the following query on the Order Details table.

Select OrderID, Avg(quantity*UnitPrice) as Avrg
from [Order details]
where quantity*UnitPrice<120
Group by OrderID

The above query is the basis for properly setting up the Aggregate Data Transformation as shown in the next figure.

Data Processing using Derived Column and Aggregate Data Transformations

Finally the data is diverted to another Recordset Destination which will require setting up another Object variable in the scope of the data Flow. Again another Data Viewer is added to monitor the data flow at run time as shown.

Building and Executing the Package

The design of the package is completed. From the main menu you may build the project and execute the package by right clicking the Package file in the Solution Explorer and choose to Execute. The Package runs and you will notice that the first data viewer from the Data Reader loads up 2155 rows out of which the Conditional Split picks 1732 rows that satisfy the split condition. The 432 rows that did not satisfy the split condition are grouped into 326 distinct rows with their group averages.

Summary

This article considered a package that reveals group averages of some orders in the Order Details table in the Northwind Database (MyNorthwind) whose total (quantity*price) for any row did not amount to a specified value using some of the built in data transformations in the Visual Studio 2005 editor.

About the Author :


Jayaram Krishnaswamy

Jayaram Krishnaswamy studied at the Indian Institute of Science in Bangalore India and Madras University in India and taught at the Indian Institute of Technology in Madras. He went to Japan on a Japanese Ministry of Education Research scholarship to complete his PhD in Electrical Engineering from Nagoya University. He was a Post-Doctoral Fellow at Sydney University in Australia; a Government of India Senior Scientific Officer at the Indian Institute of Science in Bangalore and Indian Institute of Technology at Kanpur; a Visiting Scientist at the Eindhoven Institute of Technology in the Netherlands; a visiting Professor of Physics at the Federal University in Brazil; an Associate Research Scientist at a government laboratory in São Jose dos Campos in São Paulo, Brazil; a visiting scientist at the National Research Council in Ottawa, Canada before coming to USA in 1985. He has also taught and worked at the Colorado State University in Fort Collins and North Carolina State University in Raleigh, North Carolina. He worked with Northrop Grumman Corporation on a number of projects related to high energy electron accelerators and Free Electron Lasers. These projects were undertaken at the Brookhaven National Laboratory in Long Island and in the Physics Department at Princeton University. He has over 80 publications in refereed and non-refereed publications and 8 issued patents. He is fluent in Japanese and Portuguese and lives in Honolulu, Hawaii, USA.

He has been working in IT-related fields since 1997. He was once a Microsoft Certified Trainer in Networking and a Siebel Certified developer. He has worked with several IT related companies, such as the Butler International in their Siebel practice, with several IBM sub-contractors and smaller companies. Presently he is active in writing technical articles in the IT field to many online sites such as http://CodeProject.com, http://APSFree.com, http://DevShed.com, http://DevArticles.com, http://OfficeUsers.org, http://ASPAlliance.com, Egghead Café, http://SSWUG.org, Packt Article Network, http://databasedev.co.uk, http://cimaware.com, and many others. Between 2006 and 2010 he wrote more than 400 articles mostly related to database and web related technologies covering Microsoft, Oracle, Sybase, ColdFusion, Sun, and other vendor products.

He has written four books all published by Packt related to Microsoft Database and Application Development: SQL Server Integration Services Using Visual Studio 2005, Learning SQL Server Reporting Services 2008, Microsoft SQL Azure; Enterprise Application Development, and Microsoft Visual Studio Lightswitch Business Application Development. He regularly writes for his four blogs on Blogger; http://Hodentek.blogspot.com, http://HodentekHelp.blogspot.com, http://HodentekMobile.blogspot.com, and http://HodentekMSSS.blogspot.com. He received the 2011 Microsoft Community Contributor award.

Contact Jayaram Krishnaswamy

Books From Packt

Beginners Guide to SQL Server Integration Services Using Visual Studio 2005
Beginners Guide to SQL Server Integration Services Using Visual Studio 2005

ODP.NET Developer’s Guide: Oracle Database 10g Development with Visual Studio 2005 and the Oracle Data Provider for .NET
ODP.NET Developer’s Guide: Oracle Database 10g Development with Visual Studio 2005 and the Oracle Data Provider for .NET

Microsoft AJAX Library Essentials: Client-side ASP.NET AJAX 1.0 Explained
Microsoft AJAX Library Essentials: Client-side ASP.NET AJAX 1.0 Explained

ASP.NET Data Presentation Controls Essentials
ASP.NET Data Presentation Controls Essentials

Programming Windows Workflow Foundation: Practical WF Techniques and Examples using XAML and C#
Programming Windows Workflow Foundation: Practical WF Techniques and Examples using XAML and C#

Visual SourceSafe 2005 Software Configuration Management in Practice
Visual SourceSafe 2005 Software Configuration Management in Practice

LINQ Quickly
LINQ Quickly

BlackBerry Enterprise Server for Microsoft® Exchange
BlackBerry Enterprise Server for Microsoft® Exchange

 


Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software