Chapter 12. Advanced Data Transformation

In this chapter, we will dive into the advanced transformation functions and techniques available through QlikView's extraction engine. These will allow you, as a developer, to process the source data at a fine-grained level and shape it into a clean data model, while at the same time keeping the script efficient.

The goals of this chapter are:

  • To provide an overview of the most commonly used data architectures that can ease QlikView's development and administration

  • To describe the available functions for data aggregation

  • To learn how to take advantage of some of QlikView's most powerful data transformation functions

Data architecture


Now that we have a decent amount of QlikView development experience under our belt, we will introduce the concept of data architecture. This refers to the process of structuring the different layers of data processing that exist between the source tables and the final document(s). A well-designed data architecture greatly simplifies the administration of a QlikView deployment. It also keeps the QlikView solution scalable as new applications are developed and the QlikView environment grows. Many different data architectures are possible, but in this section we will discuss two of the most commonly used in QlikView enterprise deployments.

Two-stage architecture

The following diagram depicts the two-stage architecture:

The two-stage architecture is composed of the following layers:

  • Source Layer: composed of the source databases and original tables.

  • Extract Layer: composed of QlikView documents, containing mainly script. These are used to pull the...
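As a rough sketch, the script of an Extract Layer document might look like the following (the connection, table, and file names are hypothetical):

// Hypothetical extract script: pull one source table and store it as a QVD
ODBC CONNECT TO SourceDB;

Sales:
SQL SELECT * FROM dbo.Sales;

STORE Sales INTO Sales.qvd (qvd);
DROP TABLE Sales;

An Extract Layer document typically does little more than this: connect, load, store, and drop, leaving the transformation work to the documents that consume the QVDs.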

Loading data already stored in QlikView


The first lesson in advanced data transformation will be about optimizing loads when processing data. As you may remember from Chapter 3, Data Sources, we discussed the various ways in which we can pull data from different sources into QlikView. We also described how we can take advantage of the QVD file format to store and read data in super-fast mode. Now, we will describe yet another way of reading source tables, but this time the "source" will be QlikView itself. This approach proves useful in several cases, and we will describe two scenarios for performing it:

  • Accessing data already stored in a QlikView data model (QVW file) from a separate QlikView document. We will call this approach Cloning a QlikView data model.

  • Accessing data from the same QlikView document in which the data model resides. We will call this approach Loading from RAM.

Cloning a QlikView data model

This concept refers to the ability of replicating the data model of...
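In QlikView, this replication is achieved with the Binary statement, which loads the entire data model of another QVW file into the current document. As a minimal sketch (the file path and name are hypothetical):

Binary [..\DataModels\SalesModel.qvw];  // must be the very first statement in the script

Once this statement has executed, the rest of the script can extend or transform the cloned data model as needed.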

Aggregating data


While QlikView shines at dealing with massive data volumes, sometimes we simply do not need to load everything at an atomic level. Data aggregation can, for example, be used in deployments where documents are segmented by level of detail: two documents are created to serve different user groups and analysis needs, one holding all data at the highest level of detail and the other having a similar data model but with aggregated (reduced) tables. This way, users are better served by keeping a balance between performance and analysis needs.

In this section, we will implement a document segmentation scenario by aggregating the Flight Data table to create a second document intended for executive users, who only require summary data.

Aggregating the Flight Data table

When aggregating data, the first step is always to define which dimension fields will be left out and which ones will be kept in the summarized table. We should analyze this question by looking...
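Once the fields to keep have been chosen, the aggregation itself is a Load with a Group By clause. The following is a hedged sketch, with illustrative field names rather than the actual Flight Data fields:

// Hypothetical aggregated load: keep Year, Month, and Airline,
// and sum the metrics over those dimensions
[Aggregated Flight Data]:
Load
    Year,
    Month,
    Airline,
    Sum(Passengers) as Passengers,
    Sum([Departures Performed]) as [Departures Performed]
Resident [Flight Data]
Group By Year, Month, Airline;

Drop Table [Flight Data];

Note that every non-aggregated field in the Load must also appear in the Group By clause; otherwise, script execution will fail with an error.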

Sorting tables


We will now introduce the Order By statement, which is added to a Load statement and is used to sort an input table based on certain fields. There is one major condition for the Order By statement to work: it must be applied to a Load statement getting data from a Resident table, not from a table file or any other source.

Some databases can receive Order By instructions in the Select query, but in this section we will only deal with Order By statements on the QlikView side.

The Order By statement must receive at least one field name over which the ordering will be performed and, optionally, the sort order (either ascending or descending). If the sort order is not specified along with the field name, the default, ascending, is applied.

An example script of an Order By statement at play is:

SortedSales:
NoConcatenate  // prevent the sorted copy from auto-concatenating back onto SalesTable
Load
    Region,
    Date,
    Amount
Resident SalesTable
Order By Date asc;

Drop Table SalesTable;  // keep only the sorted copy

In this script, we are loading three fields (Region, Date, and Amount) from a previously loaded table...

The Peek function


Another tool we'll add to our collection of data transformation techniques is the Peek function. The Peek function is an inter-record function that allows us to literally peek into previously read records of a table and use their values to evaluate a condition or to affect the active record (the one being read).

The function takes one mandatory parameter, the name of the field into which we will "peek", and two optional parameters: a row reference and the name of the table in which the field is located.

For example, consider the following expression:

Peek('Date', -2)

This expression goes back two records in the table currently being read, takes the value of the Date field from that record, and returns it as the result of the expression.

Or take this other expression:

Peek('Date', 2)

Here, instead of "going back" two records, we take the value of the Date field from the third record from the beginning of the current table (row counting starts at zero).

We can also add a table name as the third parameter, as in...
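For instance, with a hypothetical table name:

Peek('Date', 0, 'Employment')

This returns the first value of the Date field loaded into the Employment table (remember that row counting starts at zero when positive numbers are used). The three-parameter form is also the only way to use Peek outside of the load of the table itself, for example in a Let statement after the load has finished.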

Merging forces


On their own, the Order By statement and the Peek function are already powerful. Now, imagine what happens when we combine both of these tools to enhance our input data. In this section, we will use them together to add a new calculated field to our Employment table (the one we integrated into our data model in Chapter 8, Data Modeling Best Practices).

A refresher

The Employment table provides information about the monthly number of employees per airline. The total is split between part-time and full-time employees, and the table also shows the total FTEs (Full-Time Equivalents).

The objective

The executives of HighCloud Airlines have asked the QlikView team to create a report showing the monthly change in the number of employees in a line chart, in order to discover and analyze peaks in the employment behavior of each airline.

Getting it done

First, how do we find the total change in number of employees for this month compared to the last? Well, we take the number of employees in the...
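Putting the two tools together, the script might look like the following hedged sketch (the field names are illustrative):

// Hypothetical sketch: sort the Employment table so that each airline's
// months are contiguous, then compare each record with the previous one
[Employment Tmp]:
NoConcatenate
Load
    *,
    If(Peek('Airline Code') = [Airline Code],
       [Total Employees] - Peek('Total Employees')) as [Employee Change]
Resident Employment
Order By [Airline Code], [Year Month] asc;

Drop Table Employment;
Rename Table [Employment Tmp] to Employment;

The If condition ensures that the first month of each airline does not subtract the last month of the previous airline; for that record, Peek returns a value belonging to a different airline and the expression evaluates to null.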

Dealing with slowly changing dimensions


A slowly changing dimension is one whose values vary across undefined time periods; that is, a value can have a different meaning depending on the time period in context.

To illustrate the concept, consider the evolution of Joey, a support technician at a given company, over a certain period of time. When Joey joined the company, he held the Junior Support Technician position. Then, after one year, he was promoted to Senior Technician. And now, one year later, he has become the Support Manager.

Now, imagine you want to visualize the number of cases resolved by the entire support team over a three-year period and find out how many of those cases were resolved by junior technicians, how many were resolved by senior technicians, and how many were resolved by the support manager. If, for reporting purposes, we take Joey's current status in the company, all cases he has resolved in the last three years will be logged as if they were resolved by the Support Manager...
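In table form, Joey's history as a slowly changing dimension might look like this (the dates are illustrative):

Employee   Position                     Start Date    End Date
Joey       Junior Support Technician    1-Jan-2011    31-Dec-2011
Joey       Senior Technician            1-Jan-2012    31-Dec-2012
Joey       Support Manager              1-Jan-2013

With intervals like these in place, each resolved case can be matched, via its resolution date, to the position Joey actually held at the time.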

Ordering, peeking, and matching all at once


In the earlier sections, we have discussed three different functions commonly used in data transformation. We will now present a use case in which all three functions will complement each other to achieve a specific task.

The use case

We know that the IntervalMatch function makes use of closed intervals already defined in a table. What happens if all we have is a start date? To illustrate this scenario, look at the following screenshot:

As you can see, the End Date field has disappeared. However, there is a way for us to infer it and assign the corresponding value, based on the start date of the immediately following record. That is, if one record starts on 1-Feb-1998 and the immediately following one starts on 1-Jan-2000, it means that the first interval ended on 31-Dec-1999, right?

In order for us to calculate the end date, we need to first sort the table values so that all corresponding records are contiguous, then "peek" at the start value from the next...
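A hedged sketch of the process, using hypothetical table and field names:

// Step 1: sort so that each employee's records are contiguous and in
// descending date order; the "next" interval then becomes the previously
// read record
[Positions Sorted]:
NoConcatenate
Load
    Employee,
    Position,
    [Start Date]
Resident Positions
Order By Employee, [Start Date] desc;

// Step 2: peek at the previously read record to close each interval
[Positions Final]:
NoConcatenate
Load
    Employee,
    Position,
    [Start Date],
    If(Peek('Employee') = Employee,
       Date(Peek('Start Date') - 1),  // the day before the next interval starts
       Today()) as [End Date]         // the latest interval is still open
Resident [Positions Sorted];

Drop Tables Positions, [Positions Sorted];

The closed intervals in the resulting table can then be fed to the IntervalMatch function as usual.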

Incremental loads


Another important advantage of designing an appropriate data architecture is that it eases the construction and maintenance of incremental load scenarios, which are often required when dealing with large data volumes.

An incremental load is used to transfer data from one database to another efficiently and avoid the unnecessary use of resources. For instance, suppose we update our Base QVD Layer on a Monday morning, pulling all transactions from the source system and storing the table into a QVD file. The next morning, we need to update our Base QVD layer so that the final QlikView document contains the most recent data, including transactions generated in the source system during the previous day (after our last reload). In that case, we have two options:

  1. Extract the source table in its entirety.

  2. Extract only the new and/or modified transactions from the source table and append those records to the ones we previously saved in our Base QVDs.

The second option is what...
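As a hedged sketch of the simplest variant, an insert-only incremental load might look like this (the field, variable, and file names are hypothetical):

// vLastExecTime is assumed to hold the timestamp of the previous
// successful reload
Transactions:
SQL SELECT *
FROM Transactions
WHERE TransactionDate >= '$(vLastExecTime)';  // only records created since then

// Append the history already stored in the Base QVD, skipping any
// record that was re-extracted above
Concatenate (Transactions)
Load *
From Transactions.qvd (qvd)
Where Not Exists(TransactionID);

Store Transactions Into Transactions.qvd (qvd);

Variations of this pattern also handle updated and deleted records, but the core idea is always the same: extract only what changed, then merge it with what was previously stored in the QVD.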

Summary


We've come to the end of an intense chapter. I hope you have followed all the topics and, if not, I highly recommend going back and rereading the sections you found most difficult, so that you grasp the concepts in full.

In this chapter, we have learned the importance of having a well-designed data architecture, how to load data from another QlikView document or from a previously loaded table in RAM, and the available data aggregation functions and their uses.

We then learned how to order tables during load, how to calculate fields based on previously read records, how to deal with slowly changing dimensions to incorporate those tables into the associative data model, and finally the general process to perform an incremental load.

In the following chapter, we will continue exploring some front-end functionalities that can help us improve the user experience for our apps.
