Chapter 12. Advanced Data Transformation

In this chapter, we will dive into the advanced transformation functions and techniques available through QlikView's extraction engine. These will allow you, as a developer, to process the source data at a fine-grained level and shape it into a clean data model, while at the same time keeping the script efficient.

The goals of this chapter are:

  • To provide an overview of the most commonly used data architectures that can ease QlikView's development and administration

  • To describe the available functions for data aggregation

  • To learn how to take advantage of some of QlikView's most powerful data transformation functions

Data architecture


Now that we have a decent amount of QlikView development experience under our belt, we will introduce the concept of data architecture. This refers to the process of structuring the different layers of data processing that exist between the source tables and the final document(s). A well-designed data architecture greatly simplifies the administration of a QlikView deployment. It also keeps the QlikView solution scalable as new applications are developed and the QlikView environment grows. Many different data architectures are possible, but in this section we will discuss two of the most commonly used in QlikView enterprise deployments.

Two-stage architecture

The following diagram depicts the two-stage architecture:

The two-stage architecture is composed of the following layers:

  • Source Layer: composed of the source databases and original tables.

  • Extract Layer: composed of QlikView documents, containing mainly script. These are used to pull the...
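As a rough sketch, the script of an Extract Layer document might look like the following (the connection, table, and file names are hypothetical):

// Hypothetical extract script: pull one source table and store it as a QVD
ODBC CONNECT TO SourceDB;

Sales:
SQL SELECT * FROM dbo.Sales;

STORE Sales INTO Sales.qvd (qvd);
DROP TABLE Sales;

An Extract Layer document typically does little more than this: connect, load, store, and drop, leaving the transformation work to the documents that consume the QVDs.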

Loading data already stored in QlikView


The first lesson in advanced data transformation will be about optimizing loads when processing data. As you may remember from Chapter 3, Data Sources, we discussed the various ways in which we can pull data from different sources into QlikView. We also described how we can take advantage of the QVD file format to store and read data in super-fast mode. Now, we will describe yet another way of reading source tables, but this time the "source" will be QlikView itself. This approach proves useful in several cases, and we will describe two scenarios for performing it:

  • Accessing data already stored in a QlikView data model (QVW file) from a separate QlikView document. We will call this approach Cloning a QlikView data model.

  • Accessing data from the same QlikView document in which the data model resides. We will call this approach Loading from RAM.

Cloning a QlikView data model

This concept refers to the ability of replicating the data model of...
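In QlikView, this replication is achieved with the Binary statement, which loads the entire data model of another QVW file into the current document. As a minimal sketch (the file path and name are hypothetical):

Binary [..\DataModels\SalesModel.qvw];  // must be the very first statement in the script

Once this statement has executed, the rest of the script can extend or transform the cloned data model as needed.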

Aggregating data


While QlikView shines at dealing with massive data volumes, sometimes we simply do not need to load everything at an atomic level. Data aggregation can, for example, be used in deployments where documents are segmented by level of detail: two documents are created to serve different user groups and analysis needs, one holding all data at the highest level of detail and the other having a similar data model but with aggregated (reduced) tables. This way, users are better served by keeping a balance between performance and analysis needs.

In this section, we will implement a document segmentation scenario by aggregating the Flight Data table to create a second document intended for executive users, who only require summary data.

Aggregating the Flight Data table

When aggregating data, the first step is always to define which dimension fields will be left out and which ones will be kept in the summarized table. We should analyze this question by looking...
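Once the fields to keep have been chosen, the aggregation itself is a Load with a Group By clause. The following is a hedged sketch, with illustrative field names rather than the actual Flight Data fields:

// Hypothetical aggregated load: keep Year, Month, and Airline,
// and sum the metrics over those dimensions
[Aggregated Flight Data]:
Load
    Year,
    Month,
    Airline,
    Sum(Passengers) as Passengers,
    Sum([Departures Performed]) as [Departures Performed]
Resident [Flight Data]
Group By Year, Month, Airline;

Drop Table [Flight Data];

Note that every non-aggregated field in the Load must also appear in the Group By clause; otherwise, script execution will fail with an error.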

Sorting tables


We will now introduce the Order By statement, which is added to a Load statement and is used to sort an input table based on certain fields. There is one major condition for the Order By statement to work: it must be applied to a Load statement getting data from a Resident table, not from a table file or any other source.

Some databases can receive Order By instructions in the Select query, but in this section we will only deal with Order By statements on the QlikView side.

The Order By statement must receive at least one field name over which the ordering will be performed and, optionally, the sort order (either ascending or descending). If the sort order is not specified along with the field name, the default, ascending, is applied.

An example script of an Order By statement at play is:

SortedSales:
NoConcatenate  // prevent the sorted copy from auto-concatenating back onto SalesTable
Load
    Region,
    Date,
    Amount
Resident SalesTable
Order By Date asc;

Drop Table SalesTable;  // keep only the sorted copy

In this script, we are loading three fields (Region, Date, and Amount) from a previously loaded table...

The Peek function


Another tool we'll add to our collection of data transformation techniques is the Peek function. The Peek function is an inter-record function that allows us to literally peek into previously read records of a table and use their values to evaluate a condition or to affect the active record (the one being read).

The function takes one mandatory parameter, the name of the field into which we will "peek", and two optional parameters: a row reference and the name of the table in which the field is located.

For example, consider the following expression:

Peek('Date', -2)

This expression goes back two records in the table currently being read, takes the value of the Date field from that record, and returns it as the result of the expression.

Or take this other expression:

Peek('Date', 2)

Here, instead of "going back" two records, we take the value of the Date field from the third record from the beginning of the current table (row counting starts at zero).

We can also add a table name as the third parameter, as in...
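For instance, with a hypothetical table name:

Peek('Date', 0, 'Employment')

This returns the first value of the Date field loaded into the Employment table (remember that row counting starts at zero when positive numbers are used). The three-parameter form is also the only way to use Peek outside of the load of the table itself, for example in a Let statement after the load has finished.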

Merging forces


On their own, the Order By statement and the Peek function are already powerful. Now, imagine what happens when we combine both of these tools to enhance our input data. In this section, we will use them together to add a new calculated field to our Employment table (the one we integrated into our data model in Chapter 8, Data Modeling Best Practices).

A refresher

The Employment table provides information about the monthly number of employees per airline. The total is split between part-time and full-time employees, and the table also shows the total FTEs (Full-Time Equivalents).

The objective

The executives of HighCloud Airlines have asked the QlikView team to create a report showing the monthly change in the number of employees in a line chart, in order to discover and analyze peaks in the employment behavior of each airline.

Getting it done

First, how do we find the total change in number of employees for this month compared to the last? Well, we take the number of employees in the...
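Putting the two tools together, the script might look like the following hedged sketch (the field names are illustrative):

// Hypothetical sketch: sort the Employment table so that each airline's
// months are contiguous, then compare each record with the previous one
[Employment Tmp]:
NoConcatenate
Load
    *,
    If(Peek('Airline Code') = [Airline Code],
       [Total Employees] - Peek('Total Employees')) as [Employee Change]
Resident Employment
Order By [Airline Code], [Year Month] asc;

Drop Table Employment;
Rename Table [Employment Tmp] to Employment;

The If condition ensures that the first month of each airline does not subtract the last month of the previous airline; for that record, Peek returns a value belonging to a different airline and the expression evaluates to null.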

Dealing with slowly changing dimensions


A slowly changing dimension is one whose values vary across undefined time periods; that is, a value can have a different meaning depending on the time period in context.

To illustrate the concept, consider the evolution of Joey, a support technician at a given company, over a certain period of time. When Joey joined the company, he held the Junior Support Technician position. Then, after one year, he was promoted to Senior Technician. And now, one year later, he has become the Support Manager.

Now, imagine you want to visualize the number of cases resolved by the entire support team over a three-year period and find out how many of those cases were resolved by junior technicians, how many were resolved by senior technicians, and how many were resolved by the support manager. If, for reporting purposes, we take Joey's current status in the company, all cases he has resolved in the last three years will be logged as if they were resolved by the Support Manager...
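In table form, Joey's history as a slowly changing dimension might look like this (the dates are illustrative):

Employee   Position                     Start Date    End Date
Joey       Junior Support Technician    1-Jan-2011    31-Dec-2011
Joey       Senior Technician            1-Jan-2012    31-Dec-2012
Joey       Support Manager              1-Jan-2013

With intervals like these in place, each resolved case can be matched, via its resolution date, to the position Joey actually held at the time.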

Ordering, peeking, and matching all at once


In the earlier sections, we have discussed three different functions commonly used in data transformation. We will now present a use case in which all three functions will complement each other to achieve a specific task.

The use case

We know that the IntervalMatch function makes use of closed intervals already defined in a table. What happens if all we have is a start date? To illustrate this scenario, look at the following screenshot:

As you can see, the End Date field has disappeared. However, there is a way for us to infer it and assign the corresponding value, based on the start date of the immediately following record. That is, if one record starts on 1-Feb-1998 and the immediately following one starts on 1-Jan-2000, it means that the first interval ended on 31-Dec-1999, right?

In order for us to calculate the end date, we need to first sort the table values so that all corresponding records are contiguous, then "peek" at the start value from the next...
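A hedged sketch of the process, using hypothetical table and field names:

// Step 1: sort so that each employee's records are contiguous and in
// descending date order; the "next" interval then becomes the previously
// read record
[Positions Sorted]:
NoConcatenate
Load
    Employee,
    Position,
    [Start Date]
Resident Positions
Order By Employee, [Start Date] desc;

// Step 2: peek at the previously read record to close each interval
[Positions Final]:
NoConcatenate
Load
    Employee,
    Position,
    [Start Date],
    If(Peek('Employee') = Employee,
       Date(Peek('Start Date') - 1),  // the day before the next interval starts
       Today()) as [End Date]         // the latest interval is still open
Resident [Positions Sorted];

Drop Tables Positions, [Positions Sorted];

The closed intervals in the resulting table can then be fed to the IntervalMatch function as usual.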

Incremental loads


Another important advantage of designing an appropriate data architecture is that it eases the construction and maintenance of incremental load scenarios, which are often required when dealing with large data volumes.

An incremental load is used to transfer data from one database to another efficiently and avoid the unnecessary use of resources. For instance, suppose we update our Base QVD Layer on a Monday morning, pulling all transactions from the source system and storing the table into a QVD file. The next morning, we need to update our Base QVD layer so that the final QlikView document contains the most recent data, including transactions generated in the source system during the previous day (after our last reload). In that case, we have two options:

  1. Extract the source table in its entirety.

  2. Extract only the new and/or modified transactions from the source table and append those records to the ones we previously saved in our Base QVDs.

The second option is what...
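As a hedged sketch of the simplest variant, an insert-only incremental load might look like this (the field, variable, and file names are hypothetical):

// vLastExecTime is assumed to hold the timestamp of the previous
// successful reload
Transactions:
SQL SELECT *
FROM Transactions
WHERE TransactionDate >= '$(vLastExecTime)';  // only records created since then

// Append the history already stored in the Base QVD, skipping any
// record that was re-extracted above
Concatenate (Transactions)
Load *
From Transactions.qvd (qvd)
Where Not Exists(TransactionID);

Store Transactions Into Transactions.qvd (qvd);

Variations of this pattern also handle updated and deleted records, but the core idea is always the same: extract only what changed, then merge it with what was previously stored in the QVD.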

Summary


We've come to the end of an intense chapter. I hope you have followed all the topics and, if not, I highly recommend going back and rereading the sections you found most difficult, so that you grasp the concepts in full.

In this chapter, we have learned the importance of having a well-designed data architecture, how to load data from another QlikView document or from a previously loaded table in RAM, and the available data aggregation functions and their uses.

We then learned how to order tables during load, how to calculate fields based on previously read records, how to deal with slowly changing dimensions to incorporate those tables into the associative data model, and finally the general process to perform an incremental load.

In the following chapter, we will continue exploring some front-end functionalities that can help us improve the user experience for our apps.
