We are anxious to start our data visualization project, but if we fail to understand the context within which we are working, we are more prone to make trivial, gaudy graphs. We want to craft great data visualization, and to do this, we first analyze the most important elements of its foundation: people, data, and tools.
We are data discovery experts who work for QDataViz, Inc. Our fictitious company is extremely successful at helping our customers get the most out of their data using our best practices in data visualization.
However, all is not well for QDataViz, Inc., and our CEO, Charles W. Smith, Jr. has invited us to a meeting to discuss a plan to help the company turn its losses into a profit. In the meeting, Charles remarks that while we are excellent advisors to our customers, we have failed to use our best practices internally to support the decisions of our colleagues.
He entrusts us data discovery experts to review our company's data, and more importantly, to empower our colleagues to perform the analysis necessary to improve the company's situation.
People are the only active element of data visualization, and as such, they are the most important. We briefly describe the roles of several people that participate in our project, but we mainly focus on the person who is going to analyze and visualize the data.
After the meeting, we get together with our colleague, Samantha, who is the analyst that supports the sales and executive teams. She currently manages a series of highly personalized Excels that she creates from standard reports generated within the customer invoice and project management system. Her audience ranges from the CEO down to sales managers. She is not a pushover, but she is open to try new techniques, especially given that the sponsor of this project is the CEO of QDataViz, Inc.
As a data discovery user, Samantha possesses the following traits:
She does not endanger her job and company results following every technological fad or whimsical idea. However, she realizes that technology does change and that a new approach can foment breakthroughs.
She loves finding anomalies in the data and being the reason that action is taken to improve QDataViz, Inc. As a means to achieve what she loves, she understands how to apply functions and methods to manipulate data.
Our next meeting involves Samantha and Ivan, our Information Technology (IT) Director. While Ivan explains the data available in the customer invoice and project management system's well-defined databases, Samantha adds that she has vital data in Microsoft Excel that is missing from those databases. One Excel file contains the sales budget and another contains an additional customer grouping; both files are necessary to present information to the CEO.
We take advantage of this discussion to highlight the following characteristics that make data easy to analyze.
Ivan is going to document the origin of the tables and fields, which increases Samantha's confidence in the data. He is also going to perform a basic data cleansing and eliminate duplicate records whose only difference is a period, two transposed letters, or an abbreviation.
Once the system is operational, Ivan will consider the impact any change in the customer invoice and project management system may have on the data. He will also verify that the data is continually updated while Samantha helps confirm the data's validity.
Ivan will preserve as much detail as possible. If he is unable to handle large volumes of data as a whole, he will segment the detailed data by month and reduce the detail of a year's data in a consistent fashion. Conversely, he is will consider adding detail by prorating payments between the products of paid invoices in order to maintain a consistent level of detail between invoices and payments.
An Excel file as a data source is a short-term solution. While Ivan respects its temporary use to allow for a quick, first release of the data visualization project, he takes responsibility to find a more stable medium to long-term solution. In the span of a few months, he will consider modifying the invoice system, investing in additional software, or creating a simple portal to upload Excel files to a database.
Ivan will not prevent progress solely for bureaucratic reasons. Samantha respects that Ivan's goal is to make data more standardized, secure, and recoverable. However, Ivan knows that if he does not move as quickly as business does, he will become irrelevant as Samantha and others create their own black market of company data.
Ivan is going to make available manifold perspectives of QDataViz, Inc. He will maintain history, budgets, and forecasts by customers, salespersons, divisions, states, and projects. Additionally, he will support segmenting these dimensions into multiple groups, subgroups, classes, and types.
We continue our meeting with Ivan and Samantha, but we now change our focus to what tool we will use to foster great data visualization and analysis. We create the following list of basic features we hope from this tool:
We should be able to learn the tool quickly and be able to deliver a first version of our data visualization project within a matter of weeks. In this fashion, we start receiving a return on our investment within a short period of time.
Ivan should be able to maintain hundreds or thousands of users and data volumes that exceed 100 million rows. He should also be able to restrict access to certain data to certain users. Finally, he needs to have the confidence that the tools will remain available even if a server fails.
Based on these expectations, we talk about data discovery tools, which are increasingly becoming part of the architecture of many organizations. Samantha can use these tools for self-service data analysis. In other words, she can create her own data visualizations without having to depend on pre-built graphs or reports. At the same time, Ivan can be reassured that the tool does not interfere with his goal of providing an enterprise solution that offers scalability, security, and high availability.
There are several data discovery tools on the market and each has its strengths and weaknesses; however, QlikView is arguably the most well-rounded data discovery tool. QlikView provides an all-in-one tool with the ability to extract and transform raw data, construct a data model, and create dynamic data visualizations. It deploys as a departmental or enterprise solution within small, medium, or large organizations.
QlikView's greatest product differentiation is what it calls its associative data model. QlikView's associative data model is a type of pervasive filtering. When we filter the values in any field, all the values in every other field, independent of the table it belongs to, will automatically be filtered according to their direct or indirect relationship with the values we filtered. Since this is an intensive process, all the data is compressed and stored in a computer's RAM memory.
Along with facilitating development and accelerating deployment, this feature aids our own analytical process. We can only think about one topic in detail for so long before asking questions about a related topic. In our QlikView application, Samantha will easily dig deep into our sales data and analyze it by salesperson and customer, and then, upon finding something interesting, change the perspective to look at sales and costs by project. Finally, she will end up comparing the consultants' real costs against estimated costs, first in particular projects, and then in all projects. Astoundingly, all this will happen with ease in the same data model within one QlikView application.
Once Samantha has concluded her robust analysis, she will use QlikView to present her diagnosis of QDataViz, Inc.'s problems and allow users to interact with her supporting analysis. For this reason, we consider QlikView our tool to develop ideas, test theories, and communicate our conclusions to a critical, participating audience.
Samantha is new to QlikView so we are going to show her how to install the software in her computer. First, we go to http://download.qlikview.com, register a user, and then download QlikView Desktop. In this project, we will be using version 11. We install QlikView following a very simple installation wizard, and request a named license from the IT department for this project. (If you don't have a named license, please follow the instructions included in the book's exercise files.)
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Now we are ready to open our first QlikView application.
We are going to start our business discovery process by first opening a QlikView application and then taking a look at the data model and its metadata. Finally, we will preview the data directly in the data model before creating our first QlikView objects.
Let's open the QlikView application.
- Click on Open in the standard toolbar.
Browse for the
Exercises\Original\folder and open
Once the QlikView application opens, we notice an empty sheet where Samantha is going to play with the data, creating and destroying objects as if it were an analyst's sandbox.
A QlikView application contains a snapshot of past data. Anytime we want to update the data, we click on Reload in the standard toolbar.
Ivan has created the invoices and project management data model based on the star schema. We prefer to use the star schema because it facilitates analysis. The numeric values of an event are stored in one central fact table while related descriptive data is grouped into surrounding dimension tables.
Metadata is data that describes data. Among other things, metadata includes descriptions of data volume, age, source, and usage. In our QlikView application, we are interested in information that explains what kind of data we can expect to find in our data model.
Metadata is not a necessary component of QlikView. However, since our goal is to empower business users, Ivan has included table and field descriptions in our data model. Alongside user-friendly table and field names, this accelerates our data discovery experience. We can see the metadata in the same Table Viewer window by hovering over any table or field as shown in the following screenshot:
Finally, we can go further and preview the actual data in the model by right-clicking on any table and selecting Preview. QlikView displays a 1000-row preview of the data. A cell that contains a dash (-) means that the value is null, or that data does not exist for that cell.
Listboxes are the easiest and most powerful way to perform an initial discovery of our data. Listboxes are lists of all the unique values in a field. Its behavior as we filter values can help us answer many questions about our data.
For example, Samantha's first task is to validate the quality of the data. She starts by investigating how well the invoices and projects are defined by division.
We find our answer by carrying out the following steps:
Right-click on the blank sheet and click on Select Fieldsâ¦
In the Show Fields from Table drop-down box, select Divisions. In the above list of fields, select No. Division and Division and then click on Add >.
In the Show Fields from Table drop-down box, select Customers. In the above list of fields, select No. Customer and Customer and then click on Add >.
In the Show Fields from Table drop-down box, select ProjectTasks. In the above list of fields, select Project Code and Project Name and then click on Add >.
In the Show Fields from Table drop-down box, select MasterCalendar. In the above list of fields, select Year and Month and then click on Add >.
In the Show Fields from Table drop-down box, select Facts. In the above list of fields, select Net Sales and then click on Add >.
Click on OK.
Among the available values in the listbox containing QDataViz, Inc.'s divisions, we notice the value N/A, which indicates that several transactions may exist that have not been assigned a division. Let's select N/A and analyze the results that are shown in the following screenshot:
Based on the selection made in the screenshot, we discover that during the year 2011, several transactions for the customer Extensive Enterprise were not assigned to any project. The transactions occurred during multiple months, but luckily the net sales amounts appears to be small.
We are able to come to this conclusion because QlikView listboxes use the color scheme shown in the following screenshot when filtering its values:
An alternative to a series of listboxes is a table box. Table boxes show us unique rows of related data from one or more tables.
The resulting table shown in the following screenshot displays each document that is associated with the division N/A:
We observe that several documents are not assigned to a proper division. Although Samantha will work with Ivan to clean up each transaction, we conclude that the unassigned amounts are relatively small and do not prevent us from continuing our analysis of QDataViz, Inc.'s data.
We repeat the above process a hundred times over as new questions arise about our data. In this way, we take advantage of QlikView's flexibility to easily and quickly discover data.
Before moving on to rank analysis, we save our QlikView application with the following step:
- Click on Save in the standard toolbar.
People, data, and tools are an essential part of creating great data visualization and analysis. We are going to provide Samantha with the power of self-service data discovery using QlikView over our customer invoice and project management data.
We briefly covered how to open a QlikView application and review its data model. Also, we learned the color scheme QlikView uses to filter data and how to create listboxes and table boxes to perform basic data discovery and test the quality of our data.
In the following chapter, we will learn how to use rank analysis to concentrate our efforts on finding problems that have the most effect on QDataViz, Inc.'s performance.