This chapter is designed as an introduction to tabular modeling by using PowerPivot. It shows the process by which a user imports data into PowerPivot for Excel, creates relationships between the datasets, and then reports on it.
The data used in this chapter is based on the orders of the fictitious bicycle company (named Adventure Works). Our data includes six datasets and they are:
Orders list: This shows what orders have been placed by customers. The list includes an entry for each product that has been ordered. This data simulates a detailed extract from an operational source system.
Getting the data into the model
Defining the relationships among tables
Defining calculations based on business logic
This chapter examines these steps and allows the reader to become familiar with the tabular (PowerPivot) design environment.
An Excel workbook can only contain one tabular model and that one model contains tables of data (which may or may not be related). The first step to create a model is to import data into it. There are many techniques to do this—some techniques have advantages over others but for now, let's only consider the fact that we want to load data that exists in an Excel worksheet into the model.
The installation instructions for PowerPivot in Excel 2010 are covered in the Appendix, Installing PowerPivot and Sample Databases, of this book.
Open the Excel workbook named
SalesBook which is available from the Packt Publishing website to examine the worksheets within the book. Each sheet contains a dataset for
This recipe looks at importing data into the PowerPivot model through linked tables. These are very convenient to use when the data is stored in Excel. Additionally, once the data has been imported into PowerPivot, it retains a connection to the Excel table. This means that, when the data is changed in Excel, it can also be changed in the PowerPivot model.
Let's start by importing the product list. Select the Product List sheet and select cell A1.
A small window will open confirming the data range with a checkbox for table headers. Select the checkbox and press OK.
The PowerPivot window will open and the data from the Product List sheet will be imported. Note that the table appears as a tab which is similar to Excel and is called Table1. Also, note that the PowerPivot window is a separate window than the Excel workbook, so that we can return to Excel.
Repeat this process for all the remaining datasets except
When a linked table is created in PowerPivot, Excel creates a named range in the Excel workbook. This is then linked to the PowerPivot model (note that there is a small chain symbol before each of the tables). Also, note that the tables in Excel are formatted with alternate blue coloring. The named ranges can be viewed in Excel by clicking on the Name Manager button on the Formulas tab.
A table (table range) is actually an Excel feature that PowerPivot utilizes. A table can be defined in Excel, given a meaningful name and then imported into PowerPivot, so that the name of the table in PowerPivot is the same as the named range in Excel.
Ensure that the
Customers sheet is selected in Excel and also any cell in the
Customers data is selected. In the Home tab, click on the Format as Table button, and choose a table style; the style chosen in the following screenshot is a relevant one:
Note that the data is now formatted with alternating colors (based on the selected style). Return to the Name Manager window and double-click the table that relates to the
Customers worksheet. A new window will open allowing you to edit the name, replace the name
Customers, and click on OK. The
Table6 name is replaced by
Customers in the Name Manager window.
Now, create a linked table in the same manner as we did before and note that the name of the table imported into PowerPivot is
If you want to select an entire table in Excel, simply choose the table name from the Name Box drop-down list in the formula bar in the upper-left corner. This is shown in the following screenshot:
A PowerPivot workbook contains two products that allow the user to analyze data. Firstly, there is the xVelocity in-memory analytics engine (the tabular model) which is a columnar database embedded in the workbook. Secondly, there is a client tool that allows the model to be queried, it also displays the results to the user in the form of a pivot table or pivot chart. In Excel 2010, the client tool was restricted to pivot table functionality (for example, a pivot table or pivot chart). In Excel 2013, the tools set has been extended to include Power View. The important distinction here is that the client tool is used to present the model to the user. This recipe shows how to control the way the model is presented to the user.
This recipe uses the model that has already been created in the prior recipe Creating the model. If this model has not been created, follow the recipe to ensure that the model has been loaded with data.
Start from an existing model within PowerPivot.
Excel will now show PowerPivot Field List and a pivot table work area. These are identified by the arrows in the following screenshot. Note that PowerPivot Field List shows tables from the model as nodes, with the fields from the model as children.
Return to the PowerPivot application window and double-click on the
Table1tab. The name
Table1will be in a blue background and rename it to
Products. Repeat this exercise for the other tables (
Sales). The table names in your PowerPivot window will now look like the following screenshot:
Return to Excel and notice that PowerPivot Field List has detected a change in the model and prompts the user to refresh. Click on the Refresh button and note that the changes in the names of the tables are now reflected in the PowerPivot Field List panel.
Return to the PowerPivot application window and select the
Productstable. Double-click on the
product_idfield and enter
Product IDas the new name. The field will have a blue background when its name can be changed.
Return to the PowerPivot window and select the
Salestable. Right-click anywhere on the
order_number_linefield and select Hide from Client Tools in the pop-up window. Select all the fields
customer_idby clicking-and-dragging the mouse across the three fields and hide these fields too.
PowerPivot mimics Excel in the way that you can select multiple fields by dragging your mouse across several columns (with the left button continually pressed). You can also select the first column, hold the Shift key, and select the final column.
Unlike Excel, multiple columns cannot be selected by using the Ctrl key and selecting multiple fields.
Dayfield to the pivot by expanding the
Datestable and selecting the checkbox next to the
Dayfield. The column will be automatically added to the rows area of the pivot and will be displayed, as shown in the following screenshot:
Return to the PowerPivot window and select the
Daycolumn. From the format list, select More Dates Formats…, and then select the dd-MMM-yy format from the list of available formats. The value presented will show a formatted sample of the data. Choose the item that shows 14-Mar-01.
Return to Excel and refresh the pivot table. Note that the PowerPivot Field List panel may not indicate the change to the model. However, when the pivot table is refreshed, the data displays the new format.
The semantic model defines the metadata structure of the model and includes information such as table names, column names, and data presentation formats. The model designer interacts with the semantic model through its presentation layer in a real-time manner (note that the model did not have to be deployed to a server), so that the changes made in the model are immediately available to the user.
The modeling environment behaves in a What You See Is What You Get (WYSIWYG) manner which means that any changes made to the design environment are reflected in the model that is presented to the user.
There are two methods that the model designer can use to examine the structure of the model. So far, we have only examined the data view. The diagram view shows all tables and columns (including hierarchies) that are used within the model and presents them on a design surface. This is shown in the next recipe.
In addition to table names, column names, and data formats, a semantic model defines how tables within a model relate to each other. This relationship is important because it defines the output of calculations (which are defined in the model). This recipe shows how to create relationships and the effect that these relationships have on the model.
This recipe assumes that the model in the recipe Managing the appearance of tables and fields has been created.
The reader should recognize that the model is designed to show sales information by product, date, and customer. This type of modeling scenario is commonly referred to as a star schema and is shown in the following diagram. The
Sales table is referred to as a fact table (since it stores the data facts that we wish to analyze—sales amount, tax amount, and so on) and the other tables are referred to as dimension (subject) tables because they hold descriptive information.
Extending the model further, the
Products table is linked to the
Subcategory table, and the
Subcategory table is linked to the
Category table. This is shown in the following diagram and is sometimes called a snowflake schema, since the dimension tables are not directly connected to the fact table:
An important point to note, is that each dimension table has a unique identifying field, for example, a product can be uniquely identified in the
Products table through the
product_id field. This is commonly referred to as the primary key for the table.
Start with the workbook that was developed in the prior recipe.
Product Namefield onto the rows of the pivot table (under the
Row Labelscolumn) and the Sales column
total_priceonto values. Your screen should look like the following screenshot:
Return to the PowerPivot window and select the
product_idfield and then click on the Create Relationship button (in the Design tab). A new window will open asking you to define the related (lookup) table and column.
Return to the pivot table and refresh the model. The values for the Sum of total_price field have updated to reflect the total for each product.
Now, create relationships between the following tables and columns:
Click on the Diagram View button in the Home menu to switch to the diagram modeling view. Your screen will now show tables and columns (rather than data) and look like the following screenshot. Note that previously defined relationships appear as lines that connect tables.
You can also switch between the data and diagram views by toggling the two buttons at the bottom-right side of the PowerPivot application status bar.
In the pivot table, replace the
Product Namefield with the
Categoryfield from the
categorytable. The total value (
$1,662,869.42) is repeated for all categories indicating that there is no relationship between the
Productstable, select the
Product IDfield, and drag it to the
product_idfield of the
Subcateogrytable. A new relationship will be created between
Subcategory. Use this method to create a relationship between the
Productstable and the
Categorytable. Refresh the pivot table to ensure that the total value is not duplicated.
In the pivot table, drag the
Product IDfield from the
Productstable into the Values pane. The pivot table will now show the price and number of products for each category and will look like the following screenshot:
The model has been extended to show two things. Firstly, by defining relationships between tables within the model, we have defined the filtering path for the data. This path is used to restrict rows between tables that have a relationship. Secondly, by adding a calculation (Sum of total_price and Count of Product ID), we have created measures that apply an aggregation function to the model fields. These are special types of measures within PowerPivot and are referred to as implicit measures (because the model implicitly defines a calculation for the field).
Relationships define how one table relates to another. In order to define a relationship, the join must occur on a field that has unique values in one of the tables (this is commonly called a primary key). The table that has the field with unique values is commonly called the related table. This can be seen in the diagram view, as shown in the following screenshot with the direction of the arrows on the relationships. Consider the
Products table (which has a unique field
product_id) that is related to the
Sales table (through the
product_id field in that table), but only the
Products table needs to have a unique
product_id. It is also said that the
product_id field relates to many records in the
Sales table. This can be seen by the direction of the arrow between
Products, the related table has the arrow pointing towards it.
Relationships are the primary mechanisms with the model that are used to filter data and perform calculations. That is, the relationship defines how data is filtered when values are shown to the user. Although this is a new concept, the concept of relationships is important because they have important implications with the way that the model determines what data to show to the user. Consider the pivot table shown in the following screenshot—
Subcategory on rows and
Sum of total_price,
Count of Product ID, and
Count of category_id as measures:
The rows in the pivot show the subcategory which defines a filter for each row (that is a filter for each subcategory). This filter can then be applied to the
Products table, which in turn is applied to the
Sales table. It might be better to say that the rows of the
Sales table are filtered by the
Products table and then those rows are filtered by the
Subcategory table. This is why the calculations
Sum of total_price and
Count of Product ID show the correct values. The filter on rows of the
Sales table and rows of the
Products table can be applied in the direction of the arrows of the relationships.
However, this is not the case when
Subcategory is shown with data from the
Category table—a filter will only be applied in the direction that a relationship is created. This is why the calculation
Count of category_id shows the same number for each subcategory. With the subcategory on rows, a filter is created which can filter the
Products table but this filter cannot then applied in an upstream manner to the
The application of filters may seem unintuitive at first, especially with a relationship design such as the one among
Subcategory, but in reality the model should be designed so that the filters can be applied in a single direction. There is also, the question of unmatched values between fields used in the relationship and how they are treated by the model. For example, what would happen if we had a
product_id field in the
Sales table that did not have a match in the
Products table? Would this even be allowed in the model? The tabular model handles this situation very elegantly. The model allows this situation (without error), and unmatched values are assigned to a blank placeholder. For example, if there was a product in the
Sales table and no product in the
Products table, it would be shown as blank when
Subcategory is used in the pivot.
We have also indicated that the model automatically created implicit measures. The term measure is common in business intelligence tools to specify that a calculated value is returned. Often, this can be a simple calculation, for example, the count of rows or the sum of a field. The important thing to remember is that measure is a single value that is returned from the model (when the model is filtered). Usually, measures are defined by a model designer, but they need not be. This is the case with an implicit measure. An implicit measure is defined automatically, depending on the data type of the column that is being used. Numeric columns are automatically summed, whereas text columns are automatically counted.
The aggregation function of an implicit measure is initially set by the underlying data type. However, the user can change this within the pivot table by editing the measure in the pivot table. This can be done in the following manner:
Alter the pivot table so that it shows subcategory on rows and
total_profitas values. By default, the measure will show
Sum of total_profit. Right-click on the measure and select Edit Measure… from the pop-up window.
A new window will open, displaying the aggregation function with the measure. Select Average from the function list and change the measure name to
Average Profit. The Measure Settings window should look like the following screenshot:
Also, notice that a formula is used to define the measure, for
Average Profit, the formula is
=AVERAGE('Sales'[total_profit]). Click on the OK button and note that the pivot table now contains the new measure
Implicit measures that have been created in the model can be seen by exposing the measures in the Advanced tab of the PowerPivot window (the Advanced tab must be activated). This is shown in the following screenshot:
The model designer is often required to add additional fields to tables, so that the information presented to the user is better suited for decision-making purposes. This can include creating new fields that are combinations of other fields within the same table or a calculation that is dependent on data in another table. This recipe looks at the first of these options to create new fields that use other fields within the same table.
The model used in this recipe starts with the model that was created in the previous recipe Using tabular relationships to filter data.
Switch to the data view in the PowerPivot window and select the
Productstable. Select the
Colourcolumn by right-clicking on the column header and selecting Insert Column from the pop-up menu (note that the entire column must be selected). The new column is inserted to the left of the
Colourcolumn. Change the name of the CalculatedColumn1 to
Product Name WC(product name with code).
Enter the following formula into any cell of the new column.
=[Product Name]&" (" & [Product ID] & ")"
All rows of the table will be automatically populated.
Switch to the
Salestable. Double-click on the header row of the last column (the current header is Add Column) and change the name of the column to
total_profit. Enter the following formula into any cell of the Profit column with the format of the column as currency.
The designer has two built-in functions that enable the easy creation of formulas. If the formula is being typed, an intellisense window will open in the formula bar, and show a list of objects that match what is being typed. Simply navigate to the desired column (or cell in the measure grid) and start typing, then press return to use the provided intellisense option (you can use arrow keys to select a function, table and column). Alternatively, a column or table name can be included in the formula by clicking on the column or table while the formula is being typed.
This recipe introduces Data Analysis Expressions (DAX) as the language that is used in tabular modeling. From this recipe, we can see that the DAX language is very similar to an Excel calculation (there are some noticeable differences which are addressed in chapters). Also, note that in DAX, columns are referred to instead of cells. Furthermore, many Excel functions work exactly the same in DAX as they do in Excel.
In calculating the value for each row, a special filter is applied in the calculation. In these examples where the fields being used in the formula reside on a single row, the filter automatically restricts the value to that of the row. The application of filtering in this manner is commonly referred to as a row filter or a row filter context.
There may be a requirement to create fields in a table that contain data from a separate table. In Excel, this would usually be achieved with a
The sales model that has been developed in this chapter contains three tables which define
Category. When the user browses the model in a pivot table, each of these tables appear as tables in the PowerPivot Field List pane. However, in this model, the category and subcategory directly relate to the product and it is our intent to show these fields in the
This recipe assumes that the sales model created in the Adding fields to tables recipe is available and that the appropriate relationships exist among the
Start by opening the PowerPivot window and then perform the following steps:
Switch to the data view and create two new columns in the
Subcategory. In the
Categorycolumn enter the following formula:
Subcategorycolumn enter the following formula:
=LOOKUPVALUE (Subcategory[Subcategory] , Subcategory[product_id],Products[Product ID] )
Category tables in the model by right-clicking on the tables tab and selecting Hide from Client Tools from the pop-up menu. Note that the hidden tables are still visible in the data view and diagram view, although they are now more transparent.
These two formulas achieve the same result but in different ways.
related function returns the specified column, based on the relationship within the data model. This can span more than one table (for example, a related table to the
Category table could be referenced from the
Products table), however, a relationship must be defined between all the linking tables that are spanned by the formula. Furthermore, because the formula relies on these relationships (that is, those defined within the model), the formula will not result in an error since the model enforces the integrity defined by model relationships.
LOOKUPVALUE function is quite different from the
related function because it does not utilize or rely on a relationship within the model. That is,
LOOKUPVALUE would still return the same results had the relationship not be defined between the
Subcategory tables. Furthermore, the
LOOKUPVALUE function can use multiple columns as its reference (to lookup) which may be beneficial when a desired value in another table cannot be related to the source data through a single field. Note that relationships can only be defined on single columns. However, unlike the
RELATED function, the
LOOKUPVALUE function may return an error when more than one match can be found in the lookup table.
Both formulas return results by creating a row context filter for each row in the source table.
It is considered best to utilize the relationship wherever possible. Therefore, the use of the
RELATED function is preferred over the
LOOKUPVALUE function. Furthermore, the
RELATED function makes the model simpler for others to understand. However, the
LOOKUPVALUE function does have some benefits. It allows the value to be determined, based on multiple search conditions. The syntax for
LOOKUPVALUE is defined as:
LOOKUPVALUE( <result_columnName> , <search_columnName>, <search_value> [, <search_columnName>, <search_value>] …)
result_columnName column is returned from a target table where search conditions are satisfied. These conditions are defined by a
search_columnName parameter and a
search_value parameter. This means that we specify the column (in the lookup table) and the value that should be searched for—this is the field in the current table.
The sales model that has been developed in this chapter allows the user to interrogate data from the order list by products, customers, and dates. In doing so, the user can create an implicit measure so that the underlying data is aggregated according to the current filter context. The aggregation function of implicit measures is determined by the underlying data type of the column that is used for the measure. This method offers the user the ability to create and show simple calculations from the model data. However, it does not create a robust model because the calculations aren't readily selectable by the user and the calculation definition is not conformed within the model.
This recipe introduces calculations which are contained within the model and presented to the user as measures. DAX (the tabular model language introduced in Adding fields to tables recipe) is used to define measures, so that it can explicitly use these measures in the model. Furthermore, the ability to create measures through a complex DAX allows the model designer a larger degree of flexibility than is involved with implicit measures.
By default, a table in Data View will have a horizontal line that does not show any data. This is referred to as the calculation area. If this is not visible, ensure that the Calculation Area button is selected in the Home menu.
Select any cell in the
order_quantityfield. Then, from the ribbon, select the SUM function from the AutoSum drop-down. A new calculation will be created in the calculation area as:
Sum of order_quantity:=SUM([order_quantity])
In the formula bar, select the name of the measure (
Sum of order_quantity) and rename it to
Total Order Quantity. The calculation should now look like this:
Total Order Quantity:=SUM([order_quantity])
Right-click on the calculation, select Format Cells... from the pop-up menu, and specify the format as a number format—(decimal number) with zero decimal places and click on the User 1000 separator (,) checkbox.
Rename the measures to
Total Price, and
Total Profitrespectively with the same format as
Total Order Quantity. Note that the format of the number is inherited as currency.
Select the cell in the calculation area under the
unit_pricecolumn and type the formula:
Intellisence provides a list of formulas available (based on the expression that is entered) and a description of the function, as shown in the following screenshot:
Hide the following fields of the
Return to the PowerPivot Field List pane (in Excel) and refresh the model. Existing implicit calculations that were based on existing fields are removed from the model and the
Salestable now includes the additional measures which were created. These measures can now be placed in the pivot tables (value) field list.
Measures that are created in the calculation area operate in the same manner as implicit measures. That is, the aggregation function is applied to the filtered data specified by the row and filter context within the client tool.
Because the measures are explicitly defined in the model they are called explicit measures and interpreted by the client tool as measures (many client tools detect an explicit measure as a special type of field and treat it differently than a table's standard field or dimension field).
By default, the tabular model created in PowerPivot will display the model through a PowerPivot pivot table. This shows the model in its tabular form where measures and columns are shown with respect to the tables that they relate to. For example, the measures created in the Creating model calculations recipe appear under the
Sales table node. In order to compare this (tabular) view with that of a traditional OLAP client, simply do the following:
The client tool (PivotTable Field List) shows the model in a different format and represents how a client tool interpretation of the model would be shown. Here, the measures are shown in measure groups and are not included as table objects.