This recipe assumes that the model in the recipe Managing the appearance of tables and fields has been created.
The reader should recognize that the model is designed to show sales information by product, date, and customer. This type of modeling scenario is commonly referred to as a star schema and is shown in the following diagram. The Sales table is referred to as a fact table (since it stores the data facts that we wish to analyze—sales amount, tax amount, and so on) and the other tables are referred to as dimension (subject) tables because they hold descriptive information.
Extending the model further, the Products table is linked to the Subcategory table, and the Subcategory table is linked to the Category table. This is shown in the following diagram and is sometimes called a snowflake schema, since the dimension tables are not directly connected to the fact table:
An important point to note, is that each dimension table has a unique identifying field, for example, a product can be uniquely identified in the Products table through the product_id field. This is commonly referred to as the primary key for the table.
In contrast, the referring column (product_id in the Sales table) can have many occurrences of the product_id field and is commonly referred to as the foreign key.
The model has been extended to show two things. Firstly, by defining relationships between tables within the model, we have defined the filtering path for the data. This path is used to restrict rows between tables that have a relationship. Secondly, by adding a calculation (Sum of total_price and Count of Product ID), we have created measures that apply an aggregation function to the model fields. These are special types of measures within PowerPivot and are referred to as implicit measures (because the model implicitly defines a calculation for the field).
Relationships define how one table relates to another. In order to define a relationship, the join must occur on a field that has unique values in one of the tables (this is commonly called a primary key). The table that has the field with unique values is commonly called the related table. This can be seen in the diagram view, as shown in the following screenshot with the direction of the arrows on the relationships. Consider the Products table (which has a unique field product_id) that is related to the Sales table (through the product_id field in that table), but only the Products table needs to have a unique product_id. It is also said that the product_id field relates to many records in the Sales table. This can be seen by the direction of the arrow between Sales and Products, the related table has the arrow pointing towards it.
Relationships are important because they define how data is filtered and calculated when it is presented to the user.
Relationships are the primary mechanisms with the model that are used to filter data and perform calculations. That is, the relationship defines how data is filtered when values are shown to the user. Although this is a new concept, the concept of relationships is important because they have important implications with the way that the model determines what data to show to the user. Consider the pivot table shown in the following screenshot—Subcategory on rows and Sum of total_price, Count of Product ID, and Count of category_id as measures:
Now, consider the relationship defined in the model. This is summarized in the following screenshot:
The rows in the pivot show the subcategory which defines a filter for each row (that is a filter for each subcategory). This filter can then be applied to the Products table, which in turn is applied to the Sales table. It might be better to say that the rows of the Sales table are filtered by the Products table and then those rows are filtered by the Subcategory table. This is why the calculations Sum of total_price and Count of Product ID show the correct values. The filter on rows of the Sales table and rows of the Products table can be applied in the direction of the arrows of the relationships.
However, this is not the case when Subcategory is shown with data from the Category table—a filter will only be applied in the direction that a relationship is created. This is why the calculation Count of category_id shows the same number for each subcategory. With the subcategory on rows, a filter is created which can filter the Products table but this filter cannot then applied in an upstream manner to the Category table.
The application of filters may seem unintuitive at first, especially with a relationship design such as the one among Products, Category, and Subcategory, but in reality the model should be designed so that the filters can be applied in a single direction. There is also, the question of unmatched values between fields used in the relationship and how they are treated by the model. For example, what would happen if we had a product_id field in the Sales table that did not have a match in the Products table? Would this even be allowed in the model? The tabular model handles this situation very elegantly. The model allows this situation (without error), and unmatched values are assigned to a blank placeholder. For example, if there was a product in the Sales table and no product in the Products table, it would be shown as blank when Products, Category, or Subcategory is used in the pivot.
We have also indicated that the model automatically created implicit measures. The term measure is common in business intelligence tools to specify that a calculated value is returned. Often, this can be a simple calculation, for example, the count of rows or the sum of a field. The important thing to remember is that measure is a single value that is returned from the model (when the model is filtered). Usually, measures are defined by a model designer, but they need not be. This is the case with an implicit measure. An implicit measure is defined automatically, depending on the data type of the column that is being used. Numeric columns are automatically summed, whereas text columns are automatically counted.