Your message has been sent.
This article has been saved to your account.
Go to my account
This article has been emailed to your Kindle.
Send this article
Complete the form below to send this article, Solving Many-to-Many Relationship in Dimensional Modeling, to a friend (or to yourself). We will never share your details (or your friend's) with anyone. For more information, read our Privacy Policy.
Modeling book sales is not a big deal, you just need a sales fact and a book dimension, and other dimensions, such as date dimension. From this model, you can analyze daily book sales.
But, what if a book is written by more than one writer and an individual writer writes more than one book, i.e. relationship of book to writer is many to many? How would you enhance the existing book sales dimensional model, which has book granularity, to also handle sales analysis by individual writer, i.e. analysis at writer granularity?
In this article by Djoni Darmawikarta, we will discuss the two possible dimensional modeling solutions to handle many-to-many relationship.
Bridge table solution
We will use a simplified book sales dimensional model as an example to demonstrate our bridge solution.
Our book sales model initially has the SALES_FACT fact table and two dimension tables: BOOK_DIM and DATE_DIM. The granularity of the model is sales amount by date (daily) and by book.
Assume the BOOK_DIM table has five rows:
-
BOOK_SK
TITLE
AUTHOR
1
Programming in Java
King, Chan
3
Learning Python
Simpson
2
Introduction to BIRT
Chan, Gupta, Simpson (Editor)
4
Advanced Java
King, Chan
5
Beginning XML
King, Chan (Foreword)
The DATE_DIM has two rows:
-
DATE_SK
DT
1
11-DEC-2009
2
12-DEC-2009
3
13-DEC-2009
And, the SALES_FACT table has ten rows:
-
DATE_SK
BOOK_SK
SALES_AMT
1
1
1000
1
2
2000
1
3
3000
1
4
4000
2
2
2500
2
3
3500
2
4
4500
2
5
5500
3
3
8000
3
4
8500
Note that:
The columns with _sk suffixes in the dimension tables are surrogate keys of the dimension tables; these surrogate keys relate the rows of the fact table to the rows in the dimension tables.
King and Chan have collaborated in three books; two as co-authors, while in the “Beginning XML” Chan’s contribution is writing its foreword. Chan also co-authors the “Introduction to BIRT”.
Simpson singly writes the “Learning Python” and is an editor for “Introduction to BIRT”.
To analyze daily book sales, you simply run a query, joining the dimension tables to the fact table:
SELECT dt, title, sales_amt
FROM sales_fact s, date_dim d, book_dim b
WHERE s.date_sk = d.date_sk
AND s.book_sk = b.book_sk
This query produces the result showing the daily sales amount of every book that has a sale:
-
DT
TITLE
SALES_AMT
11-DEC-09
Advanced Java
4000
11-DEC-09
Introduction to BIRT
2000
11-DEC-09
Learning Python
3000
11-DEC-09
Programming in Java
1000
12-DEC-09
Advanced Java
4500
12-DEC-09
Beginning XML
5500
12-DEC-09
Introduction to BIRT
2500
12-DEC-09
Learning Python
3500
13-DEC-09
Advanced Java
8500
13-DEC-09
Learning Python
8000
You will notice that the model does not allow you to readily analyze the sales by individual writer—the AUTHOR column is multi-value, not normalized, which violates the dimension modeling rule (we can resolve this by creating a view to “bundle” the AUTHOR_DIM with the SALES_FACT tables such that the AUTHORtable connects to the view as a normal dimension. We will create the view a bit later in this section). We can solve this issue by adding an AUTHOR_DIM and its AUTHOR_GROUP bridge table.
The AUTHOR_DIM must contain all individual contributors, which you will have to extract from the books and enter into the table. In our example we have four authors.
-
AUTHOR_SK
NAME
1
Chan
2
King
3
Gupta
4
Simpson
The weighting_factor column in the AUTHOR_GROUP bridge table contains a fractional numeric value that determines the contribution of an author to a book. Typically the authors have equal contribution to the book they write, but you might want to have different weighting_factor for different roles; for example, an editor and a foreword writer have smaller weighting_factors than that of an author. The total weighting_factors for a book must always equal to 1.
The AUTHOR_GROUP bridge table has one surrogate key for every group of authors (a single author is considered a group that has one author only), and as many rows with that surrogate key for every contributor in the group.
Our example AUTHOR_GROUP table has to then be populated as follows:
-
AUTHOR_GROUP_SK
AUTHOR_SK
WEIGHTING_FACTOR
ROLE
1
1
0.5
CO-AUTHOR
1
2
0.5
CO-AUTHOR
2
1
0.45
CO-AUTHOR
2
3
0.45
CO-AUTHOR
2
4
0.1
EDITOR
3
4
1
AUTHOR
4
1
0.1
EDITOR
4
2
0.9
CO-AUTHOR
We don’t need to repeat King and Chan for different books for the same roles, co-authors for the “Programming in Java” and “Advanced Java”, we just need one, the AUTHOR_GROUP_SK = 1. On “Beginning XML”, Chan is an editor, not co-author, so we need a different author group row, AUTHOR_GROUP_SK = 4. Essentially, the AUTHOR_GROUP relationship to the SALES_FACT is many-to-many.
We must add a new column, AUTHOR_GROUP_SK, in the SALES_FACT table, and populate the new column with the proper values from the AUTHOR_GROUP_SK in the AUTHOR_GROUP table. For next sales this population must be taken care of.
So, our existing sales in the SALES_FACT table now look as follow:
-
BOOK_SK
AUTHOR_GROUP_SK
DATE_SK
SALES_AMT
1
1
1
1000
2
2
1
2000
2
2
2
2500
3
3
1
3000
3
3
2
3500
4
1
1
4000
4
1
2
4500
5
4
2
5500
To analyze daily sales by author, we can run the following query:
SELECT dt, title, name, role, weighting_factor * sales_amt sales_amt
FROM sales_fact s, date_dim d, book_dim b, author_group ag, author_dim a
WHERE s.author_group_sk = ag.author_group_sk
AND s.date_sk = d.date_sk
AND s.book_sk = b.book_sk
AND ag.author_sk = a.author_sk
This query produces the following output showing daily sales contribution by authors on each of the books which they collaborate:
-
-
DT
TITLE
NAME
ROLE
SALES_AMT
11-DEC-09
Advanced Java
Chan
CO-AUTHOR
2000
11-DEC-09
Advanced Java
King
CO-AUTHOR
2000
11-DEC-09
Introduction to BIRT
Chan
CO-AUTHOR
900
11-DEC-09
Introduction to BIRT
Gupta
CO-AUTHOR
900
11-DEC-09
Introduction to BIRT
Simpson
EDITOR
200
11-DEC-09
Learning Python
Simpson
AUTHOR
3000
11-DEC-09
Programming in Java
Chan
CO-AUTHOR
500
11-DEC-09
Programming in Java
King
CO-AUTHOR
500
12-DEC-09
Advanced Java
Chan
CO-AUTHOR
2250
12-DEC-09
Advanced Java
King
CO-AUTHOR
2250
12-DEC-09
Beginning XML
Chan
EDITOR
550
12-DEC-09
Beginning XML
King
CO-AUTHOR
4950
12-DEC-09
Introduction to BIRT
Chan
CO-AUTHOR
1125
12-DEC-09
Introduction to BIRT
Gupta
CO-AUTHOR
1125
12-DEC-09
Introduction to BIRT
Simpson
EDITOR
250
12-DEC-09
Learning Python
Simpson
AUTHOR
3500
13-DEC-09
Advanced Java
Chan
CO-AUTHOR
4250
13-DEC-09
Advanced Java
King
CO-AUTHOR
4250
13-DEC-09
Learning Python
Simpson
AUTHOR
8000
-
To analyze the daily book sales, we just need to group the sales of each book as follows:
SELECT dt, title, SUM(weighting_factor * sales_amt) sales_amt
FROM sales_fact s, date_dim d, book_dim b, author_group ag, author_dim a
WHERE s.author_group_sk = ag.author_group_sk
AND s.date_sk = d.date_sk
AND s.book_sk = b.book_sk
AND ag.author_sk = a.author_sk
GROUP BY dt, title
The output must of course produce the same output as before we added the bridge.
-
DT
TITLE
SALES_AMT
11-DEC-09
Programming in Java
1000
11-DEC-09
Introduction to BIRT
2000
11-DEC-09
Learning Python
3000
11-DEC-09
Advanced Java
4000
12-DEC-09
Introduction to BIRT
2500
12-DEC-09
Learning Python
3500
12-DEC-09
Advanced Java
4500
12-DEC-09
Beginning XML
5500
13-DEC-09
Learning Python
8000
13-DEC-09
Advanced Java
8500
A view to bundle multi-value bridge into the fact
We can create a view such that the bridge table is transparent to the AUTHOR_DIM table.
CREATE VIEW sales_fact_v AS
SELECT book_sk, date_sk, author_sk, sales_amt
FROM sales_fact s, author_group a
WHERE s.AUTHOR_GROUP_SK = a.author_group_sk
Now our query to analyze daily sales by author is a standard dimensional query, joining the fact directly to the dimensions.
SELECT dt, title, name, role, sales_amt
FROM sales_fact_v s, date_dim d, book_dim b, author_dim a
WHERE s.date_sk = d.date_sk
AND s.book_sk = b.book_sk
AND ag.author_sk = a.author_sk
Dimensional purist solution
In a pure dimensional model, all dimension tables connect directly to their fact table(s). The bridge solution violates this rule as the author dimension does not connect directly to the sales fact table. To solve this issue we need to create an artificial fact table to which we directly connect the AUTHOR_DIM table. We now have multi-star schema, the SALES_FACT star and the ROLE_FACT star; they share the BOOK_DIM dimension.
We need to populate ROLE_FACT table one row for every author involves in every book, i.e. one-to-one intersection of book and author, just like those of other fact tables. For our example the table must have rows as shown:
-
BOOK_SK
AUTHOR_SK
ROLE
CONTRIBUTING_FACTOR
1
1
CO-AUTHOR
0.5
1
2
CO-AUTHOR
0.5
2
1
EDITOR
0.1
2
3
CO-AUTHOR
0.45
2
4
CO-AUTHOR
0.45
3
4
AUTHOR
1.0
4
1
CO-AUTHOR
0.5
4
2
CO-AUTHOR
0.5
5
1
EDITOR
0.1
5
2
AUTHOR
0.9
We can query the stars independently, or if we want to analyze daily sales by author, we can combine them (drilling across the two stars). For example:
SELECT dt, title, name, sales_amt FROM
(SELECT book_sk, dt, title, sales_amt FROM sales_fact s, date_dim d, book_dim b
WHERE s.book_sk = b.book_sk
and s.date_sk = d.date_sk) sales,
(SELECT b.book_sk, title, name, role, weighting_factor FROM role r, book_dim_dim b, author_dim a
WHERE r.book_sk = b.book_sk
AND r.author_sk = a.author_sk
ORDER BY b.title, a.name) author
WHERE sales.book_sk = author.book_sk
Summary
The dimensional purist solution is preferable than the bridge solution. First, we do not need to change the existing fact (adding the author_group_sk in our example) and dimension tables. Conformance to dimensional modeling rule is not only good for current consistency across your dimensional model but also for their future enhancement. You must have also noticed that populating the role fact is just like that of other fact tables in dimensional models. The only minus factor is that the role fact table might contain more rows than the author group bridge table as some author groups might be “reusable” if the groups write more than one book together with the same roles in all their books, while the role fact table must always have one row for every author contributing to every book regardless their roles.
About the Author :
Djoni Darmawikarta built his career in IBM Asia Pacific and Canada as a software engineer, international consultant, instructor and project manager, for a total of 17 years. He's currently a technical specialist in the Data Warehousing and Business Intelligence team of a Toronto-based insurance company. Outside of his office works, Djoni writes IT articles and books.
Books From Packt
|
|




Post new comment