# Solving Many-to-Many Relationship in Dimensional Modeling

Exclusive offer: get 50% off this eBook here

### Oracle SOA Suite Developer's Guide — Save 50%

Design and build Service-Oriented Architecture Solutions with the Oracle SOA Suite 10gR3

\$47.99    \$24.00
by Djoni Darmawikarta | December 2009 | Architecture & Analysis

Modeling book sales is not a big deal, you just need a sales fact and a book dimension, and other dimensions, such as date dimension. From this model, you can analyze daily book sales.

But, what if a book is written by more than one writer and an individual writer writes more than one book, i.e. relationship of book to writer is many to many? How would you enhance the existing book sales dimensional model, which has book granularity, to also handle sales analysis by individual writer, i.e. analysis at writer granularity?

In this article by Djoni Darmawikarta, we will discuss the two possible dimensional modeling solutions to handle many-to-many relationship.

# Bridge table solution

We will use a simplified book sales dimensional model as an example to demonstrate our bridge solution.

Our book sales model initially has the SALES_FACT fact table and two dimension tables: BOOK_DIM and DATE_DIM. The granularity of the model is sales amount by date (daily) and by book.

Assume the BOOK_DIM table has five rows:

 BOOK_SK TITLE AUTHOR 1 Programming in Java King, Chan 3 Learning Python Simpson 2 Introduction to BIRT Chan, Gupta, Simpson (Editor) 4 Advanced Java King, Chan 5 Beginning XML King, Chan (Foreword)

The DATE_DIM has two rows:

 DATE_SK DT 1 11-DEC-2009 2 12-DEC-2009 3 13-DEC-2009

And, the SALES_FACT table has ten rows:

 DATE_SK BOOK_SK SALES_AMT 1 1 1000 1 2 2000 1 3 3000 1 4 4000 2 2 2500 2 3 3500 2 4 4500 2 5 5500 3 3 8000 3 4 8500

Note that:
The columns with _sk suffixes in the dimension tables are surrogate keys of the dimension tables; these surrogate keys relate the rows of the fact table to the rows in the dimension tables.
King and Chan have collaborated in three books; two as co-authors, while in the “Beginning XML” Chan’s contribution is writing its foreword. Chan also co-authors the “Introduction to BIRT”.
Simpson singly writes the “Learning Python” and is an editor for “Introduction to BIRT”.

To analyze daily book sales, you simply run a query, joining the dimension tables to the fact table:

`SELECT dt, title, sales_amt FROM sales_fact s, date_dim d, book_dim bWHERE s.date_sk = d.date_skAND s.book_sk = b.book_sk`

This query produces the result showing the daily sales amount of every book that has a sale:

 DT TITLE SALES_AMT 11-DEC-09 Advanced Java 4000 11-DEC-09 Introduction to BIRT 2000 11-DEC-09 Learning Python 3000 11-DEC-09 Programming in Java 1000 12-DEC-09 Advanced Java 4500 12-DEC-09 Beginning XML 5500 12-DEC-09 Introduction to BIRT 2500 12-DEC-09 Learning Python 3500 13-DEC-09 Advanced Java 8500 13-DEC-09 Learning Python 8000

You will notice that the model does not allow you to readily analyze the sales by individual writer—the AUTHOR column is multi-value, not normalized, which violates the dimension modeling rule (we can resolve this by creating a view to “bundle” the AUTHOR_DIM with the SALES_FACT tables such that the AUTHORtable connects to the view as a normal dimension. We will create the view a bit later in this section). We can solve this issue by adding an AUTHOR_DIM and its AUTHOR_GROUP bridge table.

The AUTHOR_DIM must contain all individual contributors, which you will have to extract from the books and enter into the table. In our example we have four authors.

 AUTHOR_SK NAME 1 Chan 2 King 3 Gupta 4 Simpson

The weighting_factor column in the AUTHOR_GROUP bridge table contains a fractional numeric value that determines the contribution of an author to a book. Typically the authors have equal contribution to the book they write, but you might want to have different weighting_factor for different roles; for example, an editor and a foreword writer have smaller weighting_factors than that of an author. The total weighting_factors for a book must always equal to 1.

The AUTHOR_GROUP bridge table has one surrogate key for every group of authors (a single author is considered a group that has one author only), and as many rows with that surrogate key for every contributor in the group.

 Design and build Service-Oriented Architecture Solutions with the Oracle SOA Suite 10gR3
Published: March 2009
eBook Price: \$47.99
Book Price: \$79.99
See more

Our example AUTHOR_GROUP table has to then be populated as follows:

 AUTHOR_GROUP_SK AUTHOR_SK WEIGHTING_FACTOR ROLE 1 1 0.5 CO-AUTHOR 1 2 0.5 CO-AUTHOR 2 1 0.45 CO-AUTHOR 2 3 0.45 CO-AUTHOR 2 4 0.1 EDITOR 3 4 1 AUTHOR 4 1 0.1 EDITOR 4 2 0.9 CO-AUTHOR

We don’t need to repeat King and Chan for different books for the same roles, co-authors for the “Programming in Java” and “Advanced Java”, we just need one, the AUTHOR_GROUP_SK = 1. On “Beginning XML”, Chan is an editor, not co-author, so we need a different author group row, AUTHOR_GROUP_SK = 4.  Essentially, the AUTHOR_GROUP relationship to the SALES_FACT is many-to-many.

We must add a new column, AUTHOR_GROUP_SK, in the SALES_FACT table, and populate the new column with the proper values from the AUTHOR_GROUP_SK in the AUTHOR_GROUP table. For next sales this population must be taken care of.

So, our existing sales in the SALES_FACT table now look as follow:

 BOOK_SK AUTHOR_GROUP_SK DATE_SK SALES_AMT 1 1 1 1000 2 2 1 2000 2 2 2 2500 3 3 1 3000 3 3 2 3500 4 1 1 4000 4 1 2 4500 5 4 2 5500

To analyze daily sales by author, we can run the following query:

`SELECT dt, title, name, role, weighting_factor * sales_amt sales_amtFROM sales_fact s, date_dim d, book_dim b, author_group ag, author_dim aWHERE s.author_group_sk = ag.author_group_skAND s.date_sk = d.date_skAND s.book_sk = b.book_skAND ag.author_sk = a.author_sk`

This query produces the following output showing daily sales contribution by authors on each of the books which they collaborate:

 DT TITLE NAME ROLE SALES_AMT 11-DEC-09 Advanced Java Chan CO-AUTHOR 2000 11-DEC-09 Advanced Java King CO-AUTHOR 2000 11-DEC-09 Introduction to BIRT Chan CO-AUTHOR 900 11-DEC-09 Introduction to BIRT Gupta CO-AUTHOR 900 11-DEC-09 Introduction to BIRT Simpson EDITOR 200 11-DEC-09 Learning Python Simpson AUTHOR 3000 11-DEC-09 Programming in Java Chan CO-AUTHOR 500 11-DEC-09 Programming in Java King CO-AUTHOR 500 12-DEC-09 Advanced Java Chan CO-AUTHOR 2250 12-DEC-09 Advanced Java King CO-AUTHOR 2250 12-DEC-09 Beginning XML Chan EDITOR 550 12-DEC-09 Beginning XML King CO-AUTHOR 4950 12-DEC-09 Introduction to BIRT Chan CO-AUTHOR 1125 12-DEC-09 Introduction to BIRT Gupta CO-AUTHOR 1125 12-DEC-09 Introduction to BIRT Simpson EDITOR 250 12-DEC-09 Learning Python Simpson AUTHOR 3500 13-DEC-09 Advanced Java Chan CO-AUTHOR 4250 13-DEC-09 Advanced Java King CO-AUTHOR 4250 13-DEC-09 Learning Python Simpson AUTHOR 8000

To analyze the daily book sales, we just need to group the sales of each book as follows:

`SELECT dt, title, SUM(weighting_factor * sales_amt) sales_amtFROM sales_fact s, date_dim d, book_dim b, author_group ag, author_dim aWHERE s.author_group_sk = ag.author_group_skAND s.date_sk = d.date_skAND s.book_sk = b.book_skAND ag.author_sk = a.author_skGROUP BY dt, title`

The output must of course produce the same output as before we added the bridge.

 DT TITLE SALES_AMT 11-DEC-09 Programming in Java 1000 11-DEC-09 Introduction to BIRT 2000 11-DEC-09 Learning Python 3000 11-DEC-09 Advanced Java 4000 12-DEC-09 Introduction to BIRT 2500 12-DEC-09 Learning Python 3500 12-DEC-09 Advanced Java 4500 12-DEC-09 Beginning XML 5500 13-DEC-09 Learning Python 8000 13-DEC-09 Advanced Java 8500

# A view to bundle multi-value bridge into the fact

We can create a view such that the bridge table is transparent to the AUTHOR_DIM table.

`CREATE VIEW sales_fact_v ASSELECT book_sk, date_sk, author_sk, sales_amtFROM sales_fact s, author_group aWHERE s.AUTHOR_GROUP_SK = a.author_group_sk `

Now our query to analyze daily sales by author is a standard dimensional query, joining the fact directly to the dimensions.

`SELECT dt, title, name, role, sales_amtFROM sales_fact_v s, date_dim d, book_dim b, author_dim aWHERE s.date_sk = d.date_skAND s.book_sk = b.book_skAND ag.author_sk = a.author_sk`
 Design and build Service-Oriented Architecture Solutions with the Oracle SOA Suite 10gR3
Published: March 2009
eBook Price: \$47.99
Book Price: \$79.99
See more

# Dimensional purist solution

In a pure dimensional model, all dimension tables connect directly to their fact table(s). The bridge solution violates this rule as the author dimension does not connect directly to the sales fact table. To solve this issue we need to create an artificial fact table to which we directly connect the AUTHOR_DIM table. We now have multi-star schema, the SALES_FACT star and the ROLE_FACT star; they share the BOOK_DIM dimension.

We need to populate ROLE_FACT table one row for every author involves in every book, i.e. one-to-one intersection of book and author, just like those of other fact tables. For our example the table must have rows as shown:

 BOOK_SK AUTHOR_SK ROLE CONTRIBUTING_FACTOR 1 1 CO-AUTHOR 0.5 1 2 CO-AUTHOR 0.5 2 1 EDITOR 0.1 2 3 CO-AUTHOR 0.45 2 4 CO-AUTHOR 0.45 3 4 AUTHOR 1.0 4 1 CO-AUTHOR 0.5 4 2 CO-AUTHOR 0.5 5 1 EDITOR 0.1 5 2 AUTHOR 0.9

We can query the stars independently, or if we want to analyze daily sales by author, we can combine them (drilling across the two stars). For example:

`SELECT dt, title, name, sales_amt FROM(SELECT book_sk, dt, title, sales_amt FROM sales_fact s, date_dim d, book_dim bWHERE s.book_sk = b.book_skand s.date_sk = d.date_sk) sales, (SELECT b.book_sk, title, name, role, weighting_factor FROM role r, book_dim_dim b, author_dim aWHERE r.book_sk = b.book_skAND r.author_sk = a.author_skORDER BY b.title, a.name) authorWHERE sales.book_sk = author.book_sk`

# Summary

The dimensional purist solution is preferable than the bridge solution. First, we do not need to change the existing fact (adding the author_group_sk in our example) and dimension tables. Conformance to dimensional modeling rule is not only good for current consistency across your dimensional model but also for their future enhancement. You must have also noticed that populating the role fact is just like that of other fact tables in dimensional models. The only minus factor is that the role fact table might contain more rows than the author group bridge table as some author groups might be “reusable” if the groups write more than one book together with the same roles in all their books, while the role fact table must always have one row for every author contributing to every book regardless their roles.