Solving Many-to-Many Relationship in Dimensional Modeling

Exclusive offer: get 50% off this eBook here
Oracle SOA Suite Developer's Guide

Oracle SOA Suite Developer's Guide — Save 50%

Design and build Service-Oriented Architecture Solutions with the Oracle SOA Suite 10gR3

$47.99    $24.00
by Djoni Darmawikarta | December 2009 | Architecture & Analysis

Modeling book sales is not a big deal, you just need a sales fact and a book dimension, and other dimensions, such as date dimension. From this model, you can analyze daily book sales.

But, what if a book is written by more than one writer and an individual writer writes more than one book, i.e. relationship of book to writer is many to many? How would you enhance the existing book sales dimensional model, which has book granularity, to also handle sales analysis by individual writer, i.e. analysis at writer granularity?

In this article by Djoni Darmawikarta, we will discuss the two possible dimensional modeling solutions to handle many-to-many relationship.

Bridge table solution

We will use a simplified book sales dimensional model as an example to demonstrate our bridge solution.

Our book sales model initially has the SALES_FACT fact table and two dimension tables: BOOK_DIM and DATE_DIM. The granularity of the model is sales amount by date (daily) and by book.

Assume the BOOK_DIM table has five rows:

BOOK_SK

TITLE

AUTHOR

1

Programming in Java

King, Chan

3

Learning Python

Simpson

2

Introduction to BIRT

Chan, Gupta, Simpson (Editor)

4

Advanced Java

King, Chan

5

Beginning XML

King, Chan (Foreword)

The DATE_DIM has two rows:

DATE_SK

DT

1

11-DEC-2009

2

12-DEC-2009

3

13-DEC-2009

And, the SALES_FACT table has ten rows:

DATE_SK

BOOK_SK

SALES_AMT

1

1

1000

1

2

2000

1

3

3000

1

4

4000

2

2

2500

2

3

3500

2

4

4500

2

5

5500

3

3

8000

3

4

8500

Note that:
The columns with _sk suffixes in the dimension tables are surrogate keys of the dimension tables; these surrogate keys relate the rows of the fact table to the rows in the dimension tables.
King and Chan have collaborated in three books; two as co-authors, while in the “Beginning XML” Chan’s contribution is writing its foreword. Chan also co-authors the “Introduction to BIRT”.
Simpson singly writes the “Learning Python” and is an editor for “Introduction to BIRT”.

To analyze daily book sales, you simply run a query, joining the dimension tables to the fact table:

SELECT dt, title, sales_amt 
FROM sales_fact s, date_dim d, book_dim b
WHERE s.date_sk = d.date_sk
AND s.book_sk = b.book_sk

This query produces the result showing the daily sales amount of every book that has a sale:

DT

TITLE

SALES_AMT

11-DEC-09

Advanced Java

4000

11-DEC-09

Introduction to BIRT

2000

11-DEC-09

Learning Python

3000

11-DEC-09

Programming in Java

1000

12-DEC-09

Advanced Java

4500

12-DEC-09

Beginning XML

5500

12-DEC-09

Introduction to BIRT

2500

12-DEC-09

Learning Python

3500

13-DEC-09

Advanced Java

8500

13-DEC-09

Learning Python

8000

You will notice that the model does not allow you to readily analyze the sales by individual writer—the AUTHOR column is multi-value, not normalized, which violates the dimension modeling rule (we can resolve this by creating a view to “bundle” the AUTHOR_DIM with the SALES_FACT tables such that the AUTHORtable connects to the view as a normal dimension. We will create the view a bit later in this section). We can solve this issue by adding an AUTHOR_DIM and its AUTHOR_GROUP bridge table.

The AUTHOR_DIM must contain all individual contributors, which you will have to extract from the books and enter into the table. In our example we have four authors.

AUTHOR_SK

NAME

1

Chan

2

King

3

Gupta

4

Simpson

The weighting_factor column in the AUTHOR_GROUP bridge table contains a fractional numeric value that determines the contribution of an author to a book. Typically the authors have equal contribution to the book they write, but you might want to have different weighting_factor for different roles; for example, an editor and a foreword writer have smaller weighting_factors than that of an author. The total weighting_factors for a book must always equal to 1.

The AUTHOR_GROUP bridge table has one surrogate key for every group of authors (a single author is considered a group that has one author only), and as many rows with that surrogate key for every contributor in the group.

Oracle SOA Suite Developer's Guide Design and build Service-Oriented Architecture Solutions with the Oracle SOA Suite 10gR3
Published: March 2009
eBook Price: $47.99
Book Price: $79.99
See more
Select your format and quantity:

Our example AUTHOR_GROUP table has to then be populated as follows:

AUTHOR_GROUP_SK

AUTHOR_SK

WEIGHTING_FACTOR

ROLE

1

1

0.5

CO-AUTHOR

1

2

0.5

CO-AUTHOR

2

1

0.45

CO-AUTHOR

2

3

0.45

CO-AUTHOR

2

4

0.1

EDITOR

3

4

1

AUTHOR

4

1

0.1

EDITOR

4

2

0.9

CO-AUTHOR

We don’t need to repeat King and Chan for different books for the same roles, co-authors for the “Programming in Java” and “Advanced Java”, we just need one, the AUTHOR_GROUP_SK = 1. On “Beginning XML”, Chan is an editor, not co-author, so we need a different author group row, AUTHOR_GROUP_SK = 4.  Essentially, the AUTHOR_GROUP relationship to the SALES_FACT is many-to-many.

We must add a new column, AUTHOR_GROUP_SK, in the SALES_FACT table, and populate the new column with the proper values from the AUTHOR_GROUP_SK in the AUTHOR_GROUP table. For next sales this population must be taken care of.

So, our existing sales in the SALES_FACT table now look as follow:

BOOK_SK

AUTHOR_GROUP_SK

DATE_SK

SALES_AMT

1

1

1

1000

2

2

1

2000

2

2

2

2500

3

3

1

3000

3

3

2

3500

4

1

1

4000

4

1

2

4500

5

4

2

5500

To analyze daily sales by author, we can run the following query:

SELECT dt, title, name, role, weighting_factor * sales_amt sales_amt
FROM sales_fact s, date_dim d, book_dim b, author_group ag, author_dim a
WHERE s.author_group_sk = ag.author_group_sk
AND s.date_sk = d.date_sk
AND s.book_sk = b.book_sk
AND ag.author_sk = a.author_sk

This query produces the following output showing daily sales contribution by authors on each of the books which they collaborate:

DT

TITLE

NAME

ROLE

SALES_AMT

11-DEC-09

Advanced Java

Chan

CO-AUTHOR

2000

11-DEC-09

Advanced Java

King

CO-AUTHOR

2000

11-DEC-09

Introduction to BIRT

Chan

CO-AUTHOR

900

11-DEC-09

Introduction to BIRT

Gupta

CO-AUTHOR

900

11-DEC-09

Introduction to BIRT

Simpson

EDITOR

200

11-DEC-09

Learning Python

Simpson

AUTHOR

3000

11-DEC-09

Programming in Java

Chan

CO-AUTHOR

500

11-DEC-09

Programming in Java

King

CO-AUTHOR

500

12-DEC-09

Advanced Java

Chan

CO-AUTHOR

2250

12-DEC-09

Advanced Java

King

CO-AUTHOR

2250

12-DEC-09

Beginning XML

Chan

EDITOR

550

12-DEC-09

Beginning XML

King

CO-AUTHOR

4950

12-DEC-09

Introduction to BIRT

Chan

CO-AUTHOR

1125

12-DEC-09

Introduction to BIRT

Gupta

CO-AUTHOR

1125

12-DEC-09

Introduction to BIRT

Simpson

EDITOR

250

12-DEC-09

Learning Python

Simpson

AUTHOR

3500

13-DEC-09

Advanced Java

Chan

CO-AUTHOR

4250

13-DEC-09

Advanced Java

King

CO-AUTHOR

4250

13-DEC-09

Learning Python

Simpson

AUTHOR

8000

To analyze the daily book sales, we just need to group the sales of each book as follows:

SELECT dt, title, SUM(weighting_factor * sales_amt) sales_amt
FROM sales_fact s, date_dim d, book_dim b, author_group ag, author_dim a
WHERE s.author_group_sk = ag.author_group_sk
AND s.date_sk = d.date_sk
AND s.book_sk = b.book_sk
AND ag.author_sk = a.author_sk
GROUP BY dt, title

The output must of course produce the same output as before we added the bridge.

DT

TITLE

SALES_AMT

11-DEC-09

Programming in Java

1000

11-DEC-09

Introduction to BIRT

2000

11-DEC-09

Learning Python

3000

11-DEC-09

Advanced Java

4000

12-DEC-09

Introduction to BIRT

2500

12-DEC-09

Learning Python

3500

12-DEC-09

Advanced Java

4500

12-DEC-09

Beginning XML

5500

13-DEC-09

Learning Python

8000

13-DEC-09

Advanced Java

8500

A view to bundle multi-value bridge into the fact

We can create a view such that the bridge table is transparent to the AUTHOR_DIM table.

CREATE VIEW sales_fact_v AS
SELECT book_sk, date_sk, author_sk, sales_amt
FROM sales_fact s, author_group a
WHERE s.AUTHOR_GROUP_SK = a.author_group_sk

Now our query to analyze daily sales by author is a standard dimensional query, joining the fact directly to the dimensions.

SELECT dt, title, name, role, sales_amt
FROM sales_fact_v s, date_dim d, book_dim b, author_dim a
WHERE s.date_sk = d.date_sk
AND s.book_sk = b.book_sk
AND ag.author_sk = a.author_sk
Oracle SOA Suite Developer's Guide Design and build Service-Oriented Architecture Solutions with the Oracle SOA Suite 10gR3
Published: March 2009
eBook Price: $47.99
Book Price: $79.99
See more
Select your format and quantity:

Dimensional purist solution

In a pure dimensional model, all dimension tables connect directly to their fact table(s). The bridge solution violates this rule as the author dimension does not connect directly to the sales fact table. To solve this issue we need to create an artificial fact table to which we directly connect the AUTHOR_DIM table. We now have multi-star schema, the SALES_FACT star and the ROLE_FACT star; they share the BOOK_DIM dimension.

We need to populate ROLE_FACT table one row for every author involves in every book, i.e. one-to-one intersection of book and author, just like those of other fact tables. For our example the table must have rows as shown:

BOOK_SK

AUTHOR_SK

ROLE

CONTRIBUTING_FACTOR

1

1

CO-AUTHOR

0.5

1

2

CO-AUTHOR

0.5

2

1

EDITOR

0.1

2

3

CO-AUTHOR

0.45

2

4

CO-AUTHOR

0.45

3

4

AUTHOR

1.0

4

1

CO-AUTHOR

0.5

4

2

CO-AUTHOR

0.5

5

1

EDITOR

0.1

5

2

AUTHOR

0.9

We can query the stars independently, or if we want to analyze daily sales by author, we can combine them (drilling across the two stars). For example:

SELECT dt, title, name, sales_amt FROM
(SELECT book_sk, dt, title, sales_amt FROM sales_fact s, date_dim d, book_dim b
WHERE s.book_sk = b.book_sk
and s.date_sk = d.date_sk) sales,
(SELECT b.book_sk, title, name, role, weighting_factor FROM role r, book_dim_dim b, author_dim a
WHERE r.book_sk = b.book_sk
AND r.author_sk = a.author_sk
ORDER BY b.title, a.name) author
WHERE sales.book_sk = author.book_sk

Summary

The dimensional purist solution is preferable than the bridge solution. First, we do not need to change the existing fact (adding the author_group_sk in our example) and dimension tables. Conformance to dimensional modeling rule is not only good for current consistency across your dimensional model but also for their future enhancement. You must have also noticed that populating the role fact is just like that of other fact tables in dimensional models. The only minus factor is that the role fact table might contain more rows than the author group bridge table as some author groups might be “reusable” if the groups write more than one book together with the same roles in all their books, while the role fact table must always have one row for every author contributing to every book regardless their roles.

About the Author :


Djoni Darmawikarta built his career in IBM Asia Pacific and Canada as a software engineer, international consultant, instructor and project manager, for a total of 17 years. He's currently a technical specialist in the Data Warehousing and Business Intelligence team of a Toronto-based insurance company. Outside of his office works, Djoni writes IT articles and books.

Books From Packt

Maximize Your Investment: 10 Key Strategies for Effective Packaged Software Implementations
Maximize Your Investment: 10 Key Strategies for Effective Packaged Software Implementations

Backbase 4 RIA Development
Backbase 4 RIA Development

IBM Cognos 8 Planning
IBM Cognos 8 Planning

SAP Business ONE Implementation
SAP Business ONE Implementation

ASP.NET 3.5 Application Architecture and Design
ASP.NET 3.5 Application Architecture and Design

RESTful PHP Web Services
RESTful PHP Web Services

SOA Cookbook
SOA Cookbook

SOA Governance
SOA Governance

Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software