About this book

For most of us, setting up the database for an application is often an afterthought. While you don't need to be a professional database designer to create a working application, knowing a few insider tips and techniques can make both the process easier and the end result much more effective. This book doesn't set out to make you an expert in data analysis, but it does provide a quick and easy way to raise your game in this essential part of getting your application right.

Publication date:
November 2006
Publisher
Packt
Pages
108
ISBN
9781904811305

 

Chapter 1. Introducing MySQL Design

Data design is an essential part of the application development cycle. By analogy, building an application is like building a house. Having the right tools is important, but we need a solid foundation: the data structure. However, producing a good data structure can be a daunting challenge; the quest for a perfect data structure can lead us to new territories where many methods are available. Which one is the best? How can we keep our focus on the goal to achieve, without losing our time?

Data design for MySQL databases is both a science and an art, and there must be a good balance between the scientific and the empiric aspects of the method. The scientific aspect refers to information technology (IT) principles, whereas the empiric facet is mostly based on intuitions and experience.

This book is primarily oriented towards MySQL databases. It teaches how to plan the data structure and how to implement it physically using MySQL's model. The planning part is sometimes referred to as logical design, but it is preferable to view the logical/physical process as a whole.

 

MySQL's Popularity and Impact


MySQL (www.mysql.com), launched in 1995, has become the most popular open source database system. Virtually all web providers include MySQL as part of their hosting plan, often on the ubiquitous LAMP (Linux, Apache, MySQL, PHP) platform. Another root cause of MySQL's popularity has been the ongoing success of phpMyAdmin (www.phpmyadmin.net), a well-established MySQL web-based interface. Therefore many websites use MySQL as their back-end data repository.

The Need for MySQL Design

Overall, MySQL's popularity has attracted many web developers, some of them having no prior IT experience. When faced with the task of transforming a static website into a dynamic/transactional one, or integrating corporate data into the site, developers are sometimes inclined to improvise a data structure. This structure (or lack of structure) may work for a certain time but later fails because of lack of depth. Maybe the system initially works because it started small, with only a few functions planned and implemented, but falls apart when users ask more of it. A poorly designed data structure can only be patched to a certain extent. It can also have scaling issues, when the initial testing has been done with only a few rows of data.

The apparent facility of using the tools may hide the fact that database design depends upon essential principles. Eluding them can render an application costly to maintain, because correcting data structural errors after application coding has begun is time consuming.

"What do I do Next?"

Here is an example of the impact of MySQL in the ranks of non-IT people. I once saw this question in a phpMyAdmin discussion forum—I am citing it from memory: "I've installed MySQL and phpMyAdmin, now I need directions: what do I do next?" I answered "Maybe you could create a table, and then insert some data into it. Next you could browse for your data."

Clearly, those tools were perceived as interesting by this person, but I can only wonder what kind of table structure came into existence after this forum conversation.

Data Design Steps

We can think of data design as a sequence of steps whose goal is to produce the physical MySQL databases, tables, and columns necessary to support an application.

Starting with the outer shell, we first need to learn about our data by collecting it. We then start to organize these data elements by naming them appropriately. This is followed by regrouping the data elements into tables, taking into account the needed keys. Whereas the previous steps could have been done only on paper, the final step is to implement the model within MySQL's structure.

All these steps are covered in distinct chapters of this book.

 

Data as a Resource


Before examining the various techniques available for design, let's think about the concept of data itself.

Organizations and enterprises use many assets, for example buildings, furniture, brains, but perhaps the most valuable asset is information or data. We remark that data documents the enterprise's procedures, and binds people into an ongoing exchange of information, called information flow. Computers help to formalize this data but we have to remember that it exists by itself.

But this is my Data!

When building data designs, we have to meet users and understand the enterprise's data flow. In an ideal world, every department, including the IT department, and every user would collaborate in order to help data flow easily between departments. However, from time to time, one can witness two attitudes that impede the normal data flow in enterprises. The first one is that some IT departments, having the responsibility for the computers where data resides, come to think that the data is theirs. This has the effect of keeping a certain level of secrecy that hides data and can block the data design process. The second one is a variation of the first one, this time caused by a user—data originates from this user and he has a tendency not to share it.

As an example of this latter attitude, let's consider accounting data. Before the PC era, accounting systems existed inside mainframes or minicomputers, and the IT department managed all data including accounting data. Since the advent of microcomputers and spreadsheet applications, an accounting clerk can manage a great deal of data, producing high-quality reports about it. However, this data often resides on his computer; he enters it, he produces the report, and he gets the accolades for it from his boss. So the data belongs to the accounting clerk, right? This way of thinking impedes data flow between individuals and departments and has a tendency of leading to redundant, disjoint data throughout the organization.

After the data design process, bridges are built between these isolated data islands created by users or departments so that the data can benefit the whole enterprise. It may also happen that fewer islands exist and redundant data is eliminated.

 

Data Modeling


Data is normally organized into an information system. This system can be compared to something as simple as a loose-sheet binder, however this book describes the data design process in the context of computer-based information systems, or databases. Moreover, databases follow a design model, and we will use the most popular one—the relational model.

The complete data collection of an enterprise is larger than what our model will encompass.

We will build a model that represents only a subset of the data spectrum. The question is which subset? We'll see in Chapter 2 that we must set boundaries to the analyzed system's data scope.

To build information systems that last, data must be tamed and molded to correctly represent reality. Correctly here means:

  • Follow the needs of the organization, including the system's boundaries

  • Conform to the chosen data design model (here, the relational one)

  • Possess a high degree of adaptability to adjust itself to the changing environment

Overview of the Relational Model

We owe to Dr. Edgar F. Codd the concept of the relational model, from his 1970 paper A Relational Model of Data for Large Shared Data Banks (http://www.acm.org/classics/nov95/toc.html). Dr. Codd later explained his model by defining a set of rules—the so-called Codd's Twelve rules (http://en.wikipedia.org/wiki/Codd%27s_12_rules). An ideal database management system (DBMS) would implement all those rules, but few if any do. But this is not a problem in practice since the benefits of the relational model are achieved even in products that do not apply all the rules. We are perfectly capable of building an efficient relational data design with currently available database products like MySQL.

When dealing with data design, I believe that the most important rules are number 1 and number 2. Here is a summary of these two Codd's rules.

Rule #1

This rule states that data is contained in tables. A table logically regroups information about a certain subject, for example, cars. The tabular format—rows and columns is the important idea here. A row describes information about a single item, for example, a specific car, whereas a column describes a single characteristic (or attribute) of each item, for example, its color. We will see in Chapter 3 that the decomposition of data into well-adjusted columns is important to have a flexible and useful structure.

The intersection of a row and a column contains the value of a specific attribute for a single item. We sometimes refer to this intersection as a cell containing our data—this is the same idea as in a spreadsheet.

Rule #2

Data is not retrieved or referenced by physical location—find the third record in this file. Instead, data must be fetched by referencing a table, a unique key—the primary key—and one or many column names. For example, with the cars table, we use the car serial number to retrieve this car's color.

This rule will be studied in Chapter 4, where we describe data grouping and the concept of choosing keys. Proper key choosing is of utmost importance.

Simplified Design Technique

Many years ago, I started to elaborate data structures using the relational model. I was using a method that could be summarized by this sentence: "determine where the data fits the best in the structure". Then I learned about the design techniques that were taught to IT specialists and evolved from the relational model.

The technique, which is frequently taught consists of building an entity-relationship diagram. In this kind of diagram, we represent nouns, for example, a car, a customer, using entities, and the relationships between them are expressed using verbs. An example of relationship binding two entities is "a customer buys a car". When the diagram is done, it must be somewhat transformed into a model consisting of tables and columns, using a technique called normalization that uses many steps to refine the model into an effective data structure.

These techniques produce reports, diagrams, and eventually a theoretical data design that can be implemented physically in a DBMS.

When I became familiar with those traditional techniques, I thought that for me at least they were a loss of time. Those methods teach a way but the ultimate goal—a working relational database and associated documentation can be achieved more directly. Moreover, those techniques suffer a problem: they cannot be applied blindfolded and mechanically. The developer always has to think about data naming, data grouping, and choosing keys while trying to balance users' needs and constraints imposed by:

  • the hardware

  • the chosen database management system

  • planned growth

  • time

  • budget

I realized that the traditional techniques are taught everywhere, and I respect the teachers who teach them. But believe me, when it's time to deliver an application notwithstanding the interface itself, it's important to avoid losing time to intermediate by-products and go straightforward to a working prototype. Using a more direct method during the data design phase frees more time to refine the interface, to catch unforeseen needs and address them.

This book's goal is to teach the minimum principles one has to apply in order to build an effective data structure.

 

Case Study


The various steps of data design can be explained in a very practical way by using two case studies. A case study is the best way of explaining ideas that can somewhat become too abstract without real examples. Chapters 1 through 5 are based on a single case study: "Car dealership". Chapter 6 consists of another case study that recapitulates all the notions seen in the previous chapters.

Our Car Dealer

Suppose we've been contacted by a car dealer who wants to computerize parts of his business. Let's describe a little bit about this business. In Chapter 2, we will examine the data collecting phase for our system more formally.

This car dealer operates at a single address. They employ nine salespersons who dutifully welcome potential customers and show them the car models that are available on the floor. In addition, two store assistants handle car movements, and an office clerk takes notes about customers' appointments. Fontax and Licorne are the two fictitious brands offered by this dealer. Each brand has a number of models, for example Mitsou, Wanderer, and Gazelle.

The System's Goals

We want to keep information about the cars' inventory and sales. The following are some sample questions that demonstrate the kind of information our system will have to deal with:

  • How many cars of Fontax Mitsou 2007 do we have in stock?

  • How many visitors test-drove the Wanderer last year?

  • How many Wanderer cars did we sell during a certain period?

  • Who is our best salesperson for Mitsou, Wanderer, or overall in 2007?

  • Are buyers mostly men or women (per car model)?

Here are the titles of some reports that are needed by this car dealer:

  • Detailed sales per month: salesperson, number of cars, revenue

  • Yearly sales per salesperson

  • Inventory efficiency: average delay for car delivery to the dealer, or to the customer

  • Visitors report: percentage of visitors trying a car; percentage of road tests that lead to a sale

  • Customer satisfaction about the salesperson

  • The sales contract

In addition to this, screen applications must be built to support the inventory and sales activities. For example, being able to consult and update the appointment schedule; consult the car delivery schedule for the next week.

After this data model is built, the remaining phases of the application development cycle, such as screen and report design, will provide this car dealer with reports, and on-line applications to manage the car inventory and the sales in a better way.

 

The Tale of the Too Wide Table


This book focuses on representing data in MySQL. The containers of tables in MySQL, and other products are the databases. It is quite possible to have just one table in a database and thus avoid fully applying the relational model concept in which tables are related to each other through common values; however we will use the model in its normal way: having many tables and creating relations between them.

Note

This section describes an example of data crammed into one huge table, also called a too wide table because it is formed with too many columns. This too wide table is fundamentally non-relational.

Sometimes the data structure needs to be reviewed or evaluated, as it might be based on poor decisions in terms of data naming conventions, key choosing, and the number of tables. Probably the most common problem is that the whole data is put into one big, wide table.

The reason for this common structure (or lack of structure) is that many developers think in terms of the results or even of the printed results. Maybe they know how to build a spreadsheet and try to apply spreadsheet principles to databases. Let's assume that the main goal of building a database is to produce this sales report, which shows how many cars were sold in each month, by each salesperson, describing the brand name, the car model number, and the name.

Salesperson

Period

Brand Name

Car model number

Car model name and year

Quantity sold

Murray, Dan

2006-01

Fontax

1A8

Mitsou 2007

3

Murray, Dan

2006-01

Fontax

2X12

Wanderer 2006

7

Murray, Dan

2006-02

Fontax

1A8

Mitsou 2007

4

Smith, Peter

2006-01

Fontax

1A8

Mitsou 2007

1

Smith, Peter

2006-01

Licorne

LKC

Gazelle 2007

1

Smith, Peter

2006-02

Licorne

LKC

Gazelle 2007

6

Without thinking much about the implications of this structure, we could build just one table, sales:

salesperson

brand

model_number

model_name_year

qty_2006_01

qty_2006_02

Murray, Dan

Fontax

1A8

Mitsou 2007

3

4

Murray, Dan

Fontax

2X12

Wanderer 2006

7

 

Smith, Peter

Fontax

1A8

Mitsou 2007

1

 

Smith, Peter

Licorne

LKC

Gazelle 2007

1

6

At first sight, we have tabularized all the information that is needed for the report.

Note

The book's examples can be reproduced using the mysql command-line utility, or phpMyAdmin, a more intuitive web interface. You can refer to Mastering phpMyAdmin 2.8 for Effective MySQL Management book from Packt Publishing (ISBN 1-904811-60-6). In phpMyAdmin, the exact commands may be typed in using the SQL Query Window, or we can benefit from the menus and graphical dialogs. Both ways will be shown throughout the book.

Here is the statement we would use to create the sales table with the mysql command-line utility:

CREATE TABLE sales (
salesperson char(40) NOT NULL,
brand char(40) NOT NULL,
model_number char(40) NOT NULL,
model_name_year char(40) NOT NULL,
qty_2006_01 int(11) NOT NULL,
qty_2006_02 int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

In the previous statement, while char(40) means a column with 40 characters, int(11) means an integer with a display width of 11 in MySQL.

Using the phpMyAdmin web interface instead, we would obtain:

Here we have entered sample data into our sales table:

INSERT INTO sales VALUES ('Murray, Dan', 'Fontax', '1A8', 'Mitsou 2007', 3, 4);
INSERT INTO sales VALUES ('Murray, Dan', 'Fontax', '2X12', 'Wanderer 2006', 7, 0);
INSERT INTO sales VALUES ('Smith, Peter', 'Licorne', 'LKC', 'Gazelle 2007', 1, 6);
INSERT INTO sales VALUES ('Smith, Peter', 'Fontax', '1A8', 'Mitsou 2007', 1, 0);

However this structure has many maintenance problems. For instance, where do we store the figures for March 2006? To discover some of the other problems, let's examine sample SQL statements we could use on this table to query about specific questions, followed by the results of those statements:

/* displays the maximum number of cars of a single model sold by each vendor in January 2006 */
SELECT salesperson, max(qty_2006_01)
FROM sales
GROUP BY salesperson
/* finds the average number of cars sold by our sales force taken as a whole, in February 2006 */
SELECT avg(qty_2006_02)
FROM sales
WHERE qty_2006_02 > 0
/* finds for which model more than three cars were sold in January */
SELECT model_name_year, SUM(qty_2006_01)
FROM sales
GROUP BY model_name_year
HAVING SUM(qty_2006_01) > 3

We notice that, although we got the answers we were looking for, with the above SQL queries, we would have to modify column names in the queries to obtain results for other months. Also, it becomes tricky if we want to know the month for which the sales have surpassed the yearly average, because we have to potentially deal with twelve column names. Another problem would arise when attempting to report for different years, or to compare a year with another one.

Moreover, a situation that could demonstrate the poor state of this structure is the need for a new report. A structure that is based too closely on a single report instead of being based on the intrinsic relations between data elements does not scale well and fails to accommodate future needs.

Chapter 4 will unfold those problems.

 

Summary


We saw that MySQL's popularity has put a powerful tool on the desktop of many users; some of them are not on par about design techniques. Data is an important resource and we have to think about the organization's data as a whole. The powerful relational model can help us for structuring activities. This book avoids specialized, academic vocabulary about the relational model, focusing instead on the important principles and the minimum tasks needed to produce a good structure. We then saw our main case study, and we noticed how it's unfortunately easy to build wide, inefficient tables.

About the Author

  • Marc Delisle

    Marc Delisle was awarded "MySQL Community Member of the year 2009" because of his involvement with phpMyAdmin. He started to contribute to the project in December 1998, when he made the multi-language version. He is still involved with phpMyAdmin as a developer and project administrator. Marc is a system administrator at Cegep de Sherbrooke, Québec, Canada. He has been teaching networking, security, and web application development. In one of his classes, he was pleased to meet a phpMyAdmin user from Argentina. Marc lives in Sherbrooke with his wife and they enjoy spending time with their four children.

    Browse publications by this author

Latest Reviews

(4 reviews total)
GREAT technical resources for a REALLY broad range of topics
problème de paiement
The book was very well written and easy to follow.
Book Title
Unlock this full book FREE 10 day trial
Start Free Trial