Database structuring: the normalization

In the previous article we treated the important concept of  the primary key. Today we are going to see in theory  what are the main measures  (which are called normal forms) to be implemented in the database to make them more efficient and maintainable and to ensure data consistency. Instead in the last article of this series we’ll see how to apply these rules.

What’s normalization?

Before we begin let’s see a definition of normalization:

Normalization is a process of elimination of redundancy  and the risk of inconsistency from the database. There are several levels of normalization ( normal forms) that certify the quality of the database schema. This process is based on a simple criterion: if a report has  more than one independent concepts, the relationship breaks down in smaller ones, one for each concept. This type of process is not always applicable to all the tables, since in some cases this can cause a loss of information. (Source: Wikipedia)

What do you mean for redundancy?

Redundancy is a sworn enemy of the databases and we must avoid it like the plague. Imagine if the table that contains the articles of this blog was like this:

Now, when Maurizio has written 1000 articles (he’s nearly there), there will be 2000 useless data in this database.

It’s clear that the data of the author (name, e-mail address, etc…) will feature in the table authors. To put in relationship an article to its author, all one needs to do is post in the table articles the primary key that leads to the author.

The first normal form (1NF)

A database is defined in first normal form when:

  • Each column refers to atomic data

This means that each column must contain non dividable values.                                                                                                                                                 Therefore, a name_and _surname column would not permit the table to be in first normal form. The consequences of this error are clear. I couldn’t, for example, do a specific research by surname  or by name. Or I couldn’t put the data in order by name or surname.

The non atomicity of data can have more complicated forms, imagine for example a database containing the list of  students and the courses they enrolled to.

I only used the student’s name, but consider that the table would contain all the personal data. Now, this table is not in first normal form. The courses  the student has enrolled to aren’t  shown in an atomic form. To avoid this problem, anyone would be attempted to do like shown in the image below:

But this wouldn’t be a good solution. Every new course introduced oblige us to add a new column to the table with disastrous effects on the maintainability. Moreover this is an example of  fat table, this term  describes the tendency (wrong) to want to create one single table that contains data that don’t have a logical connection.

Instead logic tells us that personal data go in one table, while the list of the courses goes in another one. To describe which courses a student enrolled to, we’re going to create another table that will put in relationship students with courses. We’ll see how in the next article.

The second normal form (2NF)

A database is defined in second normal form when:

  • it is in first normal form;
  • the non key fields depend on the whole primary key and not on part of it.

On a part of it? Yes, the second normal form is applied to tables with compound keys, so we can say that non key field cannot depend on a part of the fields that form the primary key (we can implicitly consider a table with a single primary key as in second normal form because it is impossible that it violates the principles).

Let’s do an example. We take in consideration a sport event where the athletes  do three competitions. We can give the following description in our database:

In this table we can use the primary key formed of the code_athlete/number_competition.
But it’s not in second normal form.
In fact the non key field name_ athlete depends on a part of the primary key code_athlete and not on the whole key.
To put this table in second normal form we need to break it in two tables.

The third normal form (3NF)

A database is defined in third normal form when:

  • it’s in second normal form (and therefore implicitly even in first);
  • the non key fields do not depend on other non key fields.

It’s much more simpler than it seems. Practically it means that if a data can be calculated/ deduced/rebuilt from another data, you don’t need to put it in the table. Let’s do an example of a database of  product orders:

The field price depends on the fields unit_price and  quantity and it’s calculable multiplying these last two fields. Therefore the field price is unnecessary and must be eliminated. Another simple example could be the database of an e-commerce. In our store we want to indicate the prices in euro and in dollars for all our products. Now, it would be wrong a table like this:

The price in dollars  depends in fact on the price in euro (or vice versa). Therefore a column gets eliminated. This will make the database more efficient and rational. Otherwise we need to modify periodically all the values to update the change.

We can calculate the price in dollars at the extraction of the data,  perhaps using the API of an on-line quotation website, having this way the change always updated.

Conclusion

At this point you have at your disposal theoretic basis with which you can create a database correctly set up. In the next article we’ll see in practice how to establish logic relations between the tables in order to obtain  the best efficiency of the database in terms of  reliability, maintainability and rationality.

Master per Web Designer Freelance
In tutti questi anni abbiamo ricevuto centinaia di richieste di approfondimento sulle numerose tematiche del web design vissuto da freelance. Le abbiamo affrontate volta per volta. Ma ci siamo resi conto che era necessario fare qualcosa di più. Ecco perché è nato One Year Together, un vero e proprio master per web designer freelance che apre finalmente le porte al mondo del lavoro.
Scopri One Year Together »
[pdf]Scarica articolo in PDF[/pdf]
Tags:

The Author

Maurizio is married to the triad PHP - MySql - Apache and, not enough, he has a lover called jQuery. He has a blog where he tries to describe in detail all of "his lovers". His real specialty is the realization of large business application, altough he never refuses the commitment of a website.

Author's web site | Other articles written by

Related Posts

You may be interested in the following articles:

4 comments

Trackback e pingback

No trackback or pingback available for this article

Leave a Reply

Current month ye@r day *