Database structuring (1/3): The primary key
We start today with a series of articles on database structuring. Throughout years of militancy in various support forums, I have realized that there are many users who are perfectly able to manipulate databases, even with very complex queries, but when they seek help for some reason and you see the database table structure, it may happen that you find yourself in front of confused ones and which do not take into account some simple and basic rules.
This article is thus intended for those who are newbies in the world of databases and want to get off on the right foot, or even those who have a certain experience and want to further improve the quality of their products.
The principles we are going to deal with are fundamental for the creation of efficient, maintainable databases and which adhere to the programming standards. The topics we will deal with are the following:
- Primary key
- First, second and third normal form
- The relational concept and the application of normalization
Once assimilated these concepts, it will be easier for you to create database structures correctly.
Let’s start then with the first topic of fundamental importance since almost everything revolves around the latter.
What’s the primary key?
The primary key in a table is that datum which helps to uniquely identify each row (thus a set of data) and which by convention is named id.
A table containing addresses for example, can have multiple rows in which the first name is “Maurizio” and the surname is “Tarchini”. In fact, there can be various people whose name is “Maurizio Tarchini”. The primary key enables to distinguish every single record independently from ambiguities that can be easily created (such as people having the same first and last name).
As you can see in this table the first name, last name, and city fields are not univocal. The name Maurizio appears twice and so does the city Catania. We have to create a datum thus that is entirely unique, in a few words a primary key.
The presence of a primary key correctly set in a table is an indispensable precondition to the application of relationality (concept we will cover in another article).
If all this is missing, there is a high risk it will lead to the phenomenon of inconsistency, which in a few words means unreliability of data (we will see why and how).
How to set the primary key?
Normally the primary key is set in the table creation phase.
The most classical case is to set the field as a whole number, declare it as primary key and attribute to it the auto_increment property.
`id` INT( 6 ) UNSIGNED NOT NULL AUTO_INCREMENT , PRIMARY KEY ( `id` )
In this way we leave to the database the task of managing primary keys. This is a very secure method.
The database will automatically attribute a univocal number to every record. Even in case we canceled a row, the primary key of this row will no longer be used afterwards.
But this path, at times, could prove to be erroneous.
Only a single univocal datum is required
It often happens that you get so accustomed to setting a primary key to not even realize that sometimes it already exists without you having to create it. What follows is a very common error.
A fundamental rule of databases is (as it will turn out in other occasions as well) that we shouldn’t have redundant data. In this case we have a redundant datum though.
In fact, makes no sense creating the id field as univocal datum. The secure code is already, due to its nature, a univocal datum.
We can thus use this datum as a primary key. In this case, however, we obviously don’t have to define the auto_increment property; the datum will be inserted as it is, for us the certainty it is a univocal datum is enough. We can therefore do without the id column.
Similarly we could use the matriculation number as primary key for a list of students of a particular university or the VAT number for a list of businesses.
Could the univocal datum be a combination of data?
If we really need to say it all, not necessarily the primary key has to be a single datum. If the combination of two or more data can univocally identify a row, we can use what is defined as a composite primary key.
Let’s take for example a table which displays the province, the plate number and the vehicle owner.
The combination between province and plate number gives a univocal datum and thus the possible primary key for this table.
In this article we started to explore the techniques thanks to which it’s possible to create efficient and maintainable databases. Therefore we briefly analyzed what a primary key is. We will have the chance to see how important it is in future articles. The primary key is in fact the basis for carrying out the normalization (which we are going to deal with in the next article) and the normalization in turn is the basis for the implementation of a strong relationality between tables (topic of the last article of this series).
L'immagine principale dell'articolo è stata fornita da @Fotolia