Here’s a complicated subject that can tie your brain in knots. When you first look at normal forms, you want to give up. But once you’ve mastered them, you’ll wonder how you ever did without them. Ignore normal forms and you will run into long-term difficulties. Normal forms allow you to evaluate the quality and the durability of a data model (no less). If you’ve been modeling data for a long time, they probably won’t change the way you work, but they are a tool that lets you reach a new level of professionalism, because they allow you to justify your modeling choices. There are five and a half normal forms, and it is generally recommended to respect at least the third. In this article, we will look at the first normal form. The following ones will be covered in another article.
Usefulness of the Normal Forms
When I produced my first data models, I was always quite uncertain about the quality of my deliverables. That’s normal, you may say. It is with experience, and especially through data quality audits, that I realized the cost of a design flaw in a database. Fixing such flaws after the fact regularly requires projects that cost millions of euros. By respecting the normal forms and a few denormalization rules, you can validate the durability of your model.
Today, I see a lot of people who underestimate the importance of modeling data correctly. They consider it a technical detail and tend to improvise. Challenging their model takes a long time and a lot of energy, and because the arguments come from past experience, they are perceived as opinions and therefore easily rejected. Since normal forms are a standard, they can be used as a procedure to follow, which removes the personal dimension from the debate. You can then criticize a model in a factual way, without it looking like an opinion, which is a source of conflict.
Another point is that when you deliver a data model, it is often difficult to explain how you arrived at it. Normal forms are a great tool for giving visibility and justification to your models.
First normal form (1NF)
The first normal form is relatively simple: it consists of describing each piece of information in a unitary format, meaning one value per field. Stated like that, it does not get us much further. I propose to detail it through a series of concrete examples.
Case of unstructured fields
The simplest example for understanding this normal form is the comment field. Users regularly request a comment field “just in case” they have other information to enter that they do not yet know about. Given the cost of developing an application, this is understandable. In general, over time, users begin to use this field to enter information with a structure of their own, such as:
- Product weight: 13g
- Place of transit: Paris
- …
In general, after a while, this structure is no longer followed or structural errors appear. As a result, recovering the information becomes more and more complex. We will find, for example, variations such as:
- Product weight: 13g
- Product weight – 13g
- Productweight: 13g
- Weight: 13g
The first weak point of this kind of modeling is that it is complex to query. Each variation you find requires an additional piece of code to retrieve the information. In the end, there are so many different formats that we end up making a list (sometimes a long one), giving it to the users, and having them go through the entries one by one to clean them up.
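To make the cost concrete, here is a minimal sketch in Python of what retrieving the weight from such a comment field might look like. The sample comments are the variations listed above; the pattern and the function name `extract_weight_grams` are assumptions made for illustration, not code from a real application.

```python
import re

# Free-text comments, copied from the variations listed above.
comments = [
    "Product weight: 13g",
    "Product weight – 13g",
    "Productweight: 13g",
    "Weight: 13g",
]

# A single pattern that has to anticipate every known variation:
# optional "product", optional space, and three possible separators.
WEIGHT_PATTERN = re.compile(
    r"(?:product\s*)?weight\s*[:–-]\s*(\d+)\s*g",
    re.IGNORECASE,
)


def extract_weight_grams(comment):
    """Return the weight in grams, or None if the format is not recognized."""
    match = WEIGHT_PATTERN.search(comment)
    return int(match.group(1)) if match else None


weights = [extract_weight_grams(c) for c in comments]
```

Every new variation a user invents means revisiting the pattern, and anything it does not recognize comes back as `None` and ends up on the manual clean-up list. A dedicated numeric weight field would need none of this.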
Case of multi-valued fields
I have very rarely come across this case; in general, a database that contains it has been badly designed. It concerns multi-valued fields, in which we put a list of values (and therefore a relationship). For example, a list of cities is entered as follows:
- Availability: Paris, Toulouse, Lyon
We find the same difficulties with this type of modeling as with the previous one. Even if some controls at data entry can limit the damage, this type of modeling is of little interest and makes querying rather complex.
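One possible way to model the relationship explicitly is sketched below, using Python's built-in `sqlite3` module and an in-memory database. The table names `product` and `product_availability` and the sample product are hypothetical; only the list of cities comes from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per (product, city) pair instead of a comma-separated list.
    CREATE TABLE product_availability (
        product_id INTEGER NOT NULL REFERENCES product(id),
        city       TEXT NOT NULL,
        PRIMARY KEY (product_id, city)
    );
""")

# The multi-valued field as it would be entered in a single column.
raw_availability = "Paris, Toulouse, Lyon"

conn.execute("INSERT INTO product (id, name) VALUES (1, 'Sample product')")
cities = [c.strip() for c in raw_availability.split(",") if c.strip()]
conn.executemany(
    "INSERT INTO product_availability (product_id, city) VALUES (1, ?)",
    [(city,) for city in cities],
)

# Finding products available in a city becomes a plain equality test.
rows = conn.execute(
    """
    SELECT p.name
    FROM product AS p
    JOIN product_availability AS a ON a.product_id = p.id
    WHERE a.city = ?
    """,
    ("Toulouse",),
).fetchall()
```

With one row per city, the query no longer depends on how the list was typed (spaces, separators, ordering), and the primary key prevents the same city from being recorded twice for a product.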
Case of non-modeled lists
This case happens more often than multi-valued fields. It consists of not modeling a list of values and letting users enter whatever they want in a field. For example, for a list of countries, instead of providing a drop-down list that lets the user select the country, we provide a text field in which the user types the country however they like.
This design flaw quickly leads to input errors. Users will vary the capitalization, make spelling mistakes, or enter a placeholder value (for example: XXX).
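As an illustration, the list of countries can be modeled as a reference table that the database itself enforces. The sketch below again uses Python's `sqlite3` module; the `country` and `customer` tables, the two sample countries and the helper `add_customer` are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- The modeled list: the only values a customer row may reference.
    CREATE TABLE country (
        code TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE customer (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        country_code TEXT NOT NULL REFERENCES country(code)
    );
    INSERT INTO country (code, name) VALUES ('FR', 'France'), ('DE', 'Germany');
""")


def add_customer(name, country_code):
    """Insert a customer; return False if the country is not in the list."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (name, country_code) VALUES (?, ?)",
                (name, country_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = add_customer("Alice", "FR")   # known country code
rejected = add_customer("Bob", "XXX")    # placeholder value, refused
```

The drop-down list in the user interface can then be filled from the `country` table, so the screen and the database share the same list of allowed values.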
Case of non-atomic information
This last case consists of putting several pieces of information in a single field. The most common example is an address for which we ask for the street number, the street name, the postal code and the city in a single field.
This causes the same problems as with comment fields. In the long run, the information becomes difficult to use.
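A minimal sketch of the atomic alternative, assuming the four components named above each get their own field. The `Address` class and the sample addresses are hypothetical and only serve to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    # One field per piece of information instead of a single text field.
    street_number: str
    street_name: str
    postal_code: str
    city: str

    def one_line(self):
        """Rebuild the single-line form for display when it is needed."""
        return (
            f"{self.street_number} {self.street_name}, "
            f"{self.postal_code} {self.city}"
        )


# Hypothetical sample data.
addresses = [
    Address("12", "rue de la Paix", "75002", "Paris"),
    Address("3", "place du Capitole", "31000", "Toulouse"),
]

# Filtering by city is a direct comparison, with no text parsing.
in_paris = [a for a in addresses if a.city == "Paris"]
display = in_paris[0].one_line()
```

Going from separate fields to a single line is trivial, as `one_line` shows. Going the other way, from free text back to reliable components, is where the difficulty lies.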
Conclusion for the first normal form
That’s enough for the first normal form. These are fairly basic modeling principles. There are a few cases where these rules do not have to be respected, but in general there is little point in breaking them. It’s easy to implement, so why not! The following normal forms are a bit more complicated to explain; we’ll see them in another article.