Normalization in DBMS
Overview
Normalization is the process of organizing the data and the attributes of a database. It is performed to reduce the data redundancy in a database and to ensure that data is stored logically. Data redundancy in DBMS means having the same data but at multiple places. It is necessary to remove data redundancy because it causes anomalies in a database which makes it very hard for a database administrator to maintain it.
Why Do We Need Normalization?
As we have discussed above, normalization is used to reduce data redundancy. It provides a method to remove the following anomalies from the database and bring it to a more consistent state:
A database anomaly is a flaw in the database that occurs because of poor planning and redundancy.
-
Insertion anomalies: This occurs when we are not able to insert data into a database because some attributes may be missing at the time of insertion.
-
Updation anomalies: This occurs when the same data items are repeated with the same values and are not linked to each other.
-
Deletion anomalies: This occurs when deleting one part of the data deletes the other necessary information from the database.
In the next section, we’ll see the types of normal forms in the database.
Normal Forms
There are four types of normal forms that are usually used in relational databases as you can see in the following figure:
-
1NF: A relation is in 1NF if all its attributes have an atomic value.
-
2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully functional dependent on the candidate key in DBMS.
-
3NF: A relation is in 3NF if it is in 2NF and there is no transitive dependency.
-
BCNF: A relation is in BCNF if it is in 3NF and for every Functional Dependency, LHS is the super key.
To understand the above-mentioned normal forms, we first need to have an understanding of the functional dependencies.
Functional dependency is a relationship that exists between two sets of attributes of a relational table where one set of attributes can determine the value of the other set of attributes. It is denoted by X -> Y, where X is called a determinant and Y is called dependent.
There are various levels of normalizations. Let’s go through them one by one:
First Normal Form (1NF)
A relation is in 1NF if every attribute is a single-valued attribute or it does not contain any multi-valued or composite attribute, i.e., every attribute is an atomic attribute. If there is a composite or multi-valued attribute, it violates the 1NF. To solve this, we can create a new row for each of the values of the multi-valued attribute to convert the table into the 1NF.
Let’s take an example of a relational table <EmployeeDetail> that contains the details of the employees of the company.
<EmployeeDetail>
Employee Code | Employee Name | Employee Phone Number |
---|---|---|
101 | John | 98765623,998234123 |
101 | John | 89023467 |
102 | Ryan | 76213908 |
103 | Stephanie | 98132452 |
Here, the Employee Phone Number is a multi-valued attribute. So, this relation is not in 1NF.
To convert this table into 1NF, we make new rows with each Employee Phone Number as a new row as shown below:
<EmployeeDetail>
Employee Code | Employee Name | Employee Phone Number |
---|---|---|
101 | John | 998234123 |
101 | John | 98765623 |
101 | John | 89023467 |
102 | Ryan | 76213908 |
103 | Stephanie | 98132452 |
Second Normal Form (2NF)
The normalization of 1NF relations to 2NF involves the elimination of partial dependencies. A partial dependency in DBMS exists when any non-prime attributes, i.e., an attribute not a part of the candidate key, is not fully functionally dependent on one of the candidate keys.
For a relational table to be in second normal form, it must satisfy the following rules:
- The table must be in first normal form.
- It must not contain any partial dependency, i.e., all non-prime attributes are fully functionally dependent on the primary key.
If a partial dependency exists, we can divide the table to remove the partially dependent attributes and move them to some other table where they fit in well.
Let us take an example of the following <EmployeeProjectDetail> table to understand what is partial dependency and how to normalize the table to the second normal form:
<EmployeeProjectDetail>
Employee Code | Project ID | Employee Name | Project Name |
---|---|---|---|
101 | P03 | John | Project103 |
101 | P01 | John | Project101 |
102 | P04 | Ryan | Project104 |
103 | P02 | Stephanie | Project102 |
In the above table, the prime attributes of the table are Employee Code and Project ID. We have partial dependencies in this table because Employee Name can be determined by Employee Code and Project Name can be determined by Project ID. Thus, the above relational table violates the rule of 2NF.
The prime attributes in DBMS are those which are part of one or more candidate keys.
To remove partial dependencies from this table and normalize it into second normal form, we can decompose the <EmployeeProjectDetail> table into the following three tables:
<EmployeeDetail>
Employee Code | Employee Name |
---|---|
101 | John |
101 | John |
102 | Ryan |
103 | Stephanie |
<EmployeeProject>
Employee Code | Project ID |
---|---|
101 | P03 |
101 | P01 |
102 | P04 |
103 | P02 |
<ProjectDetail>
Project ID | Project Name |
---|---|
P03 | Project103 |
P01 | Project101 |
P04 | Project104 |
P02 | Project102 |
Thus, we’ve converted the <EmployeeProjectDetail> table into 2NF by decomposing it into <EmployeeDetail>, <ProjectDetail> and <EmployeeProject> tables. As you can see, the above tables satisfy the following two rules of 2NF as they are in 1NF and every non-prime attribute is fully dependent on the primary key.
The relations in 2NF are clearly less redundant than relations in 1NF. However, the decomposed relations may still suffer from one or more anomalies due to the transitive dependency. We will remove the transitive dependencies in the Third Normal Form.
Third Normal Form (3NF)
The normalization of 2NF relations to 3NF involves the elimination of transitive dependencies in DBMS.
A functional dependency X -> Z is said to be transitive if the following three functional dependencies hold:
- X -> Y
- Y does not -> X
- Y -> Z
For a relational table to be in third normal form, it must satisfy the following rules:
- The table must be in the second normal form.
- No non-prime attribute is transitively dependent on the primary key.
- For each functional dependency X -> Z at least one of the following conditions hold:
- X is a super key of the table.
- Z is a prime attribute of the table.
If a transitive dependency exists, we can divide the table to remove the transitively dependent attributes and place them to a new table along with a copy of the determinant.
Let us take an example of the following <EmployeeDetail> table to understand what is transitive dependency and how to normalize the table to the third normal form:
<EmployeeDetail>
Employee Code | Employee Name | Employee Zipcode | Employee City |
---|---|---|---|
101 | John | 110033 | Model Town |
101 | John | 110044 | Badarpur |
102 | Ryan | 110028 | Naraina |
103 | Stephanie | 110064 | Hari Nagar |
The above table is not in 3NF because it has Employee Code -> Employee City transitive dependency because:
- Employee Code -> Employee Zipcode
- Employee Zipcode -> Employee City
Also, Employee Zipcode is not a super key and Employee City is not a prime attribute.
To remove transitive dependency from this table and normalize it into the third normal form, we can decompose the <EmployeeDetail> table into the following two tables:
<EmployeeDetail>
Employee Code | Employee Name | Employee Zipcode |
---|---|---|
101 | John | 110033 |
101 | John | 110044 |
102 | Ryan | 110028 |
103 | Stephanie | 110064 |
<EmployeeLocation>
Employee Zipcode | Employee City |
---|---|
110033 | Model Town |
110044 | Badarpur |
110028 | Naraina |
110064 | Hari Nagar |
Thus, we’ve converted the <EmployeeDetail> table into 3NF by decomposing it into <EmployeeDetail> and <EmployeeLocation> tables as they are in 2NF and they don’t have any transitive dependency.
The 2NF and 3NF impose some extra conditions on dependencies on candidate keys and remove redundancy caused by that. However, there may still exist some dependencies that cause redundancy in the database. These redundancies are removed by a more strict normal form known as BCNF.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form(BCNF) is an advanced version of 3NF as it contains additional constraints compared to 3NF.
For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:
- The table must be in the third normal form.
- For every non-trivial functional dependency X -> Y, X is the superkey of the table. That means X cannot be a non-prime attribute if Y is a prime attribute.
A superkey is a set of one or more attributes that can uniquely identify a row in a database table.
Let us take an example of the following <EmployeeProjectLead> table to understand how to normalize the table to the BCNF:
<EmployeeProjectLead>
Employee Code | Project ID | Project Leader |
---|---|---|
101 | P03 | Grey |
101 | P01 | Christian |
102 | P04 | Hudson |
103 | P02 | Petro |
The above table satisfies all the normal forms till 3NF, but it violates the rules of BCNF because the candidate key of the above table is {Employee Code, Project ID}. For the non-trivial functional dependency, Project Leader -> Project ID, Project ID is a prime attribute but Project Leader is a non-prime attribute. This is not allowed in BCNF.
To convert the given table into BCNF, we decompose it into three tables:
<EmployeeProject>
Employee Code | Project ID |
---|---|
101 | P03 |
101 | P01 |
102 | P04 |
103 | P02 |
<ProjectLead>
Project Leader | Project ID |
---|---|
Grey | P03 |
Christian | P01 |
Hudson | P04 |
Petro | P02 |
Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into <EmployeeProject> and <ProjectLead> tables.
Conclusion
- Normal forms are a mechanism to remove redundancy and optimize database storage.
- In 1NF, we check for atomicity of the attributes of a relation.
- In 2NF, we check for partial dependencies in a relation.
- In 3NF, we check for transitive dependencies in a relation.
- In BCNF, we check for the superkeys in LHS of all functional dependencies.