Denormalization in DBMS
Overview
Denormalization is a database optimization technique used by database administrators to optimize the efficiency of their database by adding redundant (duplicate) data to one or more tables. This method can help us to avoid costly joins in a relational database made during normalization.
What is Denormalization?
Traditionally data is stored in normalized databases, in which multiple separate tables are maintained in a relational database to minimize the redundant data. Therefore, whenever we have to access data from multiple tables, we need to perform complex and costly join operations on the required tables.
Denormalization does not mean that the database is never normalized. It is just a method of optimizing databases that is implemented after normalization has been achieved.
Example
Here in the example, we have three tables Student table, Branch table and HOD table after normalization. The student table has roll_no, student_name, age, and branch_id as attributes.
roll_no | student_name | age | branch_id |
---|---|---|---|
101 | Ash | 18 | 1 |
102 | Gary | 16 | 3 |
103 | Corey | 19 | 4 |
104 | James | 23 | 2 |
105 | George | 20 | 3 |
Additionally, the Branch table has attributes such as branch_id, branch_name, total_students. The branch table is linked to the Student table via branch_id as the student table's foreign key.
branch_id | branch_name | total_students | hod_id |
---|---|---|---|
1 | CSE | 120 | 901 |
2 | ME | 90 | 902 |
3 | ECE | 60 | 903 |
4 | EEE | 30 | 904 |
And we also have HOD (Head of Department) table with attributes such as hod_id, hod_name, address. The HOD table is linked to the branch table via hod_id as the branch table's foreign key.
hod_id | hod_name | address |
---|---|---|
901 | Mr. Max | Munich |
902 | Mr. Tyson | Tokyo |
903 | Mr. Andrew | Chicago |
904 | Mr. Rick | Vancouver |
Suppose we want to retrieve all student names along with their branch name and hod name. For this to happen, a JOIN operation is required to join both student and branch tables and the result of this join is further joined with the hod table. This query will be a great option if the table is small but in case if the tables are big, joins on tables can take an excessively long time.
The SQL query to query the required data will be
Output
student_name | branch_name | hod_name |
---|---|---|
Ash | CSE | Mr. Max |
Gary | ECE | Mr. Andrew |
Corey | EEE | Mr. Rick |
James | ME | Mr. Tyson |
George | ECE | Mr. Andrew |
So to counter the above situation, we can denormalize the database with redundant data and extra effort to maximize the efficiency benefits of fewer joins. Therefore, we can add the branch name, hod's name data from the Branch and HOD table respectively to the student table to optimize the database.
After denormalization, the table will look like
roll_no | student_name | age | branch_id | branch_name | hod_name |
---|---|---|---|---|---|
101 | Ash | 18 | 1 | CSE | Mr. Max |
102 | Gary | 16 | 3 | ECE | Mr. Andrew |
103 | Corey | 19 | 4 | EEE | Mr. Rick |
104 | James | 23 | 2 | ME | Mr. Tyson |
105 | George | 20 | 3 | ECE | Mr. Andrew |
Pros of Denormalization
-
Enhanced Query Performance Querying data from normalized databases requires joining multiple tables which slows down the query performance. To overcome this, redundancy is added to the database thus minimizing the number of joins.
-
More convenient to manage databases In a normalized database, required values are calculated while query execution takes a longer time thus slowing down the execution of the query. Thus to counter this, denormalization is done to fetch queries that can be simpler because we need to look at fewer tables.
-
Facilitate and accelerate reporting In normalized databases, if we want to regularly monitor revenues over a certain period, Generating such reports from live data will require "searching" throughout the entire database thus significantly slowing it down.
Cons of Denormalization
- It takes a lot of storage space due to storing redundant data.
- It is very expensive to maintain updates and inserts in a denormalized database, also code for insert and update is harder to write in this case.
- Data Integrity is not maintained in denormalized databases, as there is redundancy so data is inconsistent.
How is Denormalization Different from Normalization?
Denormalization is a method to merge data from multiple relational tables into a single table in a database that can be retrieved quickly. While Normalization, on the other hand, is used to delete redundant/duplicate data from a database and replace it with non-redundant and reliable data.
Denormalization is used when joins are costly, and queries are run regularly on the tables. Normalization, on the other hand, is typically used when a large number of insert/update/delete operations are performed, and joins between those tables are not expensive.
Conclusion
- Denormalization is a database optimization technique used by database administrators to optimize the efficiency of their database by adding redundant data to one or more tables.
- The data from one table is included in another table to reduce the number of joins in the query and hence helps in speeding up the performance.
- In database systems that require scalability, like that of any big tech company, we always use both normalized and denormalized databases.