Decomposition in DBMS
Decomposition in DBMS involves dividing a table into multiple tables, aiming to eradicate redundancy, inconsistencies, and anomalies. This process, represented as {X1, X2,……Xn}, ensures dependency preservation and losslessness. When a relational model's relation lacks appropriate normal form, decomposition becomes necessary to address issues like information loss, anomalies, and redundancy, ultimately enhancing the overall design quality and efficiency of the database.
There are two types of decomposition as shown below:
Rules for Decomposition
Whenever we decompose a relation, there are certain properties that must be satisfied to ensure no information is lost while decomposing the relations. These properties are:
- Lossless Join Decomposition.
- Dependency Preserving.
Lossless Join Decomposition
A lossless Join decomposition ensures two things:
- No information is lost while decomposing from the original relation.
- If we join back the sub decomposed relations, the same relation that was decomposed is obtained.
We can follow certain rules to ensure that the decomposition is a lossless join decomposition Let’s say we have a relation R and we decomposed it into R1 and R2, then the rules are:
-
The union of attributes of both the sub relations R1 and R2 must contain all the attributes of original relation R.
R1 ∪ R2 = R
-
The intersection of attributes of both the sub relations R1 and R2 must not be null, i.e., there should be some attributes that are present in both R1 and R2.
R1 ∩ R2 ≠ ∅
-
The intersection of attributes of both the sub relations R1 and R2 must be the superkey of R1 or R2, or both R1 and R2.
R1 ∩ R2 = Super key of R1 or R2
Let’s see an example of a lossless join decomposition. Suppose we have the following relation EmployeeProjectDetail as:
<EmployeeProjectDetail>
Employee_Code | Employee_Name | Employee_Email | Project_Name | Project_ID |
---|---|---|---|---|
101 | John | john@demo.com | Project103 | P03 |
101 | John | john@demo.com | Project101 | P01 |
102 | Ryan | ryan@example.com | Project102 | P02 |
103 | Stephanie | stephanie@abc.com | Project102 | P02 |
Now, we decompose this relation into EmployeeProject and ProjectDetail relations as:
<EmployeeProject>
Employee_Code | Project_ID | Employee_Name | Employee_Email |
---|---|---|---|
101 | P03 | John | john@demo.com |
101 | P01 | John | john@demo.com |
102 | P04 | Ryan | ryan@example.com |
103 | P02 | Stephanie | stephanie@abc.com |
The primary key of the above relation is {Employee_Code, Project_ID}.
<ProjectDetail>
Project_ID | Project_Name |
---|---|
P03 | Project103 |
P01 | Project101 |
P04 | Project104 |
P02 | Project102 |
The primary key of the above relation is {Project_ID}.
Now, let’s see if this is a lossless join decomposition by evaluating the rules discussed above:
Let’s first check the EmployeeProject ∪ ProjectDetail:
<EmployeeProject ∪ ProjectDetail>
Employee_Code | Project_ID | Employee_Name | Employee_Email | Project_Name |
---|---|---|---|---|
101 | P03 | John | john@demo.com | Project103 |
101 | P01 | John | john@demo.com | Project101 |
102 | P04 | Ryan | ryan@example.com | Project104 |
103 | P02 | Stephanie | stephanie@abc.com | Project102 |
As we can see all the attributes of EmployeeProject and ProjectDetail are in EmployeeProject ∪ ProjectDetail relation and it is the same as the original relation. So the first condition holds.
Now let’s check the EmployeeProject ∩ ProjectDetail:
<EmployeeProject ∩ ProjectDetail>
Project_ID |
---|
P03 |
P01 |
P04 |
P02 |
As we can see this is not null, so the the second condition holds as well. Also the EmployeeProject ∩ ProjectDetail = Project_Id. This is the super key of the ProjectDetail relation, so the third condition holds as well.
Now, since all three conditions hold for our decomposition, this is a lossless join decomposition.
Lossless vs Lossy Decomposition
In a lossy decomposition, one or more of these conditions would fail and we will not be able to recover Complete information as present in the original relation. For example, let's say we decompose our original relation EmployeeProjectDetail into EmployeeProject and ProjectDetail relations as:
<EmployeeProject>
Employee_Code | Employee_Name | Employee_Email |
---|---|---|
101 | John | john@demo.com |
102 | Ryan | ryan@example.com |
103 | Stephanie | stephanie@abc.com |
The primary key of the above relation is {Employee_Code}.
<ProjectDetail>
Project_ID | Project_Name |
---|---|
P03 | Project103 |
P01 | Project101 |
P04 | Project104 |
P02 | Project102 |
The primary key of the above relation is {Project_ID}.
Now, the intersection EmployeeProject ∩ ProjectDetail is null. Therefore there is no way for us to map a project to its employees. Thus this is a lossy decomposition.
Dependency Preserving
The second property of lossless decomposition is dependency preservation which says that after decomposing a relation R into R1 and R2, all dependencies of the original relation R must be present either in R1 or R2 or they must be derivable using the combination of functional dependencies present in R1 and R2.
Let’s understand this from the same example above:
<EmployeeProjectDetail>
Employee_Code | Employee_Name | Employee_Email | Project_Name | Project_ID |
---|---|---|---|---|
101 | John | john@demo.com | Project103 | P03 |
101 | John | john@demo.com | Project101 | P01 |
102 | Ryan | ryan@example.com | Project104 | P04 |
103 | Stephanie | stephanie@abc.com | Project102 | P02 |
In this relation we have the following FDs:
- Employee_Code -> {Employee_Name, Employee_Email}
- Project_ID - > Project_Name
Now, after decomposing the relation into EmployeeProject and ProjectDetail as:
<EmployeeProject>
Employee_Code | Project_ID | Employee_Name | Employee_Email |
---|---|---|---|
101 | P03 | John | john@demo.com |
101 | P01 | John | john@demo.com |
102 | P04 | Ryan | ryan@example.com |
103 | P02 | Stephanie | stephanie@abc.com |
In this relation we have the following FDs:
- Employee_Code -> {Employee_Name, Employee_Email}
<ProjectDetail>
Project_ID | Project_Name |
---|---|
P03 | Project103 |
P01 | Project101 |
P04 | Project104 |
P02 | Project102 |
In this relation we have the following FDs:
- Project_ID - > Project_Name
As we can see that all FDs in EmployeeProjectDetail are either part of the EmployeeProject or the ProjectDetail, So this decomposition is dependency preserving.
Conclusion
- Decomposition is the process of breaking an original relation into multiple sub relations.
- Decomposition helps to remove anomalies, redundancy, and other problems in a DBMS.
- Decomposition can be lossy or lossless.
- An ideal decomposition should be lossless join decomposition and dependency preserving.