BCNF in DBMS
BCNF (Boyce-Codd Normal Form) in DBMS, introduced by R.F. Boyce and E.F. Codd in the 1970s, is a normalization technique that eliminates table redundancy and anomalies for enhanced data integrity. While 2NF and 3NF address certain dependencies, BCNF addresses additional constraints that can persist, causing redundancy even in 3NF relations. Despite 3NF's adequacy, BCNF offers a more robust solution by specifically addressing cases where 3NF falls short in eliminating 100% redundancy due to certain functional dependencies.
What is BCNF in DBMS?
BCNF (Boyce Codd Normal Form) is an advanced version of the third normal form (3NF), and often, it is also known as the 3.5 Normal Form. 3NF doesn't remove 100% redundancy in the cases where for a functional dependency (say, A->B), A is not the candidate key of the table. To deal with such situations, BCNF was introduced.
BCNF is based on functional dependencies, and all the candidate keys of the relation are taken into consideration. BCNF is stricter than 3NF and has some additional constraints along with the general definition of 3NF.
A table or relation is said to be in BCNF in DBMS if the table or the relation is already in 3NF, and also, for every functional dependency (let's say, X->Y), X is either the super key or the candidate key. In simple terms, for any case (let's say, X->Y), X can't be a non-prime attribute.
To find the highest normalization form of any relation R with functional dependencies, we first need to check whether the relation is in BCNF or not. If relation R is found to be in BCNF, it simply means that the relation R is also in 3NF, 2NF, and 1NF as the hierarchy shown in the above image.
Similarly, if the relation is found to be in 3NF, it is also in 2NF and 1NF. The 3NF in DBMS has more restrictions and strict constraints than the first two normal forms, but it is less strict than the BCNF. This shows that the restriction always increases as we traverse down the hierarchy.
Rules for BCNF in DBMS
A table or relation is said to be in BCNF (Boyce Codd Normal Form) if it satisfies the following two conditions that we have already studied in its definition:
- It should satisfy all the conditions of the Third Normal Form (3NF).
- For any functional dependency (A->B), A should be either the super key or the candidate key. In simple words, it means that A can't be a non-prime attribute if B is given as a prime attribute.
Examples
Example 1:
In this example, we have to find the highest normalization form, and for that, we are given a relation R(A, B, C, D, E) with functional dependencies as follows: { BC->D, AC->BE, B->E }
- As we can see, (AC)+={A, C, B, E, D} and also, none of its subsets can determine all the attributes of the relation. There is another point to be noted that A or C can’t be derived from any other attribute of the relation, and therefore, there is only one candidate key, {AC}.
- Prime attributes in DBMS are always part of the candidate keys, and for this relation R, prime attributes are: {A, C} while non-prime attributes are: {B, E, D}.
- Clearly, there is no multi-valued attribute in the relation R, and hence, it is at least in 1NF.
- BC->D is in 2NF because BC is not a proper subset of the candidate key AC. AC->BE is also in 2NF because AC itself is a candidate key, and lastly, B->E is again in 2NF. For 2NF, there must not be any partial dependency present in the table, and hence, relation R here is in 2NF.
- The relation R is not in 3NF because BC->D at the start is not in 3NF (BC is not a candidate key, and also, D is not a prime attribute). Hence, the relation R has 2NF as the highest normalization form.
Example 2:
In this example, we have to again find the highest normalization form, and for that, we are given a relation R(A, B, C) with functional dependencies as follows: {AB ->C, C ->B, AB ->B} Candidate Key (given): {AB}
- Clearly, prime attributes for Relation R are: {A,B} while non-prime attributes are: {C}.
- For this particular example, let us start from the order of hierarchy with higher restrictions, and firstly, we will check for BCNF here.
- Clearly, {AB->C} and {AB->B} are in BCNF because AB is the candidate key present on the LHS of both dependencies. The second dependency, {C->B}, however, is not in BCNF because C is neither a super key nor a candidate key.
- C->B is, however, present in 3NF because B is a prime attribute that satisfies the conditions of 3NF. Hence, relation R has 3NF as the highest normalization form.
Example 3:
In this example, we have a relation R with three columns: Id, Subject, and Professor. We have to find the highest normalization form, and also, if it is not in BCNF, we have to decompose it to satisfy the conditions of BCNF.
Id | Subject | Professor |
---|---|---|
101 | Java | Mayank |
101 | C++ | Kartik |
102 | Java | Sarthak |
103 | C# | Lakshay |
104 | Java | Mayank |
Interpreting the table:
- One student can enroll in more than one subject.
- Example: student with Id 101 has enrolled in Java and C++.
- A professor is assigned to the student for a specified subject, and there is always a possibility that there can be multiple professors teaching a particular subject.
Finding the solution:
- Using Id and Subject together, we can find all unique records and also the other columns of the table. Hence, the Id and Subject together form the primary key.
- The table is in 1NF because all the values inside a column are atomic and of the same domain.
- We can't uniquely identify a record solely with the help of either the Id or the Subject name. As there is no partial dependency, the table is also in 2NF.
- There is no transitive dependency because the non-prime attribute i.e., Professor, is not deriving any other non-prime attribute column in the table. Hence, the table is also in 3NF.
- There is a point to be noted that the table is not in BCNF (Boyce-Codd Normal Form).
Why is the table not in BCNF?
As we know each professor teaches only one subject, but one subject may be taught by multiple professors. This shows that there is a dependency between the subject & the professor, and the subject is always dependent on the professor (professor -> subject). As we know the professor column is a non-prime attribute, while the subject is a prime attribute. This is not allowed in BCNF in DBMS. For BCNF, the deriving attribute (professor here) must be a prime attribute.
How to satisfy BCNF?
In Example 3, we will decompose the table into two tables: the Student table and the Professor table to satisfy the conditions of BCNF.
Student Table
P_Id | S_Id | Professor |
---|---|---|
1 | 101 | Mayank |
2 | 101 | Kartik |
3 | 102 | Sarthak |
4 | 103 | Lakshay |
5 | 104 | Mayank |
Professor Table
Professor | Subject |
---|---|
Mayank | Java |
Kartik | C++ |
Sarthak | Java |
Lakshay | C# |
Mayank | Java |
Professor is now the primary key and the prime attribute column, deriving the subject column. Hence, it is in BCNF.
Example 4:
Let's take another general example to understand the concept of decomposition in detail: We have a relation R(A, B, C, D) that is already in 3NF. Candidate Keys: {A, BC} Prime Attributes: {A, B, C} Non-Prime Attributes: {D}
Functional dependencies are as follows: {A->BCD, BC->AD, D->B}
The above relation is not in BCNF because {D->B} is not in BCNF as {D} is neither a candidate key nor a prime attribute. Hence, we will decompose the relation R into R1{A, D, C} and R2{D, B}.
Conclusion
- BCNF (Boyce Codd Normal Form) is an advanced version of the third normal form (3NF), and often, it is also known as the 3.5 Normal Form.
- A relation is said to be in BCNF in DBMS if the relation is already in 3NF, and also, for every functional dependency (say, X->Y), X is either the super key or the candidate key. In simple terms, X can't be a non-prime attribute.
- If ant relation R is found to be in BCNF, it simply means that the relation R is also in 3NF, 2NF, and 1NF.
- The 3NF in DBMS has more restrictions and strict constraints than the first two normal forms, but it is less strict than the BCNF.
- This shows that the restriction always increases as we traverse down the hierarchy.