Candidate Key in DBMS
Overview
A candidate key is a subset of the super key that can uniquely identify the other attributes of the table. A table can have more than one candidate key. The candidate key helps in determining the prime and non-prime attributes of a table and ensures the integrity of the data by preventing duplicate data. Hence, the candidate key is an important concept for designing of the schema of a database.
What is a Candidate Key?
Before discussing the candidate key, we should first learn about the super-set of candidate key, i.e., super key in dbms.
Super key is nothing but a set of keys that can uniquely identify the tuples or rows of a table. So, all the keys present in the super key super-set are capable of uniquely identifying all the other attributes or columns of the table.
The image below depicts the basic hierarchy of the keys in DBMS.
Refer:Keys is DBMS
Note: Tuple is one record or one row of a database table.
A candidate key is a part of the super key only. We can define a candidate key as a combination of attributes of a table that can uniquely identify other attributes of the table. A candidate key is also known as a minimal super key.
As we know, the candidate key is chosen from the set of super keys. Among the super keys set, the key which does not contain any redundant attribute is chosen as the candidate key. Let us learn about what is a redundant attribute in the key by taking an example.
Suppose a table has 5 columns or attributes namely- A, B, C, D, E.
And the given super keys of the relation is: {ABC, AB, DE}
Since ABC is a super key, we can say that ABC can identify all other attributes of the table, i.e., ABC can identify A, B, C, D, E. Similarly, AB can also identify A, B, C, D, E; and DE can also identify A, B, C, D, E.
Now, among the super keys, the candidate key can be only AB and DE. The reason why we have not considered ABC to be a candidate key is that ABC contains redundant attributes. ABC has redundant attributes because only AB is capable of identifying all the other attributes of the table so we do not need the C attribute.
Note: Candidate keys are chosen from super keys and one of these candidate keys will further become Primary Key. The primary key selection is done by the Data Base Administrator according to the frequency of queries.
Why do we Need a Candidate Key?
A primary key is selected from the set of candidate keys by the Data Base Administrator (DBA) according to the incoming queries of the users. Hence, candidate keys are quite an important aspect of database management. We can also say that a candidate key means the same thing as a primary key.
The candidate key of a relation is all the possible ways through which we can identify a row or a tuple in DBMS. Hence, the candidate key is an important concept for the design of the schema of a database.
The candidate key also helps in determining prime and non-prime attributes. The columns present in a candidate key are known as prime attributes In DBMS, and the columns that are not present in any candidate key are called non-prime attributes.
The main aim or purpose of using the candidate key is that a candidate key ensures that the data inserted in the database should not affect the integrity of the stored data. The candidate key makes sure that there is no insertion of duplicate data. A table can have more than one candidate key.
The candidate key is also known as the minimal super key but we should note one thing, minimal does not mean the smallest. Minimal simply means non-reducible. Hence, if any attribute is removed from the candidate key, then the candidate key will not be able to uniquely identify other attributes of the table.
Example:
In the image above, we see that the attributes namely- StudID, Roll No, and Email are unique in nature so they are the candidate keys.
How is the Candidate Key Different from the Primary Key?
The purpose of both the candidate key and the primary key is the same, that is to uniquely identify the tuples or rows of a table. But there is some difference between the candidate key and the primary key as well. Let us discuss the differences in detail.
-
The most important and basic difference is that any attribute of a candidate key can have null values. But the attributes of a primary key cannot contain null values.
-
In a table, we can have one or more than one candidate key, but there is one and only one primary key in a relation. However, if there is only one candidate key in a table, then that candidate key is considered as the primary key constraint.
-
There may or may not be a primary key present in a specific relation but without a candidate key, a relation cannot be specified.
-
A primary key is a candidate key but a candidate key may or may not be the primary key.
Properties of Candidate Key
- Candidate key can uniquely identify all the other attributes of a table.
- The candidate key should not have redundant attributes.
- Candidate keys are a subset of the super key. Hence, a candidate key is a super key but vice versa is not correct.
- The attributes of a candidate key can contain null values.
- A Primary key is determined from the set of candidate keys by the Data Base Administrator.
- The candidate key helps in determining the prime and non-prime attributes of a table.
- The candidate key ensures the integrity of the data by preventing duplicate data.
- The candidate key must be unique.
Example of Candidate Key
Let’s take the example of the "Student" table to understand the candidate keys in a better way.
Student table
Roll_No | ID | Name |
---|---|---|
1 | CS13 | Aman |
2 | CS24 | Aditya |
3 | CS184 | Sushant |
4 | CS109 | Mohit |
5 | CS136 | Saumya |
The Student table has 3 attributes namely- Roll_No, ID, and Name. Now, as we can see, ID and Roll_Number will contain unique values only. But, the Name column can contain duplicate values.
Now, let us try to select candidate keys from the super keys of the table using what we have learned so far.
List of super keys:
- Roll_No
- ID
- Roll_No, ID
- Roll_No, Name
- Roll_No, ID, Name
- ID, Name
Note: Name attribute or column cannot be super key as Name can contain duplicate values. Also, only the Name attribute cannot identify the other attributes of the table.
Super keys = {{Roll_No}, {ID}, {Roll_No, ID}, {Roll_No, Name}, {Roll_No, ID, Name}, {ID, Name}}
Let us select the candidate keys from the above superset of super keys.
-
Roll_No: Roll_No is a candidate key as it can find all the rest attributes, i.e., ID, and Name of the table. Also, {Roll_No} does not contain any redundant attributes.
-
ID: ID is a candidate key as it can find all the rest attributes, i.e., Roll_No, and Name of the table. Also, {ID} does not contain any redundant attributes.
-
Roll_No, ID: {Roll_No, ID} contains redundant attributes as either Roll_No or ID can uniquely identify the other attributes of the table. Hence, {Roll_No, ID} is not a candidate Key.
-
Roll_No, Name: {Roll_No, Name} also contains redundant attribute, i.e., Name as only Roll_No can uniquely identify the other attributes of the table. Hence, {Roll_No, Name} is not a candidate Key.
-
Roll_No, ID, Name : {Roll_No, ID, Name} also contains redundant attribute. Hence, {Roll_No, ID, Name} is not a candidate Key.
-
ID, Name: {ID, Name} also contains redundant attributes, i.e., Name as only ID can uniquely identify the other attributes of the table. Hence, {ID, Name} is not a candidate Key.
Candidate keys = {{ID}, {Roll_No}}.
Note: Primary Key can be either ID or Roll_No.
One thing that the Data Base Administrator can think of is that when a student is enrolled, the student is given an ID but his/her Roll Number can be given after some time. So, he can choose ID over the Roll_no attribute.
Conclusion
- Candidate key is a part of the super key that can uniquely identify other attributes of the table. Hence, the candidate key is also known as a minimal super key.
- The candidate key should not have redundant attributes.
- The candidate key of a relation is all the possible ways through which we can identify a row or a tuple.
- The columns present in a candidate key are known as prime attributes, and the columns that are not present in any candidate key are called non-prime attributes.
- The purpose of using the candidate key is that a candidate key ensures the integrity of the data by preventing duplicate data.
- A Primary key is determined from the set of candidate keys by the Data Base Administrator.