A candidate key, or simply a key, of a relational database is any set of columns that have a unique combination of values in each row, with the additional constraint that removing any column could produce duplicate combinations of values.
A candidate key is a minimal superkey,[1] i.e., a superkey that doesn't contain a smaller one. Therefore, a relation can have multiple candidate keys, each with a different number of attributes.[2]
Specific candidate keys are sometimes called primary keys, secondary keys or alternate keys.The columns in a candidate key are called prime attributes,[3] and a column that does not occur in any candidate key is called a non-prime attribute.
Every relation without NULL values will have at least one candidate key: Since there cannot be duplicate rows, the set of all columns is a superkey, and if that isn't minimal, some subset of that will be minimal.
There is a functional dependency from the candidate key to all the attributes in the relation.
The superkeys of a relation are all the possible ways we can identify a row. The candidate keys are the minimal subsets of each superkey and as such, they are an important concept for the design of database schema.
The definition of candidate keys can be illustrated with the following (abstract) example. Consider a relation variable (relvar) R with attributes (A, B, C, D) that has only the following two legal values r1 and r2:
A | B | C | D | |
---|---|---|---|---|
a1 | b1 | c1 | d1 | |
a1 | b2 | c2 | d1 | |
a2 | b1 | c2 | d1 |
A | B | C | D | |
---|---|---|---|---|
a1 | b1 | c1 | d1 | |
a1 | b2 | c2 | d1 | |
a1 | b1 | c2 | d2 |
Here r2 differs from r1 only in the A and D values of the last tuple.
For r1 the following sets have the uniqueness property, i.e., there are no two distinct tuples in the instance with the same attribute values in the set:
,,,,,,, For r2 the uniqueness property holds for the following sets;
,,,,,,, Since superkeys of a relvar are those sets of attributes that have the uniqueness property for all legal values of that relvar and because we assume that r1 and r2 are all the legal values that R can take, we can determine the set of superkeys of R by taking the intersection of the two lists:
,,,,, Finally we need to select those sets for which there is no proper subset in the list, which are in this case:
,, These are indeed the candidate keys of relvar R.
We have to consider all the relations that might be assigned to a relvar to determine whether a certain set of attributes is a candidate key. For example, if we had considered only r1 then we would have concluded that is a candidate key, which is incorrect. However, we might be able to conclude from such a relation that a certain set is not a candidate key, because that set does not have the uniqueness property (example for r1). Note that the existence of a proper subset of a set that has the uniqueness property cannot in general be used as evidence that the superset is not a candidate key. In particular, note that in the case of an empty relation, every subset of the heading has the uniqueness property, including the empty set.
The set of all candidate keys can be computede.g. from the set of functional dependencies.To this end we need to define the attribute closure
\alpha+
\alpha
\alpha+
\alpha
It is quite simple to find a single candidate key.We start with a set
\alpha
minimize(\alpha)
\alpha
minimize(\alpha)
Actually we can detect every candidate key with this procedureby simply trying every possible order of removing attributes.However there are many more permutations of attributes (
n!
2n
There is a fundamental difficulty for efficient algorithms for candidate key computation:Certain sets of functional dependencies lead to exponentially many candidate keys.Consider the
2 ⋅ n
\{Ai → Bi:i\in\{1,...,n\}\}\cup\{Bi → Ai:i\in\{1,...,n\}\}
2n
\{A1,B1\} x ... x \{An,Bn\}
The following algorithm actually runs in polynomial time in the number of candidate keys and functional dependencies:[4]
function find_candidate_keys(A, F) /* A is the set of all attributes and F is the set of functional dependencies */ K[0] := minimize(A); n := 1; /* Number of Keys known so far */ i := 0; /* Currently processed key */ while i < n do for each α → β ∈ F do /* Build a new potential key from the previous known key and the current FD */ S := α ∪ (K[i] − β); /* Search whether the new potential key is part of the already known keys */ found := false; for j := 0 to n-1 do if K[j] ⊆ S then found := true; /* If not, add it */ if not found then K[n] := minimize(S); n := n + 1; i := i + 1 return K
The idea behind the algorithm is that given a candidate key
Ki
\alpha → \beta
\alpha\cup(Ki\setminus\beta)
. Christopher J. Date. An Introduction to Database Systems. Addison-Wesley. 978-0-321-18956-1. 2003. 268–276. 5: Integrity.