In the relational data model, a superkey is any set of attributes that uniquely identifies each tuple of a relation.[1][2] Because superkey values are unique, tuples with the same superkey value must also have the same non-key attribute values. That is, non-key attributes are functionally dependent on the superkey.
The set of all attributes is always a superkey (the trivial superkey). Tuples in a relation are by definition unique, with duplicates removed after each operation, so the set of all attributes is always uniquely valued for every tuple. A candidate key (or minimal superkey) is a superkey that cannot be reduced to a smaller superkey by removing an attribute.[3]
For example, in an employee schema with attributes employeeID, name, job, and departmentID, if employeeID values are unique then employeeID combined with any or all of the other attributes can uniquely identify tuples in the table. Each combination, {employeeID}, {employeeID, name}, {employeeID, name, job}, and so on, is a superkey. {employeeID} is a candidate key, since no proper subset of its attributes is also a superkey. {employeeID, name, job, departmentID} is the trivial superkey.
If attribute set K is a superkey of relation R, then at all times it is the case that the projection of R over K has the same cardinality as R itself.
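The cardinality condition can be checked mechanically for a single relation instance. The following Python sketch is illustrative only: the function name `is_superkey` and the sample employee rows are assumptions introduced here, not part of the source. It models a relation as a list of dictionaries and a projection as a set of value tuples.

```python
def is_superkey(rows, attrs):
    """Return True if projecting `rows` onto `attrs` keeps the same cardinality.

    `rows` is a list of dicts representing one relation instance (assumed to
    contain no duplicate tuples); `attrs` is a tuple of attribute names.
    """
    # A set discards duplicate value combinations, as a projection does.
    projection = {tuple(row[a] for a in attrs) for row in rows}
    return len(projection) == len(rows)


# Hypothetical sample data for the employee schema described above.
employees = [
    {"employeeID": 1, "name": "Ann", "job": "Engineer", "departmentID": 10},
    {"employeeID": 2, "name": "Bob", "job": "Engineer", "departmentID": 10},
    {"employeeID": 3, "name": "Ann", "job": "Analyst", "departmentID": 20},
]

print(is_superkey(employees, ("employeeID",)))          # True
print(is_superkey(employees, ("employeeID", "name")))   # True
print(is_superkey(employees, ("name",)))                # False: "Ann" appears twice
print(is_superkey(employees, ("job", "departmentID")))  # False
```

A result of True here only shows that the attribute set is unique within this particular set of rows.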
Monarch Name | Monarch Number | Royal House
---|---|---
Edward | II | Plantagenet
Edward | III | Plantagenet
Richard | III | Plantagenet
Henry | IV | Lancaster
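First, list out all the sets of attributes:
• {}
• {Monarch Name}
• {Monarch Number}
• {Royal House}
• {Monarch Name, Monarch Number}
• {Monarch Name, Royal House}
• {Monarch Number, Royal House}
• {Monarch Name, Monarch Number, Royal House}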
Second, eliminate all the sets that do not meet the superkey requirement. For example, {Monarch Name, Royal House} cannot be a superkey because for the same attribute values (Edward, Plantagenet), there are two distinct tuples:
• (Edward, II, Plantagenet)
• (Edward, III, Plantagenet)
Finally, after elimination, the remaining sets of attributes are the only possible superkeys in this example:
• {Monarch Name, Monarch Number} (a candidate key)
• {Monarch Name, Monarch Number, Royal House}
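The elimination procedure above can be sketched in Python. This is a minimal illustration under stated assumptions rather than a standard algorithm: the names `superkeys_of_instance` and `candidate_keys` are hypothetical, and the brute-force enumeration of all subsets is only practical for a handful of attributes.

```python
from itertools import combinations

ATTRS = ("Monarch Name", "Monarch Number", "Royal House")
MONARCHS = [
    ("Edward", "II", "Plantagenet"),
    ("Edward", "III", "Plantagenet"),
    ("Richard", "III", "Plantagenet"),
    ("Henry", "IV", "Lancaster"),
]


def superkeys_of_instance(attrs, rows):
    """List every attribute subset whose values are unique across `rows`."""
    found = []
    for size in range(len(attrs) + 1):
        for idxs in combinations(range(len(attrs)), size):
            # Project each row onto the chosen columns; duplicates collapse.
            projected = {tuple(row[i] for i in idxs) for row in rows}
            if len(projected) == len(rows):
                found.append(frozenset(attrs[i] for i in idxs))
    return found


def candidate_keys(superkeys):
    """Keep only the superkeys with no proper subset that is also a superkey."""
    return [k for k in superkeys if not any(other < k for other in superkeys)]


keys = superkeys_of_instance(ATTRS, MONARCHS)
for key in keys:
    print(sorted(key))
print("candidate keys:", [sorted(k) for k in candidate_keys(keys)])
```

For the four tuples in the table, only the two sets containing both Monarch Name and Monarch Number survive, and the smaller of the two is the single candidate key.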
In reality, superkeys cannot be determined simply by examining one set of tuples in a relation. A superkey defines a functional dependency constraint of a relation schema which must hold for all possible instance relations of that relation schema.
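The difference between an instance and a schema can be illustrated with a short Python sketch. It is an assumption made purely for illustration that the schema might also admit monarchs from other countries; the added tuple and the helper name `has_unique_values` are hypothetical and not part of the source example.

```python
def has_unique_values(rows, idxs):
    """Return True if the columns at positions `idxs` are unique across `rows`."""
    projected = {tuple(row[i] for i in idxs) for row in rows}
    return len(projected) == len(rows)


# Columns: (Monarch Name, Monarch Number, Royal House)
instance = [
    ("Edward", "II", "Plantagenet"),
    ("Edward", "III", "Plantagenet"),
    ("Richard", "III", "Plantagenet"),
    ("Henry", "IV", "Lancaster"),
]
NAME_AND_NUMBER = (0, 1)

# Unique in the instance shown in the table.
print(has_unique_values(instance, NAME_AND_NUMBER))  # True

# Hypothetical extra tuple: if the schema also allowed a non-English monarch
# with the same name and number, the same attribute set would no longer be
# unique.
extended = instance + [("Henry", "IV", "Bourbon")]
print(has_unique_values(extended, NAME_AND_NUMBER))  # False
```

Whether {Monarch Name, Monarch Number} is a superkey therefore depends on what the schema permits, not on the rows that happen to be present.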