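In relational database theory, a functional dependency is the following constraint between two attribute sets in a relation: Given a relation R and attribute sets $X, Y \subseteq R$, X is said to functionally determine Y (written X → Y) if each X value is associated with precisely one Y value. R is then said to satisfy the functional dependency X → Y. Equivalently, the projection $\Pi_{X,Y}R$ is a function, that is, Y is a function of X.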
In other words, a dependency FD: X → Y means that the values of Y are determined by the values of X. Two tuples sharing the same values of X will necessarily have the same values of Y.
The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. A simple application of functional dependencies is Heath's theorem; it says that a relation R over an attribute set U and satisfying a functional dependency X → Y can be safely split into two relations having the lossless-join decomposition property, namely into $\Pi_{XY}(R) \bowtie \Pi_{XZ}(R) = R$, where Z = U − XY are the rest of the attributes. (Unions of attribute sets are customarily denoted by juxtaposition in database theory.)
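A notion of logical implication is defined for functional dependencies in the following way: a set of functional dependencies $\Sigma$ logically implies another set of dependencies $\Gamma$ if any relation R satisfying all dependencies from $\Sigma$ also satisfies all dependencies from $\Gamma$; this is usually written $\Sigma \models \Gamma$.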
Suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique vehicle identification number (VIN). One would write VIN → EngineCapacity because it would be inappropriate for a vehicle's engine to have more than one capacity. (Assuming, in this case, that vehicles only have one engine.) On the other hand, EngineCapacity → VIN is incorrect because there could be many vehicles with the same engine capacity.
This functional dependency may suggest that the attribute EngineCapacity be placed in a relation with candidate key VIN. However, that may not always be appropriate. For example, if that functional dependency occurs as a result of the transitive functional dependencies VIN → VehicleModel and VehicleModel → EngineCapacity then that would not result in a normalized relation.
This example illustrates the concept of functional dependency. The situation modelled is that of college students visiting one or more lectures in each of which they are assigned a teaching assistant (TA). Let's further assume that every student is in some semester and is identified by a unique integer ID.
Student ID | Semester | Lecture | TA
---|---|---|---
1234 | 6 | Numerical Methods | John
1221 | 4 | Numerical Methods | Smith
1234 | 6 | Visual Computing | Bob
1201 | 2 | Numerical Methods | Peter
1201 | 2 | Physics II | Simon
We notice that whenever two rows in this table feature the same StudentID, they also necessarily have the same Semester values. This basic fact can be expressed by a functional dependency: StudentID → Semester.
If a row were added in which the same student had a different Semester value, the functional dependency would no longer hold. In this sense the FD is implied by the data: other values are possible that would invalidate it.
Other nontrivial functional dependencies can be identified, for example:
{StudentID, Lecture} → TA
{StudentID, Lecture} → {TA, Semester}
The latter expresses the fact that the set {StudentID, Lecture} is a superkey of the relation.
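Whether a given table instance satisfies a functional dependency can be checked mechanically. The following is a minimal sketch, assuming rows are represented as Python dictionaries; the helper name `satisfies_fd` is hypothetical and chosen only for this illustration.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if any two rows that agree on lhs also agree on rhs."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value for a key and returns the stored one
        if seen.setdefault(key, value) != value:
            return False
    return True


# The student table from the example.
students = [
    {"StudentID": 1234, "Semester": 6, "Lecture": "Numerical Methods", "TA": "John"},
    {"StudentID": 1221, "Semester": 4, "Lecture": "Numerical Methods", "TA": "Smith"},
    {"StudentID": 1234, "Semester": 6, "Lecture": "Visual Computing", "TA": "Bob"},
    {"StudentID": 1201, "Semester": 2, "Lecture": "Numerical Methods", "TA": "Peter"},
    {"StudentID": 1201, "Semester": 2, "Lecture": "Physics II", "TA": "Simon"},
]

print(satisfies_fd(students, ["StudentID"], ["Semester"]))       # True
print(satisfies_fd(students, ["StudentID", "Lecture"], ["TA"]))  # True
print(satisfies_fd(students, ["Lecture"], ["TA"]))               # False

# A hypothetical extra row giving student 1234 a second semester value.
conflicting = students + [
    {"StudentID": 1234, "Semester": 7, "Lecture": "Physics II", "TA": "Simon"}
]
print(satisfies_fd(conflicting, ["StudentID"], ["Semester"]))    # False
```

Such a check only tells us that an instance does not violate a dependency; whether the dependency should hold for all instances is a design decision about the schema.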
A classic example of functional dependency is the employee department model.
Employee ID | Employee name | Department ID | Department name
---|---|---|---
0001 | John Doe | 1 | Human Resources
0002 | Jane Doe | 2 | Marketing
0003 | John Smith | 1 | Human Resources
0004 | Jane Goodall | 3 | Sales
This case represents an example where multiple functional dependencies are embedded in a single representation of data. Note that because an employee can only be a member of one department, the unique ID of that employee determines the department.
In addition to this relationship, the table also has a functional dependency through a non-key attribute: Department ID → Department name.
This example demonstrates that even though there exists an FD Employee ID → Department ID, the employee ID would not be a logical key for determining the department name. The process of normalization of the data would recognize all FDs and allow the designer to construct tables and relationships that are more logical based on the data.
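Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional dependencies. Among the most important are the following, usually called Armstrong's axioms:[3]
Reflexivity: If Y is a subset of X, then X → Y
Augmentation: If X → Y, then XZ → YZ
Transitivity: If X → Y and Y → Z, then X → Z
"Reflexivity" can be weakened to just $X \to \varnothing$, i.e. it is an actual axiom, whereas the other two are proper inference rules, more precisely giving rise to the following rules of syntactic consequence:
$\vdash X \to \varnothing$
$X \to Y \vdash XZ \to YZ$
$X \to Y, Y \to Z \vdash X \to Z$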
These three rules are a sound and complete axiomatization of functional dependencies. This axiomatization is sometimes described as finite because the number of inference rules is finite, with the caveat that the axiom and rules of inference are all schemata, meaning that the X, Y and Z range over all ground terms (attribute sets).[4]
By applying augmentation and transitivity, one can derive two additional rules:
Pseudotransitivity: If X → Y and YW → Z, then XW → Z
Composition: If X → Y and Z → W, then XZ → YW
One can also derive the union and decomposition rules from Armstrong's axioms:[3] [6]
X → Y and X → Z if and only if X → YZ
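As a sketch of how such a derivation goes, the union direction can be obtained from augmentation and transitivity alone. Juxtaposition denotes union of attribute sets, so XX = X.

```latex
\begin{align*}
X \to Y &\vdash X \to XY && \text{(augmentation with } X \text{, using } XX = X\text{)} \\
X \to Z &\vdash XY \to YZ && \text{(augmentation with } Y\text{)} \\
X \to XY,\ XY \to YZ &\vdash X \to YZ && \text{(transitivity)}
\end{align*}
```

Conversely, for decomposition, reflexivity gives YZ → Y and YZ → Z, and transitivity with X → YZ then yields X → Y and X → Z.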
The closure of a set of attributes is the set of all attributes that can be determined from it using the functional dependencies of a given relation. One uses Armstrong's axioms (reflexivity, augmentation, transitivity) to provide a proof.
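Given a relation $R$ and a set $F$ of FDs that holds in $R$: the closure of $F$ in $R$ (denoted $F^+$) is the set of all FDs that are logically implied by $F$.
Closure of a set of attributes X with respect to $F$ is the set $X^+$ of all attributes that are functionally determined by X using $F^+$.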
Imagine the following list of FDs. We are going to calculate a closure for A (written as A+) from this relationship.
The closure would be as follows:
Therefore, A+ = ABCD. Because A+ includes every attribute in the relationship, A is a superkey.
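An attribute closure can also be computed mechanically by repeatedly applying every FD whose left side is already contained in the result. The sketch below is illustrative only: the function name `closure` and the FD set A → B, B → C, AB → D over attributes ABCD are assumptions chosen so that the closure of A comes out as ABCD.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds.

    fds is an iterable of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Hypothetical FD set over attributes A, B, C, D (an assumption for illustration).
fds = [
    ({"A"}, {"B"}),
    ({"B"}, {"C"}),
    ({"A", "B"}, {"D"}),
]

a_plus = closure({"A"}, fds)
print(sorted(a_plus))                  # ['A', 'B', 'C', 'D']
print(a_plus == {"A", "B", "C", "D"})  # True, so A is a superkey of ABCD
```

Each pass either adds at least one attribute or stops the loop, so the number of passes is bounded by the number of attributes.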
Definition: $F$ covers $G$ if every FD in $G$ can be inferred from $F$. Equivalently, $F$ covers $G$ if $G^+ \subseteq F^+$.
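Two sets of FDs $F$ and $G$ over schema $R$ are equivalent, written $F \equiv G$, if $F^+ = G^+$. If $F \equiv G$, then $F$ is a cover for $G$ and vice versa.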
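A set $F$ of FDs is nonredundant if there is no proper subset $F'$ of $F$ with $F' \equiv F$. If such an $F'$ exists, $F$ is redundant. $F$ is a nonredundant cover for $G$ if $F$ is a cover for $G$ and $F$ is nonredundant.
An alternative characterization of nonredundancy is that $F$ is nonredundant if there is no FD X → Y in $F$ such that $F - \{X \to Y\} \models X \to Y$. Call an FD X → Y in $F$ redundant in $F$ if $F - \{X \to Y\} \models X \to Y$.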
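Covers, equivalence and redundancy can all be tested with attribute closures, using the standard fact that a set of FDs implies X → Y exactly when Y is contained in the closure of X under that set. The following sketch assumes that representation; all function names and the example sets `F` and `G` are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, fd):
    """True if fds logically implies the single dependency fd = (lhs, rhs)."""
    lhs, rhs = fd
    return rhs <= closure(lhs, fds)


def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(implies(f, fd) for fd in g)


def equivalent(f, g):
    """True if f and g cover each other."""
    return covers(f, g) and covers(g, f)


def redundant_fds(f):
    """Return the FDs of f that are implied by the other FDs of f."""
    return [fd for i, fd in enumerate(f) if implies(f[:i] + f[i + 1:], fd)]


# Hypothetical example: F contains A -> C, which already follows from the rest.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
G = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(equivalent(F, G))       # True
print(len(redundant_fds(F)))  # 1 (only A -> C is redundant in F)
print(len(redundant_fds(G)))  # 0, so G is a nonredundant cover for F
```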
An important property (yielding an immediate application) of functional dependencies is that if R is a relation with columns named from some set of attributes U and R satisfies some functional dependency X → Y then $R = \Pi_{XY}(R) \bowtie \Pi_{XZ}(R)$, where Z = U − XY. That is, $\Pi_{XY}(R) \bowtie \Pi_{XZ}(R)$ is a lossless-join decomposition of R into two smaller relations.
Heath's theorem effectively says we can pull out the values of Y from the big relation R and store them into one relation, $\Pi_{XY}(R)$, while everything other than Y stays as it was in $\Pi_{XZ}(R)$.
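The decomposition can be tried out on the employee table, taking X = {DepartmentID}, Y = {DepartmentName} and Z as the remaining attributes. This is a minimal sketch with set-based relations; the helpers `project` and `natural_join` and the attribute spellings are assumptions made for the illustration.

```python
def project(rows, attrs):
    """Project rows onto attrs; the result is a set, so duplicate tuples collapse."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two relations, each a set of sorted (attribute, value) tuples."""
    joined = set()
    for l_tuple in left:
        for r_tuple in right:
            l_row, r_row = dict(l_tuple), dict(r_tuple)
            shared = l_row.keys() & r_row.keys()
            # keep the pair only if it agrees on every shared attribute
            if all(l_row[a] == r_row[a] for a in shared):
                joined.add(tuple(sorted({**l_row, **r_row}.items())))
    return joined


employees = [
    {"EmployeeID": "0001", "EmployeeName": "John Doe", "DepartmentID": 1, "DepartmentName": "Human Resources"},
    {"EmployeeID": "0002", "EmployeeName": "Jane Doe", "DepartmentID": 2, "DepartmentName": "Marketing"},
    {"EmployeeID": "0003", "EmployeeName": "John Smith", "DepartmentID": 1, "DepartmentName": "Human Resources"},
    {"EmployeeID": "0004", "EmployeeName": "Jane Goodall", "DepartmentID": 3, "DepartmentName": "Sales"},
]

U = ["EmployeeID", "EmployeeName", "DepartmentID", "DepartmentName"]
R = project(employees, U)

# DepartmentID -> DepartmentName holds, so splitting along DepartmentID is lossless.
xy = project(employees, ["DepartmentID", "DepartmentName"])
xz = project(employees, ["DepartmentID", "EmployeeID", "EmployeeName"])
print(natural_join(xy, xz) == R)  # True

# For contrast: DepartmentID determines neither EmployeeID nor EmployeeName,
# and joining these two projections produces spurious tuples.
base = project(employees, ["DepartmentID", "EmployeeID", "EmployeeName"])
left = project(employees, ["DepartmentID", "EmployeeID"])
right = project(employees, ["DepartmentID", "EmployeeName"])
print(natural_join(left, right) == base)  # False
```

Here the projection onto DepartmentID and DepartmentName holds each department name once, which is the lookup-table effect the theorem describes.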
Functional dependencies, however, should not be confused with inclusion dependencies, which are the formalism for foreign keys; even though they are used for normalization, functional dependencies express constraints over one relation (schema), whereas inclusion dependencies express constraints between relation schemas in a database schema. Furthermore, the two notions do not even intersect in the classification of dependencies: functional dependencies are equality-generating dependencies whereas inclusion dependencies are tuple-generating dependencies. Enforcing referential constraints after relation schema decomposition (normalization) requires a new formalism, i.e. inclusion dependencies. In the decomposition resulting from Heath's theorem, there is nothing preventing the insertion of tuples in $\Pi_{XZ}(R)$ having some value of X not found in $\Pi_{XY}(R)$.
Normal forms are database normalization levels which determine the "goodness" of a table. Generally, the third normal form is considered to be a "good" standard for a relational database.
Normalization aims to free the database from update, insertion and deletion anomalies. It also ensures that when a new value is introduced into the relation, it has minimal effect on the database, and thus minimal effect on the applications using the database.
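A set S of functional dependencies is irreducible if the set has the following three properties:
Each right set of a functional dependency of S contains only one attribute.
Each left set of a functional dependency of S is irreducible: removing any attribute from the left set changes the content of S (S loses some information).
Removing any functional dependency changes the content of S.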
Sets of functional dependencies with these properties are also called canonical or minimal. Finding such a set S of functional dependencies which is equivalent to some input set S' is called finding a minimal cover of S'; this problem can be solved in polynomial time.[9]
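One common textbook procedure for computing a minimal cover works in three passes: split right-hand sides into single attributes, drop left-hand attributes that are not needed, then drop dependencies implied by the rest. The sketch below follows that outline; it is not necessarily the algorithm of the cited reference, and the function names and the input set A → BC, B → C, AB → D are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Return a minimal cover of fds as a list of (lhs, rhs) frozenset pairs."""
    # Pass 1: split right-hand sides so each FD determines a single attribute.
    cover = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]

    # Pass 2: remove left-hand attributes not needed to derive the right side.
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            if rhs <= closure(reduced, cover):
                lhs = reduced
                cover[i] = (lhs, rhs)

    # Pass 3: remove exact duplicates, then FDs implied by the remaining ones.
    result = list(dict.fromkeys(cover))
    for fd in list(result):
        rest = [other for other in result if other != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


# Hypothetical input set (an assumption for illustration).
fds = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A", "B"}, {"D"})]

for lhs, rhs in minimal_cover(fds):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
# A -> D
```

A minimal cover is not unique in general; the result can depend on the order in which attributes and dependencies are examined.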