Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X ↠ Y, X is a superkey; that is, X is either a candidate key or a superset of one.
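If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z, then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively. A multivalued dependency X ↠ Y signifies that if we choose any x actually occurring in the table and compile a list of all the (x, y, z) combinations in which it occurs, we will find that x is associated with the same y entries regardless of z. In other words, the presence of z provides no useful information to constrain the possible values of y.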
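The definition of a multivalued dependency can be checked mechanically on a small table. The following is a minimal sketch, assuming rows are represented as Python dictionaries and that X and Y are disjoint lists of column names; the function name `satisfies_mvd` and the toy columns `x`, `y`, `z` are illustrative and not part of any standard library.

```python
from itertools import product

def satisfies_mvd(rows, x_cols, y_cols):
    """Return True if the multivalued dependency X ->> Y holds in `rows`."""
    rows = list(rows)
    if not rows:
        return True
    # Z is every column that belongs to neither X nor Y.
    z_cols = [c for c in rows[0] if c not in x_cols and c not in y_cols]

    def pick(row, cols):
        return tuple(row[c] for c in cols)

    present = {(pick(r, x_cols), pick(r, y_cols), pick(r, z_cols)) for r in rows}

    # Collect, for each x, the y values and z values it occurs with.
    groups = {}
    for x, y, z in present:
        ys, zs = groups.setdefault(x, (set(), set()))
        ys.add(y)
        zs.add(z)

    # X ->> Y holds iff, for each x, every pairing of its y and z values occurs.
    return all(
        (x, y, z) in present
        for x, (ys, zs) in groups.items()
        for y, z in product(ys, zs)
    )

rows_ok = [
    {"x": 1, "y": "a", "z": "p"},
    {"x": 1, "y": "a", "z": "q"},
    {"x": 1, "y": "b", "z": "p"},
    {"x": 1, "y": "b", "z": "q"},
]
rows_bad = rows_ok[:-1]  # drops the (1, "b", "q") combination

print(satisfies_mvd(rows_ok, ["x"], ["y"]))   # True
print(satisfies_mvd(rows_bad, ["x"], ["y"]))  # False
```

The check only tests whether the dependency holds in the data at hand. Whether it holds as a design rule depends on the real-world constraints the table is meant to capture.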
A trivial multivalued dependency X ↠ Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the relation.
A functional dependency is a special case of multivalued dependency. In a functional dependency X → Y, every x determines exactly one y, never more than one.
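The conditions described so far (triviality of a multivalued dependency, and the superkey requirement of 4NF) reduce to simple set tests. The sketch below is illustrative only: the helper names `is_trivial_mvd`, `is_superkey` and `violates_4nf`, and the attribute names `A`, `B`, `C`, are assumptions made for this example, and the candidate keys are taken as given rather than derived.

```python
def is_trivial_mvd(x, y, all_attrs):
    """X ->> Y is trivial if Y is a subset of X, or X and Y cover all attributes."""
    x, y, all_attrs = set(x), set(y), set(all_attrs)
    return y <= x or (x | y) == all_attrs

def is_superkey(x, candidate_keys):
    """X is a superkey if it contains at least one candidate key."""
    x = set(x)
    return any(set(key) <= x for key in candidate_keys)

def violates_4nf(x, y, all_attrs, candidate_keys):
    """A non-trivial X ->> Y whose left side is not a superkey violates 4NF."""
    return not is_trivial_mvd(x, y, all_attrs) and not is_superkey(x, candidate_keys)

attrs = {"A", "B", "C"}
keys = [{"A", "B", "C"}]  # assumed: the only candidate key is the whole heading

print(violates_4nf({"A"}, {"B"}, attrs, keys))       # True
print(violates_4nf({"A"}, {"B", "C"}, attrs, keys))  # False: trivial
print(violates_4nf({"A"}, {"B"}, attrs, [{"A"}]))    # False: A is a superkey
```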
Consider the following example:
Restaurant | Pizza Variety | Delivery Area
A1 Pizza | Thick Crust | Springfield
A1 Pizza | Thick Crust | Shelbyville
A1 Pizza | Thick Crust | Capital City
A1 Pizza | Stuffed Crust | Springfield
A1 Pizza | Stuffed Crust | Shelbyville
A1 Pizza | Stuffed Crust | Capital City
Elite Pizza | Thin Crust | Capital City
Elite Pizza | Stuffed Crust | Capital City
Vincenzo's Pizza | Thick Crust | Springfield
Vincenzo's Pizza | Thick Crust | Shelbyville
Vincenzo's Pizza | Thin Crust | Springfield
Vincenzo's Pizza | Thin Crust | Shelbyville
Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.
The table has no non-key attributes because its only candidate key is {Restaurant, Pizza Variety, Delivery Area}. Therefore, it meets all normal forms up to BCNF. If we assume, however, that pizza varieties offered by a restaurant are not affected by delivery area (i.e. a restaurant offers all pizza varieties it makes to all areas it supplies), then it does not meet 4NF. The problem is that the table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a superkey). The dependencies are:
{Restaurant} ↠ {Pizza Variety}
{Restaurant} ↠ {Delivery Area}
These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza a restaurant offers are independent of the areas to which the restaurant delivers. This state of affairs leads to redundancy in the table: for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza starts producing Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's delivery areas. There is, moreover, nothing to prevent us from doing this incorrectly: we might add Cheese Crust rows for all but one of A1 Pizza's delivery areas, thereby failing to respect the multivalued dependency {Restaurant} ↠ {Pizza Variety}.
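The insertion anomaly described above can be reproduced on the example data. The following sketch assumes the three-column table is held as a set of (restaurant, variety, area) tuples; the helper `missing_rows` is hypothetical and simply lists the rows that the dependency of pizza variety on restaurant would require but that are absent.

```python
from itertools import product

table = {
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
}

def missing_rows(rows):
    """Rows required by Restaurant ->> Pizza Variety that are absent from `rows`."""
    varieties, areas = {}, {}
    for restaurant, variety, area in rows:
        varieties.setdefault(restaurant, set()).add(variety)
        areas.setdefault(restaurant, set()).add(area)
    # Every variety of a restaurant must be paired with every one of its areas.
    required = {
        (r, v, a)
        for r in varieties
        for v, a in product(varieties[r], areas[r])
    }
    return required - set(rows)

print(missing_rows(table))  # set(): the dependency holds in the original data

# Add Cheese Crust for only two of A1 Pizza's three delivery areas.
incomplete = table | {
    ("A1 Pizza", "Cheese Crust", "Springfield"),
    ("A1 Pizza", "Cheese Crust", "Shelbyville"),
}
print(missing_rows(incomplete))
# {('A1 Pizza', 'Cheese Crust', 'Capital City')}
```

Nothing in the table's structure rejects the incomplete insert. The inconsistency is only visible to a check written specifically for this dependency.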
To eliminate the possibility of these anomalies, we must place the facts about varieties offered into a different table from the facts about delivery areas, yielding two tables that are both in 4NF:
Restaurant | Pizza Variety
A1 Pizza | Thick Crust
A1 Pizza | Stuffed Crust
Elite Pizza | Thin Crust
Elite Pizza | Stuffed Crust
Vincenzo's Pizza | Thick Crust
Vincenzo's Pizza | Thin Crust
Restaurant | Delivery Area
A1 Pizza | Springfield
A1 Pizza | Shelbyville
A1 Pizza | Capital City
Elite Pizza | Capital City
Vincenzo's Pizza | Springfield
Vincenzo's Pizza | Shelbyville
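As an illustration of why the two-table design avoids the anomaly, the sketch below joins the two tables on restaurant and recovers all twelve combinations of the three-column table. It assumes each table is a set of tuples; `natural_join` is a hypothetical helper written for this example, not a library function.

```python
varieties = {
    ("A1 Pizza", "Thick Crust"),
    ("A1 Pizza", "Stuffed Crust"),
    ("Elite Pizza", "Thin Crust"),
    ("Elite Pizza", "Stuffed Crust"),
    ("Vincenzo's Pizza", "Thick Crust"),
    ("Vincenzo's Pizza", "Thin Crust"),
}
areas = {
    ("A1 Pizza", "Springfield"),
    ("A1 Pizza", "Shelbyville"),
    ("A1 Pizza", "Capital City"),
    ("Elite Pizza", "Capital City"),
    ("Vincenzo's Pizza", "Springfield"),
    ("Vincenzo's Pizza", "Shelbyville"),
}

def natural_join(varieties, areas):
    """Pair each (restaurant, variety) with each (restaurant, area) of the same restaurant."""
    return {(r, v, a) for r, v in varieties for r2, a in areas if r == r2}

combined = natural_join(varieties, areas)
print(len(combined))  # 12

# A new variety is now a single row; the join supplies every delivery area.
varieties.add(("A1 Pizza", "Cheese Crust"))
print(len(natural_join(varieties, areas)))  # 15
```

Because each fact is stored once, a new variety cannot be recorded for some of a restaurant's delivery areas and omitted for others.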
In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary from one delivery area to another, the original three-column table would satisfy 4NF.
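The contrasting case can also be sketched. The data below is hypothetical: it assumes Vincenzo's Pizza offers Thin Crust in Springfield only. Splitting such a table and joining the parts back produces a row that was never in the original, which shows that the dependency of pizza variety on restaurant alone does not hold here, so the decomposition would lose information.

```python
# Hypothetical data: Thin Crust is delivered to Springfield but not Shelbyville.
table = {
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
}

# Project onto (restaurant, variety) and (restaurant, area), then rejoin.
varieties = {(r, v) for r, v, _ in table}
areas = {(r, a) for r, _, a in table}
rejoined = {(r, v, a) for r, v in varieties for r2, a in areas if r == r2}

spurious = rejoined - table
print(spurious)
# {("Vincenzo's Pizza", 'Thin Crust', 'Shelbyville')}
```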
Ronald Fagin demonstrated that it is always possible to achieve 4NF.[2] Rissanen's theorem is also applicable to multivalued dependencies.
A 1992 paper by Margaret S. Wu notes that the teaching of database normalization typically stops short of 4NF, perhaps because of a belief that tables violating 4NF (but meeting all lower normal forms) are rarely encountered in business applications. This belief may not be accurate, however. Wu reports that in a study of forty organizational databases, over 20% contained one or more tables that violated 4NF while meeting all lower normal forms.[3]
Only in rare situations does a 4NF table not conform to the higher normal form 5NF. These are situations in which a complex real-world constraint governing the valid combinations of attribute values in the 4NF table is not implicit in the structure of that table.