The standard column family is a NoSQL object that contains columns of related data. It is a tuple (pair) consisting of a key and a value, where the key is mapped to a value that is a set of columns. By analogy with relational databases, a standard column family is like a "table", with each key–value pair being a "row".[1] Each column is a tuple (triplet) consisting of a column name, a value, and a timestamp.[2] In a relational database table, this data would be grouped together within a table with other non-related data.[3]
Standard column families are containers of columns that are sorted by their names; the rows they hold can be referenced and sorted by their row key.[4]
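The structure described above can be sketched in Python. This is a minimal illustration rather than any database's actual implementation: the names `Column`, `put`, and `columns_sorted` are hypothetical, and keeping the write with the newest timestamp is an assumption of the sketch, not something stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    """A column as a triplet: name, value, and timestamp."""
    name: str
    value: str
    timestamp: int


# A column family maps each row key to that row's columns, keyed by column name.
ColumnFamily = Dict[str, Dict[str, Column]]


def put(family: ColumnFamily, row_key: str, name: str, value: str, timestamp: int) -> None:
    """Store a column in a row; assumption: the write with the newest timestamp is kept."""
    row = family.setdefault(row_key, {})
    current = row.get(name)
    if current is None or timestamp >= current.timestamp:
        row[name] = Column(name, value, timestamp)


def columns_sorted(family: ColumnFamily, row_key: str) -> List[Column]:
    """Return the columns of one row ordered by column name."""
    row = family[row_key]
    return [row[name] for name in sorted(row)]


# Hypothetical usage data.
family: ColumnFamily = {}
put(family, "u1", "surname", "Doe", timestamp=2)
put(family, "u1", "first_name", "Jane", timestamp=2)
put(family, "u1", "surname", "Stale", timestamp=1)  # older write, ignored
```

Here the row key `"u1"` plays the role of the key, and the dictionary of columns plays the role of the value it is mapped to.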
Accessing the data in a distributed data store would be expensive (time-consuming) if it were stored in the form of a table. It would also be inefficient to read all the column families that make up a row of a relational table and assemble them into that row, because the data is distributed across a large number of nodes. Therefore, the user accesses only the related information that is required.
As an example, a relational table could consist of the columns UID, first name, surname, birthdate, gender, etc. In a distributed data store, the same table would be implemented by creating column families for "UID, first name, surname", "birthdate, gender", etc. If one needs only the males who were born between 1950 and 1960, a query in the relational database has to read the whole table. In a distributed data store, it suffices to access only the second standard column family, as the rest of the information is irrelevant.
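The example can be sketched as follows. The data, the names `names` and `demographics`, and the function `males_born_between` are all hypothetical, and the sketch assumes the UID serves as the row key shared by both column families. The point is only that the query touches the second family and never reads the first.

```python
from datetime import date

# Hypothetical data: two column families sharing the same row keys (UIDs).
names = {
    1: {"first_name": "Ann", "surname": "Lee"},
    2: {"first_name": "Bob", "surname": "Ray"},
    3: {"first_name": "Carl", "surname": "Moe"},
}
demographics = {
    1: {"birthdate": date(1955, 3, 1), "gender": "female"},
    2: {"birthdate": date(1958, 7, 9), "gender": "male"},
    3: {"birthdate": date(1972, 1, 20), "gender": "male"},
}


def males_born_between(family, first_year, last_year):
    """Return the row keys of males born in [first_year, last_year].

    Only the given column family is read; rows lacking a birthdate are skipped.
    """
    return [
        uid
        for uid, cols in family.items()
        if cols.get("gender") == "male"
        and "birthdate" in cols
        and first_year <= cols["birthdate"].year <= last_year
    ]


matching_uids = males_born_between(demographics, 1950, 1960)
```

If the names of the matching people are needed afterwards, they can be looked up in `names` by row key, which reads only the rows that matched.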
There is no way to sort columns after the fact, nor to run an arbitrary query, in distributed data stores. Columns are sorted when they are added to the column family. The sort order is defined by an attribute. In Apache Cassandra, for instance, this is the CompareWith attribute, which can have the following values:
AsciiType
BytesType
LexicalUUIDType
LongType
TimeUUIDType
UTF8Type
It is also possible to add user-defined sorting attributes. Sorting columns in this way, as they are added, makes the process extremely quick.[5]
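Sorting at insertion time can be illustrated with a short Python sketch. It is not Cassandra's implementation: the class `SortedRow` and the table `KEY_FUNCTIONS` are hypothetical, and the two key functions only approximate the idea of comparing column names as text or as numbers. Each insert uses binary search to find the column's position, so the row is always in order and never needs a separate sorting pass.

```python
import bisect

# Hypothetical stand-ins for two comparator settings: how a column name
# is turned into a sort key. Real comparators may order values differently.
KEY_FUNCTIONS = {
    "UTF8Type": lambda name: name,       # compare names as text
    "LongType": lambda name: int(name),  # compare names as integers
}


class SortedRow:
    """A row whose columns are kept in sorted order as they are added."""

    def __init__(self, compare_with):
        self._key = KEY_FUNCTIONS[compare_with]
        self._keys = []     # sort keys, always in ascending order
        self._columns = []  # (name, value) pairs, parallel to _keys

    def insert(self, name, value):
        key = self._key(name)
        index = bisect.bisect_left(self._keys, key)
        if index < len(self._keys) and self._keys[index] == key:
            self._columns[index] = (name, value)  # overwrite existing column
        else:
            self._keys.insert(index, key)
            self._columns.insert(index, (name, value))

    def names(self):
        return [name for name, _ in self._columns]


by_number = SortedRow("LongType")
by_text = SortedRow("UTF8Type")
for column_name in ["10", "9", "100"]:
    by_number.insert(column_name, "x")
    by_text.insert(column_name, "x")
```

With the same three column names, `by_number.names()` gives `["9", "10", "100"]`, while `by_text.names()` gives `["10", "100", "9"]`, which shows how the chosen attribute determines the stored order.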
Standard column families are schema-less, so each of their "rows" can contain a different number of columns, and even the column names can differ from row to row.[6] They are therefore a very different concept from the rows of a relational database management system (RDBMS). This is one of the reasons why the concept is not trivial even for an experienced RDBMS expert.
In JSON-like notation, a column family definition would look as follows:[6]
where "Cassandra", "TerryCho", and "Cath" are row keys, and "emailAddress", "age", "gender", and "address" are column names.
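A structure of this shape can be sketched in Python as nested dictionaries. Only the row keys and column names are taken from the text. The name `user_profile`, the helper functions, the placeholder values, and the choice of which columns appear in which row are assumptions made for illustration.

```python
# Placeholder values; which row holds which columns is assumed for illustration.
user_profile = {
    "Cassandra": {"emailAddress": "<email>", "age": "<age>"},
    "TerryCho": {"emailAddress": "<email>", "gender": "<gender>"},
    "Cath": {
        "emailAddress": "<email>",
        "age": "<age>",
        "gender": "<gender>",
        "address": "<address>",
    },
}


def column_names(family, row_key):
    """Return the column names present in one row, in sorted order."""
    return sorted(family[row_key])


def rows_with_column(family, name):
    """Return the row keys of all rows that contain the given column."""
    return sorted(key for key, cols in family.items() if name in cols)
```

Because the rows are schema-less, `column_names(user_profile, "Cassandra")` and `column_names(user_profile, "Cath")` return lists of different lengths, and a helper such as `rows_with_column` is needed to find which rows carry a particular column at all.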