The scale cube is a technology model that defines three approaches by which technology platforms may be scaled to meet increasing demand on the system in question. The three approaches are scaling through replication or cloning (the “X axis”), scaling through segmentation along service boundaries or dissimilar components (the “Y axis”), and scaling through segmentation or partitioning along similar components (the “Z axis”).[1][2][3][4][5][6]
The model first appeared in print in the first edition of The Art of Scalability.[7] The authors state that they first published the model online in 2007 on their company blog.[6] Subsequent versions of the model appeared in the first edition of Scalability Rules in 2011,[8] the second edition of The Art of Scalability in 2015,[1][4] and the second edition of Scalability Rules in 2016.[9]
The X axis of the model describes scaling a technology solution through multiple instances of the same component, by cloning a service or replicating a data set. Web and application servers performing the same function may sit behind a load balancer, and data persistence systems such as databases may be replicated for higher transaction throughput.[1] The Y axis of the model describes scaling a technology solution by separating a monolithic application into services along action words (verbs), that is, by separating “dissimilar” things. Data may likewise be separated by nouns, and each service should own and isolate the data upon which it acts.[1][10] The Z axis of the cube describes scaling a technology solution by separating components along “similar” boundaries, for example on a geographic basis or along customer identity numbers.[1][11]
X axis scaling is the most commonly used approach and tends to be the easiest to implement. Although potentially costly, the speed at which it can be implemented and begin alleviating issues tends to offset the cost. The X axis is typically a simple copy of a service placed behind a load balancer, used to absorb spikes in traffic or to ride out server outages. The costs can become overwhelming, particularly in the persistence tier.[6]
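As an illustration only (not from the cited sources), the following Python sketch shows the X-axis idea under the assumption of hypothetical instance hostnames: identical clones of one service sit behind a simple round-robin dispatcher, and any clone can serve any request.

```python
from itertools import cycle

# Hypothetical pool of identical (cloned) instances of the same application.
INSTANCES = [
    "app-1.example.internal",
    "app-2.example.internal",
    "app-3.example.internal",
]

_round_robin = cycle(INSTANCES)


def route_request(request_path: str) -> str:
    """Pick the next clone in round-robin order; every clone can serve any request."""
    instance = next(_round_robin)
    return f"http://{instance}{request_path}"


if __name__ == "__main__":
    for path in ["/checkout", "/search?q=book", "/checkout"]:
        print(route_request(path))
```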
Y axis scaling breaks a monolithic code base apart into separate services, sometimes microservices.[12] This separation creates clearly defined lanes not only for responsibility and accountability but also for fault isolation: if one service fails, it should bring down only itself and not other services.[6][13]
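A minimal sketch of one way this separation can look, with hypothetical service endpoints assumed for illustration: each action (“verb”) is owned by its own service, with its own code and data, so a failure in one service does not affect the others.

```python
# Hypothetical mapping from action ("verb") to the service that owns it,
# each service having its own code base and data store.
SERVICES = {
    "signup":   "http://signup-service.internal",
    "search":   "http://search-service.internal",
    "checkout": "http://checkout-service.internal",
}


def route_by_service(action: str) -> str:
    """Return the endpoint of the service responsible for this action.

    Because each action is isolated to one service, an outage of, say,
    the search service leaves signup and checkout unaffected.
    """
    base_url = SERVICES.get(action)
    if base_url is None:
        raise ValueError(f"No service owns the action '{action}'")
    return f"{base_url}/{action}"


if __name__ == "__main__":
    print(route_by_service("checkout"))
```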
Z axis scaling usually partitions along similar uses of data, whether geographic in nature, based on how customers use the site, or simply a modulus of the customer dataset. The Z axis breaks customers into sequestered sections to improve response times and to limit the impact if a particular region or section goes down.[6][14]
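A brief Python sketch of the modulus-based partitioning mentioned above, using hypothetical shard hostnames: the customer identifier determines which sequestered shard handles that customer's data.

```python
# Hypothetical shards, each holding a "similar" slice of the customer data.
SHARDS = [
    "customers-shard-0.internal",
    "customers-shard-1.internal",
    "customers-shard-2.internal",
    "customers-shard-3.internal",
]


def shard_for_customer(customer_id: int) -> str:
    """Select a shard with a simple modulus of the customer identifier."""
    return SHARDS[customer_id % len(SHARDS)]


if __name__ == "__main__":
    for cid in (7, 8, 1001):
        print(cid, "->", shard_for_customer(cid))
```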