Evolutionary database design involves incremental improvements to the database schema so that it can be continuously updated with changes, reflecting the customer's requirements. People across the globe work on the same piece of software at the same time hence, there is a need for techniques that allow a smooth evolution of database as the design develops. Such methods utilize automated refactoring and continuous integration so that it supports agile methodologies for software development. These development techniques are applied on systems that are in pre-production stage as well on systems that have already been released. These techniques not only cover relevant changes in the database schema according to customer's changing needs, but also migration of modified data into the database and also customizing the database access code accordingly without changing the data semantics.[1]
After using the waterfall model for a long time, the software industry has witnessed a rise in adoption of agile methods for software development. Agile methodologies don’t assume requirements to be permanent at any stage of the software life cycle. These methods are designed to support sporadic changes in contrast to waterfall design technique. An important part of this approach is iterative development, where the entire software life-cycle is run multiple times during the life of a project. Every iteration witnesses the complete software development life cycle despite the iterations being of short duration that can vary between weeks to a few months.
Before the adoption of these methodologies, the entire system was designed before starting to develop the code. The same principle was applied to the database schema as well where it was considered to be derived out of the software requirements which were in turn developed by collaboration between the customer, end-users, business analysts, etc. and these requirements were not expected to change with the progress in the software development. This approach proved to be cumbersome because as time progressed, the redundancies in the existing database schema in the form of unused rows or columns were evident. This redundancy along with data quality problems went on to become a costly affair. It was concluded that the practice of not having design interleaved with construction and testing was highly inefficient.
As mentioned in the previous section evolutionary methods are iterative in nature and these methods have become immensely popular over last two decades. Evolutionary database design aims to construct the database schema over the course of the project instead of building the entire database schema at the beginning of the project. This method of database design can capture and deal effectively with the changing requirements of projects.
There are five evolutionary database design techniques that can aid developers in building their database in an iterative fashion. A brief overview about the five techniques are provided below.
See main article: article and Database refactoring.
Refactoring is the process of making changes to the program without affecting the functionality of the program. Database refactoring is the technique of implementing small changes to the database schema without affecting the functionality and information stored in the database.[2] The main purpose of database refactoring is to improve the database design so that the database is more in-sync with the changing requirements. The user can modify tables, views, stored procedures and triggers. Dependency between the database and external applications make database refactoring a challenge.
Data modeling is the technique of identifying entities, associating attributes to the entities and deciding the data structure to represent the attributes. In the traditional database scenario, a logical data model is created at the beginning to represent the entities and their associated attributes. In evolutionary data modeling the technique of data modeling is performed in an iterative manner, that is multiple data models are developed, each model representing a different aspect of the database. This kind of data modeling technique is practiced in an agile environment and it is one of the main principles of agile development.[3]
Whenever a new functionality is added to a system, it is essential to verify that the update does not corrupt or render the system unusable. In a database, the business logic is implemented in stored procedures, data validation rules and referential integrity and they have to be tested thoroughly when any change is implemented in the system. Regression testing is the process of executing all the test cases whenever a new feature is added to the system. test-first development (TFD) is a form of regression testing followed in evolutionary database design. The steps involved in TFD approach are,[4]
Configuration management is a detailed recording of versions and updates that have been applied to any system. Configuration management is useful in rolling back updates and changes which have impacted the system in a negative manner. To ensure that any updates made in database refactoring can be rolled back, it is important to maintain database artifacts like data definition language scripts, data model files, reference data, stored procedures, etc. in a configuration management system.[5]
See main article: article and Sandbox (software development).
A sandbox is a fully functional environment in which the system can be built, tested and executed. In order to make changes to the database schema in an evolutionary manner it is ideal for every developer to have his/her own physical sandbox, copy of source code and a copy of database. In a sandbox environment the developer can make changes to the database schema and run tests without affecting the work of other developers and other environments. Once the change has been implemented successfully, it is promoted to pre-production environment where in acceptance testing is performed and after the acceptance tests succeed it is deployed into production.
Traditional database design technique does not support changes like evolutionary database design technique.'Unfortunately, the traditional data community assumed that evolving database schema is a hard thing to do and as a result never thought through how to do it.' In a way, the evolutionary design is better for application developers and traditional design is better for data professionals.[6]
Properties | Traditional Database Design | Evolutionary Database Design | |
---|---|---|---|
Design | Traditional databases were developed by collaboration between business analysts and users. | Evolutionary Databases were designed by software developers and data professionals. | |
Design Issues | They demonstrate some design issues in databases. Commercially available databases were developed by experienced individuals, but are now being serviced by the database and not data professionals. | They are developed by a close alliance of software developers and data professionals. The database evolves with the development and hence is processed by the same individuals who were responsible for development. | |
Approach towards change | Any change requested by the user is incorporated in the logical model followed by the physical model and then tested to ensure perfect functionality. | Change is an integral part of evolutionary database design. Any change requested by the user is immediately implemented in the database as well as the code. The data migration scripts need to be updated as well. | |
Dependency on ER diagram | Traditional design is methodological and due to its dependency on ER diagram and its detailed design phases such as user, logic, and physical we can track data as well as its meaning. | Design is interleaved between stages for evolutionary database design. Thus, the relationship between entities can change over software development cycle and so does the ER diagram. |
Given below are a list of tools that provide the functionality of designing and developing a database in an evolutionary manner.