A data steward is an oversight or data governance role within an organization, and is responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the metadata for those data assets. A data steward may share some responsibilities with a data custodian, such as the awareness, accessibility, release, appropriate use, security and management of data.[1] A data steward would also participate in the development and implementation of data assets. A data steward may seek to improve the quality and fitness for purpose of other data assets their organization depends upon but is not responsible for.
Data stewards have a specialist role that utilizes an organization's data governance processes, policies, guidelines and responsibilities for administering an organizations' entire data in compliance with policy and/or regulatory obligations. The overall objective of a data steward is the data quality of the data assets, datasets, data records and data elements.[2] This includes documenting metainformation for the data, such as definitions, related rules/governance, physical manifestation, and related data models (most of these properties being specific to an attribute/concept relationship), identifying owners/custodian's various responsibilities, relations insight pertaining to attribute quality, aiding with project requirement data facilitation and documentation of capture rules.
Data stewards begin the stewarding process with the identification of the data assets and elements which they will steward, with the ultimate result being standards, controls and data entry. The steward works closely with business glossary standards analysts (for standards), with data architect/modelers (for standards), with DQ analysts (for controls) and with operations team members (good-quality data going in per business rules) while entering data.
Data stewardship roles are common when organizations attempt to exchange data precisely and consistently between computer systems and to reuse data-related resources. Master data management often makes references to the need for data stewardship for its implementation to succeed. Data stewardship must have precise purpose, fit for purpose or fitness.
A data steward ensures that each assigned data element:
Responsibilities of data stewards vary between different organisations and institutions. For example, at Delft University of Technology, data stewards are perceived as the first contact point for any questions related to research data. They also have subject-specific background allowing them to easily connect with researchers and to contextualise data management problems to take into account disciplinary practices.
Depending on the set of data stewardship responsibilities assigned to an individual, there are 4 types (or dimensions of responsibility) of data stewards typically found within an organization:
Systematic data stewardship can foster:
Assignment of each data element to a person sometimes seems like an unimportant process. But many groups have found that users have greater trust and usage rates in systems where they can contact a person with questions on each data element.
Delft University of Technology (TU Delft) offers an example of data stewardship implementation at a research institution. In 2017 the Data Stewardship Project was initiated at TU Delft to address research data management needs in a disciplinary manner across the whole campus.[4] Dedicated data stewards with subject-specific background were appointed at every TU Delft faculty to support researchers with data management questions and to act as a linking point with the other institutional support services. The project is coordinated centrally by TU Delft Library, and it has its own website,[5] blog[6] and a YouTube channel.[7]
The https://web.archive.org/web/20061002020733/http://www.epa.gov/edr/EPA metadata registry furnishes an example of data stewardship. Note that each data element therein has a "POC" (point of contact).
In 2023, ETH Zurich launched the Data Stewardship Network (DSN) to facilitate collaboration among employees engaged in data management, analysis, and code development across research groups. The DSN serves as a platform for networking and knowledge exchange, aiming to professionalize the role of data stewards who support research data management and reproducible workflows. Established by the team for Research Data Management and Digital Curation at the ETH Library, the DSN collaborates with Scientific IT Services to provide expertise in areas such as storage infrastructure and reproducible workflows.[8]
A new market for data governance applications is emerging, one in which both technical and business staff — stewards — manage policies. These new applications, like previous generations, deliver a strong business glossary capability, but they do not stop there. Vendors are introducing additional features addressing the roles of business in addition to technical stewards' concerns.[9]
Information stewardship applications are business solutions used by business users acting in the role of information steward (interpreting and enforcing information governance policy, for example). These developing solutions represent, for the most part, an amalgam of a number of disparate, previously IT-centric tools already on the market, but are organized and presented in such a way that information stewards (a business role) can support the work of information policy enforcement as part of their normal, business-centric, day-to-day work in a range of use cases.
The initial push for the formation of this new category of packaged software came from operational use cases — that is, use of business data in and between transactional and operational business applications. This is where most of the master data management efforts are undertaken in organizations. However, there is also now a faster-growing interest in the new data lake arena for more analytical use cases.[10]
Some of the vendors in Metadata Management, like Alation, have started highlighting the importance of Data Stewards to employees interested in using data to make business decisions.[11]