Big data ethics, also known simply as data ethics, refers to systematizing, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data.[1] Since the dawn of the Internet, the sheer quantity and quality of data have increased dramatically and continue to grow exponentially. Big data describes data so voluminous and complex that traditional data processing application software is inadequate to deal with it. Recent innovations in medical research and healthcare, such as high-throughput genome sequencing, high-resolution imaging, electronic medical records and a plethora of internet-connected health devices, have triggered a data deluge that will reach the exabyte range in the near future. Data ethics becomes increasingly relevant as the quantity of data grows, because the scale of its potential impact grows with it.
Big data ethics differs from information ethics in focus: information ethics is more concerned with intellectual property and with issues relevant to librarians, archivists, and information professionals, while big data ethics is more concerned with the collectors and disseminators of structured or unstructured data, such as data brokers, governments, and large corporations. However, since artificial intelligence and machine learning systems are regularly built using big data sets, discussions of data ethics are often intertwined with those in the ethics of artificial intelligence.[2] More recently, issues of big data ethics have also been researched in relation to other areas of technology and science ethics, including ethics in mathematics and engineering ethics, as many areas of applied mathematics and engineering use increasingly large data sets.
Data ethics is concerned with the following principles:[3]
Ownership of data involves determining rights and duties over property, such as the ability to exercise individual control over (including limiting the sharing of) the personal data that make up one's digital identity. The question of data ownership arises when someone records observations about an individual person: both the observer and the observed lay claim to the data. Questions also arise about the responsibilities that the observer and the observed have toward each other. These questions have become increasingly relevant as the Internet has magnified the scale and systematization of observing people and their thoughts. The question of personal data ownership relates to questions of corporate ownership and intellectual property.[4]
In the European Union, some people argue that the General Data Protection Regulation indicates that individuals own their personal data, although this is contested.[5]
Concerns have been raised that biases can become embedded in algorithm design, resulting in systematic oppression.[6]
In terms of governance, big data ethics is concerned with which types of inferences and predictions should be made using big data technologies such as algorithms.
Anticipatory governance is the practice of using predictive analytics to assess possible future behaviors.[7] This has ethical implications because it makes it possible to target particular groups and places, which can encourage prejudice and discrimination. For example, predictive policing flags certain groups or neighborhoods to be watched more closely than others, which leads to more sanctions in those areas and to closer surveillance of people who fit the same profiles as those who have been sanctioned.[8]
The term "control creep" refers to the repurposing of data that was generated with a particular purpose in mind. The practice can be seen in the airline industry, whose data has been repurposed for profiling and managing security risks at airports.
Privacy has been presented as a limitation on data usage, a limitation that could itself be considered unethical.[9] For example, the sharing of healthcare data can shed light on the causes of diseases and the effects of treatments, and can allow for tailored analyses based on individuals' needs. This is of ethical significance in the big data ethics field because, while many people value privacy, the affordances of data sharing are also highly valuable, even though they may conflict with one's conception of privacy. Attitudes against data sharing may stem from a perceived loss of control over data and a fear that personal data will be exploited. However, it is possible to extract the value of data without compromising privacy.
Some scholars, such as Jonathan H. King and Neil M. Richards, are redefining the traditional meaning of privacy, while others question whether privacy still exists at all. In a 2014 article for the Wake Forest Law Review, King and Richards argue that privacy in the digital age can be understood not in terms of secrecy but in terms of rules that govern and control the use of personal information. In the European Union, the right to be forgotten allows personal data to be removed or de-linked from databases at an individual's request if the information is deemed irrelevant or out of date.[10] According to Andrew Hoskins, this law demonstrates the moral panic of EU members over the perceived loss of privacy and of the ability to govern personal data in the digital age.[11] In the United States, citizens have the right to delete voluntarily submitted data. This is very different from the right to be forgotten, because much of the data produced using big data technologies and platforms is not voluntarily submitted. While traditional notions of privacy are under scrutiny, these differing legal frameworks in the EU and the US show how countries are grappling with privacy concerns in the context of big data.[12]
The difference between the value of the services provided by tech companies and the equity value of those companies represents the difference between the exchange rate offered to the citizen and the "market rate" for the value of their data. This rudimentary calculation has many scientific weaknesses: the financial figures of tax-evading companies are unreliable; it is unclear whether revenue or profit is the more appropriate measure; the definition of a user is open to question; a large number of individuals is needed for data to become valuable; and prices may be tiered for different people in different countries. Although these calculations are crude, they make the monetary value of data more tangible. Another approach is to look at the rates at which data is traded on the black market. RSA publishes a yearly cybersecurity shopping list that takes this approach.[13]
This raises the economic question of whether receiving free tech services in exchange for personal data is a worthwhile implicit trade for the consumer. In the personal data trading model, rather than companies selling data, individuals sell their own personal data and keep the profit.[14]
The idea of open data is centered on the argument that data should be freely available, without restrictions such as copyright laws that would prohibit its use. Many governments have begun to publish open datasets for the purpose of transparency and accountability.[15] This movement has gained traction through "open data activists", who have called on governments to make datasets available so that citizens can extract meaning from the data and perform checks and balances themselves.[16] King and Richards have argued that this call for transparency involves a tension between openness and secrecy.
Activists and scholars have also argued that because this open-sourced model of data evaluation is based on voluntary participation, the availability of open datasets has a democratizing effect on a society, allowing any citizen to participate.[17] To some, the availability of certain types of data is seen as a right and an essential part of a citizen's agency.
The Open Knowledge Foundation (OKF) lists several types of datasets that it argues governments should provide in order to be truly open.[18] OKF maintains the Global Open Data Index (GODI), a crowd-sourced survey, based on its Open Definition, that measures the openness of governments. GODI is intended to give governments feedback on the quality of their open datasets.[19]
Willingness to share data varies from person to person. Preliminary studies have been conducted into the determinants of the willingness to share data. For example, some have suggested that baby boomers are less willing to share data than millennials.[20]