The social data revolution is the shift in human communication patterns towards increased personal information sharing and its related implications, made possible by the rise of social networks in the early 2000s. This phenomenon has resulted in the accumulation of unprecedented amounts of public data.[1]
This large and frequently updated data source has been described as a new type of scientific instrument for the social sciences.[2] Several independent researchers have used social data to "nowcast" and forecast trends such as unemployment, flu outbreaks,[3] mood of whole populations,[4] travel spending and political opinions in a way that is faster, more accurate and cheaper than standard government reports or Gallup polls.[2]
Social data refers to data individuals create that is knowingly and voluntarily shared by them. Cost and overhead previously rendered this semi-public form of communication unfeasible, but advances in social networking technology from 2004 to 2010 have made broader concepts of sharing possible.[5] The types of data users are sharing include geolocation, medical data,[6] dating preferences, open thoughts, interesting news articles, etc.
The social data revolution not only enables new business models like the ones on Amazon.com but also provides large opportunities to improve decision-making for public policy and international development.[7]
The analysis of large amounts of social data leads to the field of computational social science. Classic examples include the study of media content[8] or social media content.[9]
Every internet activity leaves behind traces of data (a digital footprint) which can be used to learn more about the user. As use of the internet becomes more widespread, the datafication of the world is progressing rapidly: currently, around 16 zettabytes of data are produced per year, and 163 zettabytes are expected for the year 2025.[10] This has made data a critical commodity that ties together all societal actors: public institutions, private firms, and individuals each rely on data in a unique way.
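The scale of the growth quoted above can be made concrete with a little arithmetic. The following is a minimal sketch only: the function names are illustrative, and the nine-year span is an assumption, since the text gives a target year (2025) but no explicit base year for the 16-zettabyte figure.

```python
def growth_factor(start_zb, end_zb):
    """Return how many times larger the end volume is than the start volume."""
    return end_zb / start_zb


def implied_annual_growth(start_zb, end_zb, years):
    """Return the compound annual growth rate implied by the two volumes."""
    return (end_zb / start_zb) ** (1 / years) - 1


# Figures from the text: about 16 ZB per year now, 163 ZB expected for 2025.
factor = growth_factor(16, 163)

# ASSUMPTION: a nine-year span between the two figures (not stated in the text).
rate = implied_annual_growth(16, 163, years=9)

print(f"growth factor: {factor:.2f}x")
print(f"implied annual growth over the assumed span: {rate:.1%}")
```

Under these figures the yearly data volume grows roughly tenfold; the annual rate depends entirely on the assumed span and should be read as illustrative.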
Governments have been collecting data for centuries to ensure the continuance of institutional systems by limiting the risk of credit defaults, collecting taxes based on income, and providing the necessary infrastructure in light of their citizens' demographic distribution.[11] In its beginnings, this data consisted of written information for record keeping and control, including a census system.
This analogue process was very time- and cost-intensive, leaving little room for interpreting larger data sets. Meanwhile, corporate technological developments have moved this offline data into the digital age, allowing visualization and data analytics.[12] In the public sphere, connecting survey and poll methodologies with database computing resulted in the ability to gather and store large data sets on individuals.
Over the last few decades, the internet has shifted from being used mostly as a source of information about the world to being primarily used for communication, user-generated content, data sharing, and community building.[13] This is what many consider to be the development of "Web 2.0". Social network sites such as Facebook and YouTube are the foundation of this development and of the shift to social data sharing.
Early examples of social data websites are Craigslist and the wishlists of Amazon.com. Both enable users to communicate information to anybody who is looking for it. They differ in their approach to identity. Craigslist leverages the power of anonymity, while Amazon.com leverages the power of persistent identity, based on the history of the customer with the firm. The job market is even being shaped by the information people share about themselves on sites like LinkedIn and Facebook.[14]
Examples of more sophisticated social data sites are Twitter and Facebook. On Twitter, sending a message or tweet is as simple as sending an SMS text message. Twitter made this C2W, customer to the world: Any tweet a user sends can potentially be read by the entire world. Facebook focuses on interactions between friends, C2C in traditional language. It provides many ways for collecting data from its users: "tag" a friend in a photo, "comment" on what they posted, or just "like" it. These data are the basis for sophisticated models of the relationships between users. They can be used to significantly increase the relevance of what is shown to the user, and for advertising purposes.[15]
By 2009, the popularity of social networking sites had increased to four times what it had been in 2005.[16] As of 2013, Twitter had over 250 million users sharing almost 500 million tweets per day, and Facebook had well over one billion users around the world.[17]
Companies, advertisers, and other organizations often use the data that is shared via social networking sites and other data-sharing avenues.[18] Social networking sites, for example, can sell user data to advertisers and other entities, which can then use it to influence consumer decisions. Data mining is also used to gather this information.
While websites and other applications were the origins of this data collection, improvements in technology mean that many devices used in daily life, such as smartphones, smartwatches, and music devices, can now collect data on individuals, increasing the amount of personal data that is available.[19] [20]
This growth of people's digital identity (the information available via these electronic sources) is being used by companies and organizations to improve products and services and to reduce costs by targeting what consumers want and expect. The data that can be gathered includes shopping experiences, social media preferences, demographic information and more.
Using this data allows for better personalization of products, which has become an expected and vital aspect of product use and production. The data that is accessible about consumers can be used to infer their behavioral patterns.[21] For example, location information is used to assess when and where consumers go, so that ads and promotions can be targeted based on the stores they visit. Online retailers have also gained insight into how to better personalize the online shopping experience through data gathered during the online transaction.[22]
Businesses can even use consumer data to determine whether different shelf spacing of products affects consumer purchasing decisions, and to assess cross-item marketing potential based on items that are often purchased together.[23]
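One simple way to find items that are often purchased together is to count how frequently each pair of items appears in the same transaction. The sketch below illustrates that idea only; it is not a method described by the cited source. The function name `co_purchase_counts` and the sample baskets are hypothetical and invented for illustration.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each unordered pair of items shares a basket."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has a single canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical transactions, invented purely for illustration.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["coffee", "milk"],
    ["bread", "butter", "milk"],
]

pair_counts = co_purchase_counts(baskets)
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)  # ('bread', 'butter') 3
```

Raw pair counts favor items that are popular overall, so a real analysis would likely normalize them, for example by how often each item is bought on its own.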
While businesses and advertisers often take advantage of the consumer data available, consumers also use other users' information for their purchase decisions. Social commerce sites are where consumers share product and service experiences, opinions, and other information.[24] A famous example of such a site is Pinterest, which has over 100 million users. These sites and other online sources of product and brand information influence consumers' purchasing decisions.[25] It is estimated that about 67% of online customers use this information in making their purchase decisions. These sites create an environment that consumers consider trustworthy, since the information comes from other consumers.
With the vast amount of data about individuals now accessible, the potential uses of this information are growing.
The healthcare sector has many potential uses for this data. Information gathered from social media and other social data sharing sources can be used to predict flu and other disease outbreaks, to assess how emergency responses are handled, and more.[26] With the use of Twitter and geotags, medical researchers can evaluate the health of a particular neighborhood and use that information to provide better outreach and services. Medtronic has developed a digital blood glucose meter that lets health care providers and patients know about low blood glucose levels.
Social data can also be used to assess reactions to crises.[27] After Hurricane Sandy, researchers used Twitter to evaluate the emotions and issues that those affected were facing. This information can potentially be used to help better prepare and respond to future crises.
This data can be used to assist with urban planning. The city of Boston has used rider information from Uber to improve transportation planning and road maintenance.
See main article: Computational social science.
Using social data for research purposes has led to the development of computational social science, which combines social science, computer science, and network science.[28] This field emerged in 2009.[29] Before the rise of social data and the technological advances that supported it, researchers were limited to a narrow view of information about individuals, since their primary form of research relied on interviews. With the vast amount of social data available today, researchers can analyze a wider group and obtain a broader view. They can use social networks and cell phone data and perform online experiments that allow them to gather more information than before.
With so much data about individuals accessible from many sources, privacy has become a major concern. Security breaches of customer and other social information, such as the compromise of more than 56 million Home Depot customers' credit card information, have heightened privacy concerns around social data. How companies use the personal information they gather, and how it might be misused, concerns the majority of consumers. Despite this, many people do not know how social networking sites and other sources are using and selling their data.[30] In a 2014 study, only 25% of online users knew that their location could be accessed and only 14% knew that their web-surfing history could be accessed and shared.
Even though privacy concerns are a critical factor in people's sharing of personal information on the internet and in their overall internet involvement, most people are willing to share this information if the benefits of doing so outweigh the potential privacy and security costs. Consumers enjoy the personalization of products and services that this information gathering makes possible and, despite their concerns, continue to use them.
In his study of the data revolution in international development, Martin Hilbert, a social sciences professor at UC Davis, argued that the natural next step after the information societies fueled by ICT since the late 1990s is a move to knowledge societies informed by big data analysis. Decision-making informed by big data analysis has improved both efficiency and productivity in the developed world. Hilbert examines the challenges and potential of the data revolution in "the unruly world of international development."
Hilbert identified four types of data available in large quantities by 2013: words, locations, nature, and behavior.
Individual interactions with the internet, such as words in comments, social media postings, and Google search term volumes, offer an increasingly large source of big data. Typically, statistics are generated either through a census or a probability survey (in the United States, for example, the Annual Social and Economic Supplement (ASEC), Current Population Survey (CPS), American Community Survey (ACS), and National Health Interview Survey (NHIS)) or from administrative records, such as payroll, unemployment, Social Security income taxes, scanner data and credit card data and other commercial transaction records.
Weatherhead University Professor Gary King described the revolution as being not just about the quantity of data available but about the ability to do something with the data to benefit society.
Global Positioning System (GPS)-enabled mobile tablets and phones, radio-frequency identification (RFID) chips (part of automatic identification and data capture (AIDC) technologies), telematics, location-based games, etc. provide data on absolute location and relative movement.
Hilbert categorizes data on natural processes under "nature", which includes data from sensors that measure, for example, moisture in the air and temperature.
Data can be generated from user behavior in multiplayer online games, such as League of Legends, World of Warcraft, Minecraft, Call of Duty, and Dota 2. Nathan Eagle, a computer scientist at the Santa Fe Institute in New Mexico, began using cellphones in the early 2000s to collect accurate, large-scale data about real social interactions.[31] [32] [33] The project was named one of the "10 Technologies Most Likely To Change The Way We Live" by the MIT Technology Review.[34]