Data Colada is a blog dedicated to investigative analysis and replication of academic research, focusing in particular on the validity of findings in the social sciences.[1]
It is known for its advocacy against problematic research practices such as p-hacking, and for publishing evidence of data manipulation and research misconduct in several prominent cases, including those involving celebrity professors Dan Ariely and Francesca Gino. Data Colada was established in 2013 by three behavioral science researchers: Uri Simonsohn, a professor at ESADE Business School in Barcelona, Spain (as of 2023); Leif Nelson, a professor at the University of California, Berkeley; and Joe Simmons, a professor at the University of Pennsylvania.
Around 2011, Simmons, Nelson and Simonsohn "bonded over the false, ridiculous, and flashy findings that the field [of behavioral sciences] was capable of producing", such as a paper by Cornell psychologist Daryl Bem that had supposedly found evidence for clairvoyance.[2] They reacted by publishing an influential 2011 paper about false positive results in psychology, illustrating the problem with a parody research finding that supposedly showed that listening to the Beatles song "When I’m Sixty-Four" made experimental subjects one and a half years younger.
The "Data Colada" blog was launched two years later, in 2013, carrying the tagline "Thinking about evidence, and vice versa", becoming what the New York Times described as "a hub for nerdy discussions of statistical methods — and, before long, various research crimes and misdemeanors".
In particular, the three researchers objected to the then-widespread practices of cherry-picking data and of making statistically non-significant results appear credible, especially an approach for which they coined the term p-hacking in a 2014 paper.[3] [4]
Apart from calling out faulty but presumably well-intentioned research practices, Data Colada has also published evidence of data manipulation and research misconduct. The cases include studies on the concept of the moral high ground by psychologist Lawrence Sanna, and research by Flemish psychologist Dirk Smeesters. According to The New Yorker, after Data Colada published their work, the careers of Sanna and Smeesters "came to an unceremonious end".
In 2021, Data Colada discovered fabricated data in a 2012 field study published in PNAS[5] by Lisa L. Shu, Nina Mazar, Francesca Gino, Dan Ariely, and Max H. Bazerman.[6] [7] All of the study's authors agreed with Data Colada's assessment, and the paper was retracted. The authors also agreed that Ariely was the only author who had access to the data before it was transmitted in its fraudulent form to Mazar, the analyst. Ariely denied manipulating the data,[8] but Excel metadata showed that he created the spreadsheet and was the last to edit it. In an e-mail to Mazar shortly after he first sent her the data, he also admitted to having mislabeled all of the values in an entire column.[9] Ariely has stated that someone at the insurance agency that provided the data must have fabricated it.[10] [11]
Data Colada's work is credited with raising awareness of the replication crisis, the concern that many research results in the social sciences are difficult or impossible to reproduce. Data Colada is also recognized for helping to establish better research practices, such as the sharing of replication data.
The Nobel Prize-winning psychologist Daniel Kahneman described the Data Colada team in 2023 as "heroes of mine" and expressed regret about having previously endorsed research findings that the blog later showed to be faulty. Brian Nosek of the Center for Open Science applauded Data Colada for having "done an amazing job of developing new methodologies to interrogate the credibility of research."
On the other hand, as summarized by The New Yorker, "Data Colada's harshest critics saw the young men as jealous upstarts who didn’t understand the soft artistry of the social sciences". Psychologist Norbert Schwarz accused Data Colada and other reformers of engaging in a "witch hunt," while psychologist Daniel Gilbert denounced what he called the "replication police" as "shameless little bullies".
In 2021, researcher Zoé Ziani and a collaborator alerted Data Colada to problems replicating work by Harvard behavioral scientist Francesca Gino. Later that year, the Data Colada team contacted Harvard University about anomalies in four papers by Gino. Harvard subsequently conducted its own internal investigation with the help of an outside firm, which discovered additional data alterations besides the cases raised by Data Colada. In June 2023, Harvard Business School placed Gino on unpaid administrative leave after the internal investigation determined she had falsified data in her research.[12] [13] [14] Around the same time, Data Colada published four blog posts detailing evidence that the four papers (all of which had been retracted or were set to be retracted at that point), and possibly others by Gino, "contain fake data." Gino subsequently filed a $25 million defamation suit against Harvard, Harvard Business School Dean Srikant Datar, and the three members of Data Colada, alleging that they had conspired to damage her reputation with false accusations, and that the penalties against her amounted to gender-based discrimination under Title IX.[14] Gino accused Harvard and the Data Colada team of having "worked together to destroy my career and reputation despite admitting they have no evidence proving their allegations."[15] The lawsuit raised concerns about a chilling effect on the scrutiny of published research. Open science proponent Simine Vazire raised over $370,000 to help cover Data Colada's legal fees.[16] [17]