GDELT Project explained

The GDELT Project (Global Database of Events, Language, and Tone) was created by Kalev Leetaru of Yahoo! and Georgetown University, along with Philip Schrodt and others. It describes itself as "an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what's happening around the world, what its context is and who's involved, and how the world is feeling about it, every single day."[1] [2] [3] Early explorations leading up to the creation of GDELT were described by co-creator Philip Schrodt in a conference paper in January 2011.[4]

Data

GDELT includes data from 1979 to the present. The data is distributed as zip files whose contents are tab-separated values saved with a CSV extension, for easy import into Microsoft Excel or similar spreadsheet software.[5] Data from 1979 to 2005 is available as one zip file per year, with file sizes gradually increasing from 14.3 MB for 1979 to 125.9 MB for 2005, reflecting the increase in the number of news media and in the frequency and comprehensiveness of event recording.[6] Data files from January 2006 to March 2013 are available at monthly granularity, with the zipped file size rising from 11 MB for January 2006 to 103.2 MB for March 2013. Data files from April 1, 2013 onward are available at daily granularity. The data file for each date is made available by 6 AM Eastern Standard Time the next day. As of June 2014, the daily zipped file is about 5-12 MB.[5] [6] The data files use Conflict and Mediation Event Observations (CAMEO) coding for recording events.[7]
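
As a minimal sketch of what this format means for anyone reading the files programmatically rather than in a spreadsheet, the Python below opens a zip archive and parses each member as tab-separated text despite its CSV extension. The function name `read_gdelt_zip` is hypothetical, and the sketch assumes UTF-8 text with no quoted fields; the actual column layout is given in the GDELT documentation and is not reproduced here.

```python
import csv
import io
import zipfile


def read_gdelt_zip(zip_source):
    """Yield each row (a list of strings) from every member of a zip archive.

    Assumes each member is tab-separated text, even if it is named *.CSV.
    `zip_source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            with archive.open(name) as raw:
                # Decode bytes to text; newline="" leaves line endings to csv.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                # The delimiter is a tab, not a comma, despite the extension.
                # QUOTE_NONE treats quote characters as ordinary data.
                yield from csv.reader(
                    text, delimiter="\t", quoting=csv.QUOTE_NONE
                )
```

Because the function is a generator, rows are streamed one at a time, which may matter for the larger yearly and monthly archives.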

In a blog post for Foreign Policy, co-creator Kalev Leetaru attempted to use GDELT data to answer the question of whether the Arab Spring sparked protests worldwide. He measured protest intensity as the ratio of protest-related events to the total number of events recorded, and then studied how that ratio changed over time. Political scientist and data science/forecasting expert Jay Ulfelder critiqued the post on his personal blog, saying that Leetaru's normalization method may not have adequately accounted for changes in the nature and composition of media coverage.
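
The normalization described above can be sketched in a few lines of Python. This is an illustration of the ratio only, not Leetaru's actual code; the function names and the counts are hypothetical and are not GDELT figures.

```python
def protest_intensity(protest_events, total_events):
    """Return protest-related events as a fraction of all recorded events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return protest_events / total_events


def intensity_series(counts_by_period):
    """Map {period: (protest_events, total_events)} to {period: intensity}."""
    return {
        period: protest_intensity(protest, total)
        for period, (protest, total) in counts_by_period.items()
    }


# Hypothetical counts, invented for illustration only (not GDELT figures).
example_counts = {
    "period A": (50, 1_000),
    "period B": (120, 4_000),
}
series = intensity_series(example_counts)
```

With these invented numbers, period B has more protest events in absolute terms (120 versus 50) but a lower share (0.03 versus 0.05). The measure therefore depends heavily on how the denominator behaves, which is the part of the method that Ulfelder's critique addresses.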

The dataset is also available on Google Cloud Platform and can be accessed using Google BigQuery.

Reception

Academic reception

GDELT has been cited and used in a number of academic studies, such as a study of visual and predictive analytics of Singapore news (along with Wikipedia and the Straits Times Index)[8] and a study of political conflict.[9]

The challenge problem at the 2014 International Social Computing, Behavioral Modeling and Prediction Conference (SBP) asked participants to explore GDELT and apply it to the analysis of social networks, behavior, and prediction.[10]

Reception in blogs and media

GDELT has been covered on the website of the Center for Data Innovation[11] as well as on GIS Lounge.[12] It has also been discussed and critiqued on blogs about political violence and crisis prediction.[13] [14] [15] The dataset has been cited and critiqued repeatedly in Foreign Policy,[2] [16] including in discussions of political events in Syria,[17] the Arab Spring,[18] [19] and Nigeria.[20] It has also been cited in New Scientist,[21] on the FiveThirtyEight website,[22] and on Andrew Sullivan's blog.[23]

The Predictive Heuristics blog and other blogs have compared GDELT with the Integrated Conflict Early Warning System (ICEWS).[24] [25] Alex Hanna blogged about her experiment assessing GDELT against hand-coded protest data from the Dynamics of Collective Action dataset.[26]

In May 2014, the Google Cloud Platform blog announced that the entire GDELT dataset would be available as a public dataset in Google BigQuery.[27]

Notes and References

  1. About GDELT: The Global Database of Events, Language, and Tone. June 2, 2014.
  2. Mapped: Every Protest on the Planet Since 1979. Foreign Policy. June 2, 2014.
  3. Global Database of Events, Language, and Tone. datahub.io. June 2, 2014.
  4. Automated Production of High-Volume, Near-Real-Time Political Event Data. Schrodt, Philip. January 20, 2011. June 12, 2014. Original link dead; archived July 2, 2017: https://web.archive.org/web/20170702114039/http://www.princeton.edu/~pcglobal/conferences/methods/papers/schrodt.pdf
  5. Raw data files. Global Database of Events, Language, and Tone.
  6. All GDELT Event Files. June 12, 2014.
  7. Documentation. Global Database of Events, Language, and Tone.
  8. Visual and Predictive Analytics on Singapore News: Experiments on GDELT, Wikipedia, and ^STI. Phua, Clifton; Feng, Yuzhang; Ji, Junyao; Soh, Timothy. 2014. arXiv:1404.1996 [cs.OH].
  9. A nuanced study of political conflict using the Global Datasets of Events Location and Tone (GDELT) dataset. Yonamine, James E. June 2, 2014.
  10. SBP 2014 Grand Challenge: explore GDELT, Global Database of Events, Language and Tone. June 2, 2014.
  11. Creating a Real-Time Global Database of Events, People, and Places in the News. December 15, 2013. June 2, 2014. Center for Data Innovation.
  12. Mapping Global Events Since 1979. Caitlin Dempsey Morais. September 5, 2013. June 2, 2014. GIS Lounge.
  13. Another Note on the Limitations of Event Data. Ulfelder, Jay. June 6, 2014. June 12, 2014.
  14. Raining on the Parade: Some Cautions Regarding the Global Database of Events, Language and Tone Dataset. February 20, 2014. June 2, 2014. Political Violence at a Glance.
  15. Global Database of Events, Language, and Tone (GDELT) — (Old) Big Data to See (New) Crises?. Jongman, Berto. January 5, 2014. June 2, 2014. Public Intelligence Blog.
  16. What can we learn from the last 200 million things that happened in the world?. Keating, Joshua. April 10, 2013. June 2, 2014. Foreign Policy. Original link dead; archived June 6, 2014: https://web.archive.org/web/20140606210801/http://ideas.foreignpolicy.com/posts/2013/04/10/what_can_we_learn_from_the_last_200_million_things_that_happened_in_the_world
  17. How Well Does GDELT Follow Events in Syria?. Keating, Joshua. July 9, 2013. June 2, 2014. Foreign Policy. Original link dead; archived June 6, 2014: https://web.archive.org/web/20140606234253/http://ideas.foreignpolicy.com/posts/2013/07/09/how_well_does_gdelt_follow_events_in_syria
  18. Did the Arab Spring Really Spark a Wave of Global Protests? The world may look like it's roiling now, but the 1980s were far worse. Foreign Policy. Leetaru, Kalev. May 29, 2014. June 2, 2014.
  19. The Arab Spring and GDELT. Steinert-Threlkeld, Zachary. September 27, 2013. June 18, 2014.
  20. Mapping Violence and Protests in Nigeria: How Big Data can find the big story. Leetaru, Kalev. Foreign Policy. March 13, 2014. June 2, 2014.
  21. World's largest events database could predict conflict. Heaven, Douglas. New Scientist. May 13, 2013. June 2, 2014.
  22. Kidnapping of Girls in Nigeria Is Part of a Worsening Problem (Updated). Chalabi, Mona. May 6, 2014. June 2, 2014. FiveThirtyEight.
  23. Not Your Father's Global Uprising. Sullivan, Andrew. May 30, 2014. June 2, 2014.
  24. GDELT and ICEWS, a short comparison. mdwardlab. Predictive Heuristics. October 17, 2013. June 18, 2014. Original link dead; archived July 17, 2014: https://web.archive.org/web/20140717082617/http://predictiveheuristics.com/2013/10/17/gdelt-and-icews-a-short-comparison/
  25. Noise in GDELT. Beieler, John. October 28, 2013. June 21, 2014.
  26. Assessing GDELT with handcoded protest data. Hanna, Alex. February 24, 2014. June 21, 2014. Bad Hessian.
  27. World's largest event dataset now publicly available in BigQuery. May 29, 2014. June 2, 2014. Google Cloud Platform.