Presto (SQL query engine) explained

Presto
Author:	Martin Traverso, Dain Sundstrom, David Phillips, Eric Hwang
Programming Language:	Java
Operating System:	Cross-platform
Standard:	SQL
Genre:	Data warehouse
License:	Apache License 2.0

Presto (a name covering both PrestoDB and PrestoSQL, the fork later rebranded as Trino) is a distributed SQL query engine for big data. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, Alluxio, MySQL, MongoDB and Teradata,^[1] and to combine multiple data sources within a single query. Presto is community-driven open-source software released under the Apache License.

History

Presto was originally designed and developed at Facebook, Inc. (later renamed Meta) so that its data analysts could run interactive queries on the company's large data warehouse in Apache Hadoop. The first four developers were Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang. Before Presto, Facebook's data analysts relied on Apache Hive to run SQL analytics on their multi-petabyte data warehouse.^[2] Hive was deemed too slow at Facebook's scale, and Presto was created to fill the gap by running fast interactive queries. Development began in 2012, and Presto was deployed at Facebook later that year. In November 2013, Facebook announced its open-source release.^[3] ^[4]

In 2014, Netflix disclosed that it used Presto on 10 petabytes of data stored in Amazon Simple Storage Service (S3).^[5] In November 2016, Amazon announced Athena, a service based on Presto.^[6] In 2017, Teradata spun out a company called Starburst Data to commercially support Presto; its staff included people Teradata had acquired from Hadapt in 2014.^[7] Teradata's QueryGrid software allowed Presto to access a Teradata relational database.^[8]

In January 2019, the Presto Software Foundation was announced as a not-for-profit organization dedicated to advancing the Presto open-source distributed SQL query engine.^[9] ^[10] At the same time, Presto development forked into PrestoDB, maintained by Facebook, and PrestoSQL, maintained by the Presto Software Foundation, with some cross-pollination of code between the two.

In September 2019, Facebook donated PrestoDB to the Linux Foundation, establishing the Presto Foundation.^[11] Neither the creators of Presto, nor the top contributors and committers, were invited to join this foundation.^[12]

By 2020, all four of the original Presto developers had joined Starburst.^[13] In December 2020, PrestoSQL was rebranded as Trino, since Facebook had obtained a trademark on the name "Presto" (also donated to the Linux Foundation).^[14]

Another company, Ahana, was announced in 2020 to commercialize the PrestoDB fork as a cloud service; IBM acquired it in 2023.^[15]

Architecture

Presto's architecture is similar to that of other database management systems that use cluster computing, sometimes called massively parallel processing (MPP). A single coordinator works in concert with multiple workers. Clients submit SQL statements, which the coordinator parses and plans before scheduling parallel tasks on the workers. The workers jointly process rows from the data sources and produce results that are returned to the client. Unlike the original Apache Hive execution model, which ran each query as Hadoop MapReduce jobs, Presto does not write intermediate results to disk, which makes it significantly faster. Presto is written in Java.
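The coordinator/worker flow described above can be illustrated with a toy scatter-gather aggregation. This is a minimal sketch under simplifying assumptions, not Presto's implementation: threads stand in for worker nodes, a list of tuples stands in for a data source, and the function names (`plan_splits`, `partial_sum`, `merge`, `run_query`) are hypothetical. It evaluates the equivalent of `SELECT region, SUM(amount) ... GROUP BY region`. Each "worker" aggregates its own split in memory, and the partial results are merged into the final answer. Echoing the contrast with MapReduce, intermediate results pass between stages as in-memory objects rather than being written to disk.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (region, amount); a stand-in for rows from a data source


def plan_splits(rows: List[Row], num_workers: int) -> List[List[Row]]:
    # "Coordinator": divide the input round-robin into one split per worker.
    return [rows[i::num_workers] for i in range(num_workers)]


def partial_sum(split: List[Row]) -> Dict[str, int]:
    # "Worker": aggregate its own split entirely in memory.
    acc: Dict[str, int] = defaultdict(int)
    for region, amount in split:
        acc[region] += amount
    return dict(acc)


def merge(partials: List[Dict[str, int]]) -> Dict[str, int]:
    # Final stage: combine the per-worker partial sums into one result.
    total: Dict[str, int] = defaultdict(int)
    for part in partials:
        for region, amount in part.items():
            total[region] += amount
    return dict(total)


def run_query(rows: List[Row], num_workers: int = 3) -> Dict[str, int]:
    # Equivalent of: SELECT region, SUM(amount) FROM t GROUP BY region
    splits = plan_splits(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Partial results come back as in-memory objects; nothing is spilled to disk.
        partials = list(pool.map(partial_sum, splits))
    return merge(partials)


if __name__ == "__main__":
    rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
    print(sorted(run_query(rows).items()))  # [('east', 17), ('north', 3), ('west', 6)]
```

Splitting the aggregation into a partial step per worker and a final merge is a common way for MPP engines to parallelize grouped aggregation, because only one entry per group per worker moves between stages. The sketch deliberately omits networking, scheduling, and fault handling.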

A Presto query can combine data from multiple sources. Presto offers connectors to data sources including files in Alluxio, the Hadoop Distributed File System (often used as a data lake), Amazon S3, MySQL, PostgreSQL, Microsoft SQL Server, Amazon Redshift, Apache Kudu, Apache Phoenix, Apache Kafka, Apache Cassandra, Apache Accumulo, MongoDB and Redis. Unlike tools tied to a specific Hadoop distribution, such as Apache Impala, Presto can work with any variant of Hadoop, or without Hadoop at all. Presto supports separation of compute and storage and may be deployed on-premises or in the cloud.
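A query that spans several connectors can be sketched in the same spirit. In Presto SQL, tables are addressed as `catalog.schema.table`, so a federated join might read `SELECT * FROM hive.web.page_views v JOIN mysql.crm.users u ON v.user_id = u.id`, where the catalogs `hive` and `mysql` and the tables are hypothetical names that would first have to be configured. The Python toy below mimics that shape under stated assumptions: a hypothetical `InMemoryConnector` stands in for each data source, a hypothetical `Engine` resolves three-part names to connectors, and the join is evaluated as an in-memory hash join, one common strategy for equi-joins. It is not a description of Presto's actual connector API or join operators.

```python
from typing import Dict, Iterator, List

# A row is modeled as a column-name -> value mapping.
Row = Dict[str, object]


class InMemoryConnector:
    """Hypothetical stand-in for a connector: serves rows for "schema.table" names."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def scan(self, schema_table: str) -> Iterator[Row]:
        # A real connector would read from its source; this one iterates a list.
        yield from self._tables[schema_table]


class Engine:
    """Hypothetical engine that routes three-part table names to connectors."""

    def __init__(self) -> None:
        self.catalogs: Dict[str, InMemoryConnector] = {}

    def register(self, catalog: str, connector: InMemoryConnector) -> None:
        self.catalogs[catalog] = connector

    def scan(self, qualified_name: str) -> Iterator[Row]:
        # Split "catalog.schema.table" and delegate to that catalog's connector.
        catalog, schema, table = qualified_name.split(".")
        return self.catalogs[catalog].scan(f"{schema}.{table}")

    def hash_join(self, left: str, right: str, left_key: str, right_key: str) -> Iterator[Row]:
        # Build phase: load the right-hand table into an in-memory hash table.
        build: Dict[object, List[Row]] = {}
        for row in self.scan(right):
            build.setdefault(row[right_key], []).append(row)
        # Probe phase: stream the left-hand table and emit merged matching rows.
        for row in self.scan(left):
            for match in build.get(row[left_key], []):
                yield {**row, **match}


if __name__ == "__main__":
    engine = Engine()
    engine.register("hive", InMemoryConnector({
        "web.page_views": [
            {"user_id": 1, "url": "/a"},
            {"user_id": 2, "url": "/b"},
            {"user_id": 1, "url": "/c"},
        ]
    }))
    engine.register("mysql", InMemoryConnector({
        "crm.users": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
    }))
    joined = list(engine.hash_join("hive.web.page_views", "mysql.crm.users", "user_id", "id"))
    print([row["name"] for row in joined])  # ['Ada', 'Lin', 'Ada']
```

The point of the toy is that the join logic never needs to know where each table lives: the engine only asks each connector for rows, which is what lets a single query mix sources such as a data lake and a relational database.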

Notes and References

1. "Teradata Distribution of Presto" (section 1.1). Teradata Distribution of Presto 0.167-t.0.2 Documentation. https://teradata.github.io/presto/docs/0.167-t/overview/teradata-distribution.html
2. "Starburst and Presto: with Stellar Velocity". Index Ventures Blog. November 20, 2019. Retrieved January 27, 2022.
3. Joab Jackson. "Facebook goes open source with query engine for big data". Computerworld. November 6, 2013. Retrieved April 26, 2017.
4. Jordan Novet. "Facebook unveils Presto engine for querying 250 PB data warehouse". GigaOm. June 6, 2013. Retrieved April 26, 2017.
5. Eva Tse, Zhenxiao Luo, Nezih Yigitbasi. "Using Presto in our Big Data Platform on AWS". Netflix technical blog. October 7, 2014. Retrieved April 26, 2017.
6. Jeff Barr. "Amazon Athena – Interactive SQL Queries for Data in Amazon S3". AWS News Blog. November 30, 2016. Retrieved January 27, 2022.
7. Philip Howard. "Teradata spins off Starburst". Bloor. December 21, 2017. Retrieved January 26, 2022.
8. Lindsay Clark. "Hey Presto! Teradata admits its vision is dead by hooking QueryGrid analytics platform up to rival data warehouses". The Register. December 17, 2020. Retrieved January 26, 2022.
9. "Presto Software Foundation Launches to Advance Presto Open Source Community". Press release. January 31, 2019. Retrieved January 2, 2022.
10. "Presto's New Foundation Signals Growth for the Big Data SQL Engine". The New Stack. January 31, 2019. Retrieved February 1, 2019.
11. "Facebook, Uber, Twitter and Alibaba form Presto Foundation to Tackle Distributed Data Processing at Scale". September 23, 2019. Retrieved November 12, 2019.
12. Piotr Findeisen. "What's the relationship between prestosql and prestodb?". Comment on issue #38, Trino GitHub repository. November 22, 2019. Retrieved January 27, 2022.
13. "Original Presto Co-Creators Reunite on the Starburst Technical Leadership Team". Press release. September 22, 2020. Retrieved January 26, 2022.
14. Martin Traverso, Dain Sundstrom, David Phillips. "We're rebranding PrestoSQL as Trino". Trino blog. December 27, 2020. Retrieved January 26, 2022.
15. Paul Gillin. "IBM acquires Ahana, joins the Presto Foundation". SiliconANGLE. April 14, 2023. Retrieved April 20, 2023.
