Apache Iceberg Explained

Apache Iceberg
Author:	Ryan Blue, Daniel Weeks
Programming Language:	Java, Python
Operating System:	Cross-platform
Genre:	Data warehouse, Data lake
License:	Apache License 2.0
Website:

Apache Iceberg is an open-source, high-performance format for huge analytic tables. Iceberg brings SQL tables to big data while allowing engines such as Spark, Trino, Flink, Presto, Hive, Impala, StarRocks, Doris, and Pig to safely work with the same tables at the same time.^[1] Iceberg is released under the Apache License.^[2] It addresses the performance and usability challenges of using Apache Hive tables in large and demanding data lake environments.^[3] Vendors currently supporting Apache Iceberg tables in their products include Buster,^[4] CelerData, Cloudera, Crunchy Data,^[5] Dremio, IOMETE, Snowflake, Starburst, Tabular,^[6] AWS,^[7] and Google Cloud.^[8]

History

Iceberg was started at Netflix by Ryan Blue and Dan Weeks. Hive was used by many different services and engines in the Netflix infrastructure, but it was never able to guarantee correctness and did not provide stable atomic transactions.^[3] To avert unintended consequences from the Hive format, many at Netflix avoided using these services or making changes to the data.^[3] Ryan Blue set out to address three problems with Hive tables by creating Iceberg:^[3]^[9]

Ensure the correctness of the data and support ACID transactions.
Improve performance by enabling finer-grained operations at file granularity for optimal writes.
Simplify and abstract general operation and maintenance of tables.

Iceberg development started in 2017.^[10] The project was open-sourced and donated to the Apache Software Foundation in November 2018.^[11] In May 2020, the Iceberg project graduated to become a top-level Apache project.^[11]

Iceberg is used by multiple companies, including Airbnb,^[12] Apple,^[3] Expedia,^[13] LinkedIn,^[14] Adobe,^[15] Lyft, and many more.^[16]

Notes and References

1. Apache Iceberg. iceberg.apache.org. 5 October 2022.
2. apache/iceberg GitHub License. The Apache Software Foundation. 5 October 2022.
3. Woodie, Alex. Apache Iceberg: The Hub of an Emerging Data Service Ecosystem?. Datanami. 8 February 2021. Retrieved 5 October 2022. Archived 4 September 2024: https://web.archive.org/web/20240904174820/https://www.datanami.com/2021/02/08/apache-iceberg-the-hub-of-an-emerging-data-service-ecosystem/
4. Buster. Retrieved 2024-09-09. Archived 2024-09-09: https://web.archive.org/web/20240909205255/https://www.buster.so/
5. Woodie, Alex. Crunchy Data Goes All-in With Postgres. The Big Data Wire. 24 July 2024. Retrieved 9 November 2024. Archived 13 September 2024: https://web.archive.org/web/20240913235502/https://www.datanami.com/2024/07/24/crunchy-data-goes-all-in-with-postgres/
6. Vendors. iceberg.apache.org. 2023-05-05.
7. Using Apache Iceberg tables – Amazon Athena. Amazon Web Services, Inc. Retrieved 2023-06-16. Archived 2024-09-04: https://web.archive.org/web/20240904174822/https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg.html
8. Google Cloud BigQuery tables for Apache Iceberg. Google Cloud, Inc. Retrieved 2024-11-21. Archived 2024-11-22: https://web.archive.org/web/20241122040231/https://cloud.google.com/bigquery/docs/iceberg-tables
9. Iceberg at Netflix and Beyond with Ryan Blue, EPISODE 1654 Transcript. Software Engineering Daily. 7 March 2024. Retrieved 10 November 2024. Archived 10 November 2024: https://web.archive.org/web/20241110140642/https://softwareengineeringdaily.com/wp-content/uploads/2024/02/SED1654-SED1654_Apache_Iceberg.txt
10. Initial public release in apache/iceberg. GitHub. Retrieved 5 October 2022. Archived 4 September 2024: https://web.archive.org/web/20240904174821/https://github.com/apache/iceberg/commit/a5eb3f6ba171ecfc517a4f09ae9654e7d8ae0291
11. Incubation Status Template - Apache Incubator. incubator.apache.org. Retrieved 2022-10-05. Archived 2022-10-05: https://web.archive.org/web/20221005212358/https://incubator.apache.org/projects/iceberg.html
12. Zhu, Ronnie. Upgrading Data Warehouse Infrastructure at Airbnb. The Airbnb Tech Blog. 26 September 2022.
13. Mathiesen, Christine. A Short Introduction to Apache Iceberg. Expedia Group Technology. 26 January 2021. Retrieved 5 October 2022. Archived 5 October 2022: https://web.archive.org/web/20221005212358/https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799
14. FastIngest: Low-latency Gobblin with Apache Iceberg and ORC format. engineering.linkedin.com. Retrieved 2022-10-05. Archived 2024-09-04: https://web.archive.org/web/20240904174822/https://www.linkedin.com/blog/engineering/open-source/fastingest-low-latency-gobblin
15. Bremner, Jaemi. Iceberg at Adobe. Medium. 3 December 2020. Retrieved 5 October 2022. Archived 4 September 2024: https://web.archive.org/web/20240904174823/https://blog.developer.adobe.com/iceberg-at-adobe-88cf1950e866?gi=11bf85b84d1e
16. Data Council. Open Source Highlight: Apache Iceberg. www.datacouncil.ai. Retrieved 5 October 2022. Archived 5 October 2022: https://web.archive.org/web/20221005212403/https://www.datacouncil.ai/blog/apache-iceberg