List of Apache Software Foundation projects

This list of Apache Software Foundation projects contains the software development projects of The Apache Software Foundation (ASF).[1]

Besides the projects, there are a few other distinct areas of Apache:

provides and manages all infrastructure and services for the Apache Software Foundation, and for each project at the Foundation

Active projects

secure implementation of Bigtable

message broker supporting different communication protocols and clients, including a full Java Message Service (JMS) 1.1 client.[2]

PostgreSQL extension that provides graph database functionality, enabling users of PostgreSQL to use graph query modeling in unison with PostgreSQL's existing relational model

a distributed system software framework to manage simple to composite applications with complex execution and workflow patterns on diverse computational resources

Python-based platform to programmatically author, schedule and monitor workflows

Python-based open source implementation of a software forge

Java-based build tool

The Ant Library provides Ant tasks for testing Ant tasks; it can also be used to drive functional and integration tests of arbitrary applications with Ant

a very powerful dependency manager oriented toward Java dependency management, even though it could be used to manage dependencies of any kind

integrate Ivy in Eclipse with the IvyDE plugin

cloud-native microservices API gateway

Build Artifact Repository Manager

OSGi Enterprise Programming Model

"A high-performance cross-system data layer for columnar in-memory analytics".[3][4]

open source Big Data Management System

scalable and extensible set of core foundational governance services

a data serialization system.

open source, XML based Web service framework

a service hosting and consumption framework that makes it easy to use SOAP and Web Services

implementation of the WS-Security standard for the Axis2 Web services engine

an Axis2 module implementing WS-RM.

extensions to distributed analytic platforms such as Apache Spark

a project for the development of packaging and tests of the Apache Hadoop ecosystem.

defect tracker based on Trac[5]

a reliable replicated log service

a framework for modelling, monitoring, and managing applications through autonomic blueprints

industrial-grade RPC framework for building reliable and high-performance services

tool for building/integrating software stacks

Bean Validation API Implementation

dynamic data management framework

declarative routing and mediation rules engine which implements the Enterprise Integration Patterns using a Java-based domain specific language

an indexed columnar data format for fast analytics on big data platforms, e.g., Apache Hadoop and Apache Spark

highly scalable second-generation distributed database

Java ORM framework

implementation of the OSGi specification adapted to C and C++

software to deploy and manage cloud infrastructure

XML publishing framework

reusable Java libraries and utilities too small to merit their own project

Bytecode Engineering Library

Commons Daemon

Jelly is a Java and XML based scripting engine. Jelly combines the best ideas from JSTL, Velocity, DVSL, Ant and Cocoon all together in a simple yet powerful scripting engine

Commons Logging is a thin adapter allowing configurable bridging to other, well known logging systems

Object Graph Navigation Library

project that creates and provides tools, processes, and advice to help open-source software projects improve their own community health

mobile development framework

Document-oriented database

improves accuracy and efficiency when reviewing and auditing releases.

simplifies the job of reviewing repository releases consisting of large numbers of artefacts

assists assembled applications to maintain correct legal documentation.

clinical "Text Analysis Knowledge Extraction Software" to extract information from electronic medical record clinical free-text

builds on ZooKeeper and handles the complexity of managing connections to the ZooKeeper cluster and retrying operations

web services framework

implementation of the Data Format Description Language (DFDL) used to convert between fixed format data and XML/JSON

collection of libraries for working with large-scale data in Hadoop

open source, high-performance library of stochastic streaming algorithms commonly called "sketches" in the data sciences

pure Java relational database management system

Java Data Objects, persistence for Java objects

ORM for Java

collection of JSR-299 (CDI) Extensions for building applications on the Java SE and EE platforms

LDAP and Kerberos, entirely in Java.

an extensible, embeddable LDAP and Kerberos server, entirely in Java

Eclipse based LDAP browser and directory client

a standards-based authorization platform that implements ANSI INCITS 359 Role-Based Access Control (RBAC)

Kerberos binding in Java

an SDK for directory access in Java

a distributed ETL scheduling engine with powerful DAG visualization interface

MPP-based interactive SQL data warehousing for reporting and analysis, good for both high-throughput scenarios and high-concurrency point queries

software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets

high-performance, column-oriented, distributed data store

high-performance, lightweight, Java-based RPC framework

charting and data visualization library written in JavaScript

a lightweight relational database abstraction layer and data persistence component

dynamic cloud-native basic service runtime used to decouple the application and middleware layer

implementation of the OSGi Release 5 core framework specification

Platform for Digital Financial Services

software tool usability testing platform

cross-platform SDK for developing and deploying rich Internet applications.

fast and reliable large-scale data processing engine.

large scale log aggregation framework

a distributed processing system that lets users make incremental updates to large data sets

Apache Fluo Recipes build on the Fluo API to offer additional functionality to developers

a tool for running Apache Fluo applications in Apache Hadoop YARN

a template engine, i.e. a generic tool to generate text output based on templates. FreeMarker is implemented in Java as a class library for programmers

low latency, high concurrency data management solutions

Java EE server

distributed data integration framework

an open source framework that provides an in-memory data model and persistence for big data

an open source Data Quality solution for Big Data, which supports both batch and streaming mode. Originally developed by eBay[6]

an object-oriented, dynamic programming language for the Java platform

HTML5 web application for accessing remote desktops[7]

integration, dependencies, and versioning management

Java software framework that supports data intensive distributed applications

advanced enterprise SQL on Hadoop analytic engine

Apache HBase software is the Hadoop database. Think of it as a distributed, scalable, big data store

a cluster management framework for partitioned and replicated distributed resources

the Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage.

The Hop Orchestration Platform, or Apache Hop, aims to facilitate all aspects of data and metadata orchestration.

The Apache HTTP Server application 'httpd'

module that integrates the Python interpreter into Apache server. Deprecated in favour of mod_wsgi.

low-level Java libraries for HTTP

provides atomic upserts and incremental data streams on Big Data

an open standard for analytic SQL tables, designed for high performance and ease of use.

an In-Memory Data Fabric providing in-memory data caching, partitioning, processing, and querying components[8]

a high-performance distributed SQL engine

a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities

data store for managing large amounts of time series data in industrial applications

implementation of the Java Content Repository API

Java email and news server

open source multi-cloud toolkit for the Java platform

pure Java application for load and functional testing

JSR-353 compliant JSON parsing; modules to help with JSR-353 as well as JSR-374 and JSR-367

A feature-rich and extensible WikiWiki engine built around the standard J2EE components (Java, servlets, JSP)

A toolkit for marshalling POJOs to a wide variety of content types using a common framework

a message broker software

an OSGi distribution for server-side applications.

a suite of tools for collecting, aggregating and visualizing activity in software projects.

a REST API Gateway for Hadoop Services

a distributed columnar storage engine built for the Apache Hadoop ecosystem

a distributed key-value NoSQL database supporting rich data structures

distributed analytics engine

a distributed multi-tenant Thrift JDBC/ODBC server for large-scale data management, processing, and analytics, built on top of Apache Spark and designed to support more engines

a standard Python library that abstracts away differences among multiple cloud provider APIs.

a computation middleware project, which decouples the upper applications and the underlying data engines, provides standardized interfaces (REST, JDBC, WebSocket etc.) to easily connect to various underlying engines (Spark, Presto, Flink, etc.)

provides logging services for C++.

Apache Log4j

provides logging services for .NET.

a logging framework for PHP.

a high-performance, full-featured text search engine library

enterprise search server based on the Lucene Java search library

a port of the Lucene search engine library, written in C# and targeted at .NET runtime users.

Scalable, Big Data, SQL-driven machine learning framework for Data Scientists

machine learning and data mining solution

Open-source software for transferring content between repositories or search indexes

Java project management and comprehension tool

a content generation framework, which supports many markup languages.

open-source cluster manager

FTP server written entirely in Java

Multipurpose Infrastructure for Network Application, a framework to develop high performance and high scalability network applications

a 100% pure Java library to support the SSH protocols on both the client and server side

aims to be a modular, full featured XMPP (Jabber) server. Vysper is implemented in Java

a transparent nonvolatile hybrid memory oriented library for Big data, High-performance computing, and Analytics

JavaServer Faces implementation

set of user interface components based on JSF

embedded OS optimized for networking and built for remote management of constrained devices

development environment, tooling platform, and application framework

easy to use, powerful, and reliable system to process and distribute data

a highly extensible and scalable open source web crawler

mature, real-time embedded operating system (RTOS)

Open for Business: enterprise automation software

Client and Server for OData

a workflow scheduler system to manage Apache Hadoop jobs.

Java Persistence API Implementation

video conferencing, instant messaging, white board and collaborative document editing application

natural language processing toolkit

an open-source, office-document productivity suite

Dependency Injection Platform

distributed Serverless computing platform

columnar file format for big data workloads

scalable, redundant, and distributed object store for Hadoop

a general-purpose columnar storage format

Java based PDF library (reading, text extraction, manipulation, viewer)

module that integrates the Perl interpreter into Apache server

toolkit and an ecosystem for building highly concurrent, distributed, reactive and resilient applications for Java and Scala[9]

deals with the assessment of, education in, and adoption of the Foundation's policies and procedures for collaborative development and the pros and cons of joining the Foundation

SQL layer on HBase

a platform for analyzing large data sets on Hadoop

a column-oriented, open-source, distributed data store written in Java[10]

a platform for building rich internet applications in Java

Universal API for communicating with programmable logic controllers

Poor Obfuscation Implementation, a library for reading and writing Microsoft Office formats

XML–Java binding tool

Apache Portable Runtime, a portability library written in C

web portal related software

distributed pub-sub messaging system originally created at Yahoo

AMQP messaging system in Java and C++

a framework to enable, monitor and manage comprehensive data security across the Hadoop platform

Java implementation for RAFT consensus protocol

a fast, low latency, reliable, scalable, distributed, easy to use message-oriented middleware, especially for processing large amounts of streaming data

a full-featured, multi-user and group blog server suitable for both small and large blog sites

improving developer productivity in creating applications for wherever JavaScript runs (and other runtimes)

cloud-based RDF triple store that supports SPARQL queries

Stream Processing Framework

XML Security in Java and C++

integrated data analytic center for Big Science problems

a very easy-to-use ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data

big geospatial data processing engine

high performance C-based HTTP client library built upon the Apache Portable Runtime (APR) library

microservice framework that provides a set of tools and components to make development and deployment of cloud applications easier

related to a database clustering system providing data sharding, distributed transactions, and distributed database management

Java native API Gateway for service proxy, protocol conversion and API governance

a simple to use Java Security Framework

a distributed deep learning library

A library for developing geospatial applications

application performance management and monitoring (APM)

innovative Web framework based on JCR and OSGi

Full Text search server

email filter used to identify spam

open source cluster computing framework

STeVe is a collection of online voting tools, used by the ASF, to handle STV and other voting methods

a distributed real-time computation system.

self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore (Industrial) IoT data streams

Interoperability of online profiles and activity feeds

Java web applications framework

Cloud Native Machine Learning Platform

open source version control (client/server) system

enterprise-ready web application for data exploration, data visualization and dashboarding

a lightweight and high-performance Enterprise Service Bus (ESB)

an Open Source system for managing digital identities in enterprise environments.

scalable machine learning

component-based Java web framework

Server-side Tcl programming system combining ease of use and power

Websh is a rapid development environment for building powerful, fast, and reliable web applications in Tcl

an effort to develop a generic application framework which can be used to process arbitrarily complex directed-acyclic graphs (DAGs) of data-processing tasks and also a re-usable set of data-processing primitives which can be used by other projects

content analysis toolkit for extracting metadata and text from digital documents of various types, e.g., audio, video, image, office suite, web, mail, and binary

A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP)

web container for serving servlets and JSP

A framework for creating JSP taglibs that aid in rapid development of voice and multimodal applications

an all-Apache Java EE 6 Web Profile stack for Apache Tomcat

Built around Apache Traffic Server as the caching software, Traffic Control implements all the core functions of a modern CDN

HTTP/1.1 compliant caching proxy server

a servlet based framework that allows Java developers to quickly build web applications

an end to end machine learning compiler framework for CPUs, GPUs and accelerators

unstructured content analytics framework

reference implementation of the OASIS customer data platform specification

a cloud computing platform for provisioning and brokering access to dedicated remote compute resources.

an XML transformation tool which uses JDOM and Velocity to transform XML documents into multiple formats.

a general purpose text generating utility based on Apache Velocity and Apache Ant.

Java template creation engine

a tool modeled after XSLT and intended for general XML transformations using the Velocity Template Language.

tools and infrastructure for the template engine

an XML object model supporting deferred parsing.

used to develop a Java class library for reading, manipulating, creating and writing WSDL documents.

tools that display and visualize various bits of data related to ASF organizations and processes.

component-based Java web framework

XSLT processors in Java and C++

validating XML parser

pure Java library for SVG content manipulation

Java print formatter driven by XSL formatting objects (XSL-FO); supported output formats include PDF, PS, PCL, AFP, XML (area tree representation), Print, AWT and PNG, and to a lesser extent, RTF and TXT

common components for Apache Batik and Apache FOP

standalone resource scheduler responsible for scheduling batch jobs and long-running services on large scale distributed systems

a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems

coordination service for distributed applications

Incubating projects

provides annotation enabling code for browsers, servers, and humans

toolkit and a set of infrastructure components for creating, publishing and operating online maps

intermediate data service for big data computing engines to boost performance, stability and flexibility

platform for creating self-service, exploratory data science environments in the cloud using best-of-breed data science tools

development data platform, providing the data infrastructure for developer teams to analyze and improve their engineering productivity

a large-scale and easy-to-use graph database

community of solutions and supporting tooling for knowledge engineering and process automation, focusing on events, rules and workflows

an end-to-end platform for data engineers and scientists, allowing them to build, train and deploy machine learning models in a robust and agile way

web service that exposes a REST interface for managing long-running Spark contexts

core security infrastructure for decentralized networks

data processing system

Java API for NLU applications

Open Data Access Layer. Offers native layer support, enabling users to implement middleware or intercept for all operations

unified lake storage to build dynamic tables for both stream and batch processing with big data compute engines, supporting high-speed data ingestion and real-time data query

distributed key-value storage system which is designed to be simple, horizontally scalable, strongly consistent and high-performance

mail-archiving, archive viewing, and interaction service

a streaming application development platform

universal secure computing platform

provides applications with a mechanism to interactively and remotely access Spark

project aims to develop resources which can be used for training purposes in various media formats, languages and for various Apache and non-Apache target projects

set of libraries and other tools to aid development of blockchain and other decentralized software in Java and other JVM languages

a unified Remote Shuffle Service

cross-platform data processing system

The above may be incomplete, as the list of incubating projects changes frequently.

Retired projects

A retired project is one that has been closed down on the initiative of the board, the project's PMC, the PPMC or the IPMC for various reasons. It is no longer developed at the Apache Software Foundation and does not have any other duties.

implementation of the Atom Syndication Format and Atom Publishing Protocol

a distribution framework that allows central management and distribution of software components, configuration data and other artefacts to target systems

Anything To Triples (Any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents

Enterprise-grade unified stream and batch processing engine

Mesos framework for long-running services and cron jobs

XML Application Server for Apache. It provided on-the-fly conversion from XML to any format, such as HTML, WAP or text using either W3C standard techniques, or flexible custom code

Java visual object model

provides open source implementations of the Content Management Interoperability Services (CMIS) specification

Chukwa is an open source data collection system for monitoring large distributed systems

a service platform which provides a set of functionality for management of semantically linked data accessible through RESTful Web Services and in a secured way

simple and easy-to-use Java Web Framework

continuous integration server

Java XML parser which supports XML 1.0 via various APIs

Provides a framework for writing, testing, and running MapReduce pipelines

provides common front-end APIs to abstract differences between cloud providers

device Data Repository and classification API

off-heap cache for the Java Virtual Machine

large scale code license analysis, auditing and reporting

open source analytics solution for identifying security and performance issues instantly on big data platforms

API for generating elements for various markup languages

secure and highly scalable microsharing and micromessaging platform that allows people to discover and meet one another and get controlled access to other sources of information, all in a business process context

cross-platform, language- and transport-independent RPC-like messaging framework

Java inversion of control framework including containers and components

data governance engine

documentation framework based upon Cocoon

scalable Graph Processing System

Hama is an efficient and scalable general-purpose BSP computing engine

Java SE 5 and 6 runtime and development kit

services and configuration microkernel

Persistence framework which enables mapping SQL queries to POJOs

server side Java, including its own set of subprojects

simple test framework for unit testing server-side Java code

statistical machine translation toolkit

Apache Scout is an implementation of the JSR 93 (JAXR).

a place for innovation where committers of the foundation can experiment with new ideas

Unified Analytics Interface

search engine library that provides full-text search for dynamic programming languages

An Open Platform for Linked Data

provides a common interface for discovery, exploration of metadata and querying of different types of data sources.

Real-time big data security

Java library that helps developers unit test Apache Hadoop map reduce jobs

Deep learning programming framework

Apache ODE is a WS-BPEL implementation that supports web services orchestration using flexible process definitions.

Object/Relational mapping tool that allowed transparent persistence for Java Objects against relational databases

OAuth protocol implementation in Java

project focused on the development and maintenance of a set of Google Guice extensions not provided out of the box by the library itself

Object Oriented Data Technology, a data management framework for capturing and sharing data

A comprehensive suite of algorithms, libraries, and interfaces designed to standardize and streamline the process of interacting with large quantities of observational data and conducting regional climate model evaluations

Regular Expression engine supporting various dialects

community based effort exploring Composite Oriented Programming for domain centric application development

PredictionIO is an open source Machine Learning Server built on top of state-of-the-art open source stack, that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks.

A scale-out computing fabric that eases the development of Big Data applications on top of resource managers such as Apache YARN and Mesos

Regular Expression engine

provides a standards-compliant JINI service

Fine grained authorization to data and metadata in Apache Hadoop

web application framework based on JavaServer Faces

OpenSocial container; helps start hosting OpenSocial apps quickly by providing the code to render gadgets, proxy requests, and handle REST and RPC requests

a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases

collection of algorithms, containers, iterators, and other fundamental components of every piece of software, implemented as C++ classes, templates, and functions essential for writing C++ programs

Software components for semantic content management

Platform-as-a-Service (PaaS) framework

relational data warehousing system that uses the Hadoop file system as distributed storage

templating framework built to simplify the development of web application user interfaces.

SCA implementation, also providing other SOA implementations

Use Apache Hadoop YARN's distributed capabilities with a programming model that is similar to running threads

an open-source Backend-as-a-Service ("BaaS" or "mBaaS") composed of an integrated distributed NoSQL database, application layer and client tier with SDKs for developers looking to rapidly build web and/or mobile applications

online real-time collaborative editing

set of libraries for running cloud services

RESTful web services based on the JAX-RS specification

parser, server and plugins for working with W3C Packaged Web Apps

implementation of the WS-ResourceFramework (WSRF), WS-BaseNotification (WSN), and WS-DistributedManagement (WSDM) specifications

XML Web Framework that aggregated multiple data sources, made that data URL addressable and defined custom methods to access that data

XML Database

distributed tracing system

collection of Java libraries, frameworks and tools around the CMIS specification for document interoperability.

The above may be incomplete, as the list of retired projects changes.

Notes and References

  1. Web site: Apache Project List. The Apache Software Foundation. 2018. 2018-05-19.
  2. Web site: Project information, Apache ActiveMQ. Apache.
  3. Web site: Apache Arrow. Apache Software Foundation. 12 May 2016.
  4. Web site: The Apache Software Foundation Announces Apache Arrow as a Top-Level Project. 17 February 2016. Apache Software Foundation. 12 May 2016.
  5. Web site: Bloodhound Project Incubation Status. Apache Software Foundation. 21 March 2013.
  6. Web site: Griffin — Model-driven Data Quality Service on the Cloud for Both Real-time and Batch Data. 2016-10-12. 2020-10-21. Alex Lv.
  7. Web site: Apache Guacamole™. guacamole.apache.org. 2019-10-02.
  8. Web site: Project information, Apache Ignite. Apache.
  9. Web site: Apache Software Foundation Announces New Top-Level Project Apache® Pekko™. 2024-05-16. Apache Software Foundation.
  10. Web site: The Apache Software Foundation Announces Apache® Pinot™ as a Top-Level Project. blogs.apache.org. 2 August 2021.
  11. Web site: HP Throws Trafodion Hat into OLTP Hadoop Ring. Woodie. Alex. 14 July 2014. datanami.
  12. Book: Pal, Sumit. SQL on Big Data. Why SQL on big data?. 11. 18 November 2016. Apress. 978-1484222461.
  13. Web site: The Apache Software Foundation Announces Apache Trafodion as a top-level project. 10 January 2018. Sally. Apache Foundation.