OpenSAF explained

Author: Motorola
Developer: OpenSAF Foundation
Latest Release Version: 5.21.03
Programming Language: C++
Genre: Cluster management software

OpenSAF (commonly styled SAF, the Service Availability Framework[1]) is an open-source service-orchestration system for automating computer application deployment, scaling, and management. OpenSAF is consistent with, and expands upon, Service Availability Forum (SAF) and SCOPE Alliance standards.[2]

It was originally designed by Motorola ECC, and is maintained by the OpenSAF Project.[3] OpenSAF is the most complete implementation of the SAF AIS specifications, providing a platform for automating deployment, scaling, and operations of application services across clusters of hosts.[4] It works across a range of virtualization tools and runs services in a cluster, often integrating with JVM, Vagrant, and/or Docker runtimes. OpenSAF originally exposed standard C application programming interfaces (APIs), and has since added Java and Python bindings.[2]

OpenSAF is focused on Service Availability beyond High Availability (HA) requirements. While little formal research is published to improve high availability and fault tolerance techniques for containers and cloud,[5] research groups are actively exploring these challenges with OpenSAF.

History

OpenSAF was founded by an industry consortium, including Ericsson, HP, and Nokia Siemens Networks. It was first announced on February 28, 2007 by Motorola ECC, which was later acquired by Emerson Network Power.[6] The OpenSAF Foundation was officially launched on January 22, 2008. Membership evolved to include Emerson Network Power, SUN Microsystems, ENEA, Wind River, Huawei, IP Infusion, Tail-f, Aricent, GoAhead Software, and Rancore Technologies.[2] [7] GoAhead Software joined OpenSAF in 2010 before being acquired by Oracle.[8] OpenSAF's development and design are heavily influenced by mission-critical system requirements, including Carrier Grade Linux, SAF, ATCA and Hardware Platform Interface. OpenSAF was a milestone in accelerating adoption of Linux in telecommunications and embedded systems.[9]

The goal of the Foundation was to accelerate the adoption of OpenSAF in commercial products. The OpenSAF community held conferences between 2008 and 2010: the first was hosted by Nokia Siemens Networks in Munich (Germany), the second by Huawei in Shenzhen (China), and the third by HP in Palo Alto (USA). In February 2010, the first commercial deployment of OpenSAF in carrier networks was announced.[10] Academic and industry groups have independently published books describing OpenSAF-based solutions.[2] [11] A growing body of research in service availability is accelerating the development of OpenSAF features supporting mission-critical cloud and microservices deployments, and service orchestration.[12] [13]

OpenSAF 1.0 was released on January 22, 2008. It comprised the NetPlane Core Service (NCS) codebase contributed by Motorola ECC.[14] The OpenSAF Foundation was launched alongside the OpenSAF 1.0 release.[6] OpenSAF 2.0, released on August 12, 2008, was the first release developed by the OpenSAF community. This release included the Log service and 64-bit support.[14] OpenSAF 3.0, released on June 17, 2009, included platform management, usability improvements, and Java API support.[15]

OpenSAF 4.0 was a milestone release in July 2010.[2] Nicknamed the "Architecture release", it introduced significant changes, including closing functional gaps, settling the internal architecture, enabling in-service upgrades, clarifying APIs, and improving modularity.[16] Receiving significant interest from industry and academics, OpenSAF held two community conferences in 2011, one hosted by MIT University in Boston, MA, and a second hosted by Ericsson in Stockholm.

Release history
| Version | Release date | Notes |
| 1.0 | 22 January 2008 | Original NetPlane Core Service (NCS) codebase, contributed by Motorola ECC to the OpenSAF project. |
| 2.0 | 12 August 2008 | |
| 3.0 | 17 June 2009 | The second release (counting from v2.0 onwards) took about 1.5 years, with contributions from Wind River Systems.[17] |
| 4.0 | 1 July 2010 | The "Architecture" release. First viable carrier-grade deployment candidate.[18] |
| | 16 March 2012 | Improved manageability, enhanced availability modelling. |
| 5.0 | 5 May 2016 | A significant release. Support for spare system controllers (2N + spares), headless cluster (cloud resilience), enhanced Python bindings, node name logging.[19] |
| | 1 June 2021 | |

Concepts

OpenSAF defines a set of building blocks that collectively provide a mechanism to manage Service Availability (SA) of applications based on resource-capability models.[20] Availability, for both SA and High Availability (HA), is the probability of a service being available at a random point in time; mission-critical systems require at least 99.999% (five nines) availability. HA and SA are essentially the same, but SA goes further (e.g. in-service upgrades of hardware and software).[21] OpenSAF is designed for loosely coupled systems with fast interconnections between nodes (e.g. using TIPC/TCP),[22] and is extensible to meet different workloads; components can communicate with each other using any protocol. This extensibility is provided in large part by the IMM API, used by internal components and core services. The platform can exert control over compute and storage resources by defining them as objects, to be managed as (component service) instances and/or node constraints.[2] [20] [23]
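To make the five-nines figure concrete, the yearly downtime budget follows directly from the availability target. The sketch below is plain arithmetic, assuming a 365-day year; the function name is illustrative and not part of any OpenSAF API.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap year


def allowed_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("three nines", 0.999),
                      ("four nines", 0.9999),
                      ("five nines", 0.99999)]:
    print(f"{label}: {allowed_downtime_minutes(target):.2f} minutes per year")
```

Under this assumption, five nines leaves roughly 5.26 minutes of downtime per year, which is why planned maintenance such as upgrades has to happen while the service stays in operation.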

OpenSAF software is distributed in nature, following the primary/replica architecture. In an OpenSAF cluster, the nodes can be divided into two types: system controllers, which form the control plane, and individually managed payload nodes. One system controller runs in "active" mode, another in "standby" mode, and any remaining system controllers are spares ready to take over the active or standby role in case of a fault. Nodes can run headless, without a control plane, adding cloud resilience.[16] [24]
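The active/standby/spare arrangement can be illustrated with a small model. This is only a sketch of the role hand-over idea: the `ControllerSet` class, the node names, and the rule that roles follow list order are assumptions made for illustration, not OpenSAF's actual election mechanism.

```python
from typing import List, Optional


class ControllerSet:
    """Hypothetical model: roles are derived from position in an ordered list."""

    def __init__(self, names: List[str]) -> None:
        self._order = list(names)

    @property
    def active(self) -> Optional[str]:
        return self._order[0] if self._order else None

    @property
    def standby(self) -> Optional[str]:
        return self._order[1] if len(self._order) > 1 else None

    @property
    def spares(self) -> List[str]:
        return self._order[2:]

    def handle_fault(self, failed: str) -> None:
        """Drop the failed controller; the remaining ones move up one role."""
        self._order.remove(failed)  # raises ValueError for an unknown name


controllers = ControllerSet(["SC-1", "SC-2", "SC-3"])
controllers.handle_fault("SC-1")  # the active controller fails
print(controllers.active, controllers.standby, controllers.spares)
```

In this model the standby is promoted to active and the first spare becomes the new standby. When the last controller is lost, `active` is `None`, which corresponds loosely to the headless case.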

System Model

The OpenSAF System Model is the key enabler API, allowing OpenSAF to process and validate requests, and update the state of objects in the AMF model, allowing directors to schedule workloads and service groups across worker/payload nodes. AMF behavior is changed via a configuration object.[24] Services can use 'No Redundancy', 2N, N+M, N-way, and N-way Active redundancy models.[20] OpenSAF lacks obvious modeling toolchains to simplify the design and generation of AMF configuration models. Ongoing research to address this gap[25] [26] needs to deliver ecosystem tools to better support modeling and automation of carrier-grade and Cloud Native Computing Foundation use cases.

Control Plane

The OpenSAF System Controller (SC) is the main controlling unit of the cluster, managing its workload and directing communication across the system. The OpenSAF control plane consists of various components, each running as its own process, that can run either on a single SC node or on multiple SC nodes, supporting high-availability clusters and service availability.[2] [24] The various components of the OpenSAF control plane are as follows:

Component

The Component is a logical entity of the AMF system model and represents a normalized view of a computing resource such as processes, drivers, or storage. Components are grouped into logical Service Units (SUs) according to fault inter-dependencies, and associated with a Node. The SU is an instantiable unit of workload controlled by an AMF redundancy model, and is in either an active, standby, or failed state. SUs of the same type are grouped into Service Groups (SGs), which exhibit particular redundancy modeling characteristics. SUs within an SG are assigned to Service Instances (SIs) and given an availability state of active or standby. SIs are scalable, redundant logical services protected by AMF.[2] [16]
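The containment relationships described above (components in service units, service units in service groups, service instances assigned to service units) can be sketched as a small data model. All class and field names here are hypothetical and chosen for readability; they do not mirror the IMM object classes that a real AMF configuration uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    name: str  # e.g. a process, driver, or storage resource


@dataclass
class ServiceUnit:
    name: str
    node: str  # the node hosting every component of this SU
    components: List[Component] = field(default_factory=list)


@dataclass
class ServiceGroup:
    name: str
    redundancy_model: str  # e.g. "2N"
    service_units: List[ServiceUnit] = field(default_factory=list)
    # Maps a service instance name to {service unit name: "active" | "standby"}.
    assignments: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def assign(self, si_name: str, su_name: str, ha_state: str) -> None:
        """Record that a service instance is assigned to an SU in a given state."""
        if ha_state not in ("active", "standby"):
            raise ValueError(f"unsupported state: {ha_state}")
        if su_name not in {su.name for su in self.service_units}:
            raise ValueError(f"{su_name} is not part of {self.name}")
        self.assignments.setdefault(si_name, {})[su_name] = ha_state


sg = ServiceGroup("sg-web", "2N", [
    ServiceUnit("su-1", "PL-3", [Component("httpd")]),
    ServiceUnit("su-2", "PL-4", [Component("httpd")]),
])
sg.assign("si-web", "su-1", "active")
sg.assign("si-web", "su-2", "standby")
print(sg.assignments)
```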

Node

A Node is a compute instance (a blade, hypervisor, or VM) where service instances (workload) are deployed. The set of nodes belonging to the same communication subnet (no routing) comprises the logical Cluster. Every node in the cluster must run an execution environment for services, as well as the OpenSAF services.

Service Unit

The basic scheduling unit in OpenSAF is a Service Unit (SU). An SU is a grouping of one or more components that are guaranteed to be co-located on the same node. SUs are not assigned IP addresses by default, but may contain a component that is. An SU can be administratively managed using an object address. AmfND monitors the state of SUs and, if an SU is not in the desired state, re-deploys it to the same node if possible. AmfD can start the SU on another Node if required by the redundancy model.[2] An SU can define a volume, such as a local disk directory or a network disk, and expose it to the Components in the SU.[39] Such volumes are also the basis for Persistent Storage. An SU can be administratively managed through the AMF CLI, or management can be delegated to AMF.[2] [16]
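The two-level repair behaviour (local re-deployment first, relocation to another node otherwise) can be sketched as a simple decision function. The function name, the node names, and the choice of the first healthy node as the relocation target are assumptions for illustration; real AMF recovery and escalation policies are configurable and considerably richer.

```python
from typing import List, Optional, Tuple


def plan_repair(su_node: str, healthy_nodes: List[str]) -> Optional[Tuple[str, str]]:
    """Return (actor, target node) for restarting a failed service unit.

    Hypothetical policy: prefer a restart on the SU's current node (handled
    node-locally), otherwise relocate to the first other healthy node
    (handled cluster-wide). Returns None when no node is available.
    """
    if su_node in healthy_nodes:
        return ("AmfND", su_node)
    if healthy_nodes:
        return ("AmfD", healthy_nodes[0])
    return None


print(plan_repair("PL-3", ["PL-3", "PL-4"]))  # node still healthy: local restart
print(plan_repair("PL-3", ["PL-4"]))          # node lost: relocate
```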

Service Group

The purpose of a Service Group is to maintain a stable set of replica SUs running at any given time. It can be used to guarantee the availability of a specified number of identical SUs based on the configured redundancy model: N-way, N-way-Active, 2N, N+M, or 'No-redundancy'. The SG is a grouping mechanism that lets OpenSAF maintain the number of instances declared for a given SG. The definition of an SG identifies all associated SUs and their state (active, standby, failed).[2] [16]
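Maintaining a declared number of running instances amounts to a reconciliation step: compare the desired count with what is currently running and start enough idle units to close the gap. The sketch below assumes three hypothetical state labels ("running", "failed", "uninstantiated") and an illustrative function name; it does not reproduce AMF's actual state machines.

```python
from typing import Dict, List


def units_to_start(desired_running: int, su_states: Dict[str, str]) -> List[str]:
    """Return the idle service units to start so the desired count is running."""
    running = [su for su, state in su_states.items() if state == "running"]
    idle = sorted(su for su, state in su_states.items() if state == "uninstantiated")
    shortfall = max(0, desired_running - len(running))
    return idle[:shortfall]


states = {
    "su-1": "running",
    "su-2": "failed",
    "su-3": "uninstantiated",
    "su-4": "uninstantiated",
}
print(units_to_start(2, states))
```

With one of two desired units failed, the function selects a single idle unit to start. If fewer idle units exist than the shortfall, it returns as many as are available.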

Service Instance

An OpenSAF Service Instance (SI) is a set of SUs that work together, such as one tier of a multi-tier application. The set of SUs that protects a service is defined by the SG. A multi-instance SG (N-way-active, N-way, N+M) requires a stable IP address, DNS name, and load balancer to distribute the traffic of that IP address among the active SUs in that SG (even if failures cause the SUs to move from machine to machine). By default, a service is exposed inside a cluster (e.g. SU[TypeA] is grouped into one SG, with requests from SU[TypeB] load-balanced among them), but a service can also be exposed outside a cluster (e.g., for clients to reach front-end SUs).[2] [16]
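One simple way to distribute requests among the active SUs of a service group is round-robin selection. The sketch below is a generic illustration of that technique, with hypothetical SU names; it is not a description of any load balancer shipped with OpenSAF, and a real one would also have to react to SUs failing or moving between machines.

```python
from itertools import cycle
from typing import Iterator, List


def round_robin(active_sus: List[str]) -> Iterator[str]:
    """Yield the active service units in turn, forever."""
    if not active_sus:
        raise ValueError("at least one active service unit is required")
    return cycle(active_sus)


targets = round_robin(["su-a1", "su-a2", "su-a3"])
for request_id in range(5):
    print(request_id, next(targets))
```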

Volumes

By default, filesystems available to OpenSAF SUs are potentially ephemeral storage. If the node is destroyed or recreated, the data on that Node is lost. One solution is Network File System (NFS) shared storage, accessible to all payload nodes.[30] Other technical solutions are possible; what is important is that Volumes (file share, mount point) can be modeled in AMF. Highly available Volumes provide persistent storage that exists for the lifetime of the SU itself. This storage can also be used as shared disk space for the SUs within the SG. Volumes mounted at specific mount points on the Node are owned by a specific SG, so that instance cannot be shared with other SGs using the same file system mount point.

Architecture

The OpenSAF architecture is distributed and runs in a cluster of logical nodes. All OpenSAF services have either a 3-tier or a 2-tier architecture. In the 3-tier architecture, OpenSAF services are partitioned into a service Director, a service Node Director, and an Agent. The Director is the part of an OpenSAF service that holds the central service intelligence; typically it is a process on the controller node. The Node Directors coordinate node-scoped service activities, such as messaging with their central Director and their local Agents. The Agent makes service capabilities available to clients by way of a (shared) linkable library that exposes well-defined service APIs to application processes. Agents typically talk to their service Node Directors or Servers. The OpenSAF services are classified modularly.[22]

The optional services can be enabled or disabled during the build/packaging of OpenSAF. OpenSAF can be configured to use TCP or TIPC as the underlying transport. Nodes can be dynamically added to or deleted from the OpenSAF cluster at run time. An OpenSAF cluster scales well up to several hundred nodes. OpenSAF supports C, Java, and Python language bindings for the AIS interface APIs.

The modular architecture enables the addition of new services as well as the adaptation of the existing services. All OpenSAF services are designed to support in-service upgrades.
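The 3-tier split described in this section can be sketched as three cooperating objects: an Agent linked into the application, a Node Director per node, and one central Director. The class names follow the text, but the methods, the request string, and the in-process calls standing in for messaging are all assumptions made for illustration.

```python
from typing import List, Tuple


class Director:
    """Central service intelligence; typically a process on a controller node."""

    def __init__(self) -> None:
        self.handled: List[Tuple[str, str]] = []

    def handle(self, node: str, request: str) -> str:
        self.handled.append((node, request))
        return f"director handled {request!r} from {node}"


class NodeDirector:
    """Coordinates node-scoped activity between local agents and the Director."""

    def __init__(self, node: str, director: Director) -> None:
        self.node = node
        self.director = director

    def forward(self, request: str) -> str:
        # In a real cluster this hop would use the configured transport.
        return self.director.handle(self.node, request)


class Agent:
    """Library-side API that an application process calls."""

    def __init__(self, node_director: NodeDirector) -> None:
        self.node_director = node_director

    def call(self, request: str) -> str:
        return self.node_director.forward(request)


director = Director()
agent = Agent(NodeDirector("PL-3", director))
print(agent.call("register"))
```

The application only ever sees the Agent; where the Director runs, and how the Node Director reaches it, stays hidden behind that interface.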

Services

The following SA Forum's AIS services are implemented by OpenSAF 5.0.[23]

Supporters

Network equipment providers are the primary intended users of products based on the OpenSAF code base, integrating them into their products for network service providers, carriers, and operators. Many network equipment providers have demonstrated their support for OpenSAF by joining the Foundation and/or contributing to the open-source project. Current Foundation members include Ericsson, HP, and Oracle. Several providers of computing and communications technology have also indicated support for the OpenSAF initiative, including OpenClovis SAFplus, Emerson Network Power Embedded Computing, Continuous Computing, Wind River, IP Infusion, Tail-f, Aricent, Rancore Technologies, GoAhead Software, and MontaVista Software.

Uses

OpenSAF is commonly used as a way to achieve carrier-grade (five-nines) service availability. OpenSAF is functionally complete but lacks the ecosystem of modeling tools available to other open-source solutions like Kubernetes and Docker Swarm.

References

  1. Web site: OpenSAF/About. SourceForge. en. 2020-12-28. live. https://web.archive.org/web/20150511195123/http://sourceforge.net/p/opensaf/wiki/About%20OpenSAF/. 2015-05-11.
  2. Book: Maria Toeroe. Francis Tam. Service Availability: Principles and Practice. 2012. John Wiley & Sons. 978-1-1199-4167-5.
  3. Web site: OpenSAF Readme. SourceForge. en. 2020-12-28. live. https://web.archive.org/web/20201228180611/https://sourceforge.net/p/opensaf/code/ci/develop/tree/README. 2020-12-28.
  4. Web site: OpenSAF. OpenSAF. 19 March 2014 . 2020-12-28.
  5. Web site: Fault-Tolerant Containers Using NiLiCon. ucla. en. 2020-12-28. live. https://web.archive.org/web/20201229195151/http://web.cs.ucla.edu/~tamir/papers/ipdps20.pdf. 2020-12-29.
  6. Web site: OpenSAF project. Carolyn Mathas. eetimes. en. 2020-12-28. live. https://web.archive.org/web/20200827015804/https://www.eetimes.com/opensaf-project/. 2020-08-27.
  7. Web site: Industry Leaders To Establish Consortium On OpenSAF Project. ED News Staff. 2007. live. https://web.archive.org/web/20201229203230/https://www.electronicdesign.com/news/article/21753060/industry-leaders-to-establish-consortium-on-opensaf-project. 2020-12-29.
  8. GoAhead Software Joins OpenSAF(TM). OpenSaf Foundation. 2010. live. https://web.archive.org/web/20201229231808/https://www.prnewswire.com/news-releases/goahead-software-joins-opensaftm-89084862.html. 2020-12-29.
  9. Web site: Motorola launches open-source High Availability Operating Environment. cook. 2007. live. https://web.archive.org/web/20141221051918/https://lwn.net/Articles/224074/. 2014-12-21.
  10. OpenSAF in Commercial Deployment. OpenSAF Foundation. 2010. live. https://web.archive.org/web/20180625084624/https://www.prnewswire.com/news-releases/opensaf-in-commercial-deployment-83855077.html. 2018-06-25.
  11. Book: Software Defined Mobile Networks (SDMN): Beyond LTE Network Architecture. Madhusanka Liyanage. Andrei Gurtov. Mika Ylianttila. 2015. 9781118900253. John Wiley & Sons, Ltd.. 10.1002/9781118900253.
  12. Availability-Aware Container Scheduler for Application Services in Cloud. Yanal Alahmad. Tariq Daradkeh. Anjali Agarwal. IEEE. 2018. 1–6. 10.1109/PCCC.2018.8711295. 978-1-5386-6808-5. 155108018.
  13. Kubernetes as an Availability Manager for Microservice Applications. 2019. Leila Abdollahi Vayghan. Mohamed Aymen Saied. Maria Toeroe. Ferhat Khendek. Journal of Network and Computer Applications. 1901.04946.
  14. Web site: OpenSAF Releases 2.0. LightReading. 29 December 2020. live. https://web.archive.org/web/20200815085312/https://www.lightreading.com/atca/opensaf-releases-20/d/d-id/660133. 15 August 2020.
  15. Web site: Open source Carrier Grade Linux middleware rev'd (LinuxDevices). LWN. 29 December 2020. live. https://web.archive.org/web/20150917005248/https://lwn.net/Articles/337766/. 2015-09-17.
  16. Web site: OpenSAF Release 4 Overview "The Architecture Release". OpenSAF. 29 December 2020. live. https://web.archive.org/web/20201231004308/https://docs.huihoo.com/opensaf/developer-days-2010/3-OpenSAF-Release-4-Overview.pdf. 31 December 2020.
  17. Web site: OpenSAF 3.0 released. Hans J. Rauscher. WindRiver. 22 June 2009 . 30 December 2020. live. https://web.archive.org/web/20090629211732/http://www.electronicsweekly.com/blogs/open-source-linux/2009/06/opensaf-30-released.html. 2009-06-29.
  18. Web site: OpenSAF Project Releases Major Update to High Availability Middleware. PICMG. 30 December 2020. live. https://web.archive.org/web/20201231011839/http://picmg.mil-embedded.com/news/opensaf-update-high-availability-middleware/. 2020-12-31.
  19. Web site: Announcement of 5.0.0 GA release and 4.7.1, 4.6.2 maintenance releases. sourceforge. 30 December 2020. live. https://web.archive.org/web/20201231014409/https://sourceforge.net/p/opensaf/news/2016/05/announcement-of-50-ga-release-and-471-462-maintenance-releases/?version=4. 2020-12-31.
  20. Web site: SAI-AIS-AMF-B.04.01 Section 3.6. OpenSAF. SA Forum. 2010. 20 December 2020.
  21. Web site: OpenSAF in the Cloud. Why an HA Middleware is still needed. Linuxfoundation Events. Anders Widell. Mathivanan NP. 2012. 24 September 2015.
  22. Web site: TIPC: Providing Communication for Linux Clusters. Linux Kernel.org. Jon Paul Maloy. 2004. Linux Symposium, Volume Two. 31 December 2020. live. https://web.archive.org/web/20170830064121/https://www.kernel.org/doc/ols/2004/ols2004v2-pages-61-70.pdf. 2017-08-30.
  23. Web site: Opensaf. OpenSAF TSC. OPNFV. 2016. en. 2020-12-28. live. https://web.archive.org/web/20201231024608/https://wiki.opnfv.org/display/PROJ/Opensaf. 2020-12-31.
  24. Web site: OpenSAF README. OpenSAF Project. Sourceforge. 2020. en. 2020-12-31.
  25. Web site: A NEW DOMAIN SPECIFIC LANGUAGE FOR GENERATING AND VALIDATING MIDDLEWARE CONFIGURATIONS FOR HIGHLY AVAILABLE APPLICATIONS. Maxime TURENNE. etsmtl.ca. 2015. en. 2020-12-28.
  26. A UML-based domain specific modeling language for service availability management. Pejman Salehi. Abdelwahab Hamou-Lhadj. Maria Toeroe. Ferhat Khendek. doi. Computer Standards & Interfaces, Vol. 44, No. C. Elsevier Science Publishers B. V.. 2016. 10.1016/j.csi.2015.09.009. en. 2020-12-28.
  27. Web site: Scenario Analysis for High Availability in NFV, Section 5.4.2. OPNFV HA Project. OPNFV. 2016. en. 2020-12-31. live. https://web.archive.org/web/20201231192323/https://privatewiki.opnfv.org/_media/releases/brahmaputra/scenario_analysis_for_high_availability_in_nfv.pdf. 2020-12-31.
  28. Web site: OpenSAF IMM README. OpenSAF Project. Sourceforge. 2020. en. 2020-12-31. live. https://web.archive.org/web/20201231134352/https://sourceforge.net/p/opensaf/code/ci/develop/tree/src/imm/README. 2020-12-31.
  29. Web site: JSR 319: Availability Management for Java. Jens Jensen. Expert Group. JCP. 2010. en. 2020-12-31. live. https://web.archive.org/web/20170710122434/https://jcp.org/en/jsr/detail?id=319. 2017-07-10.
  30. Web site: OpenSAF and VMware from the Perspective of High Availability. DMTF. 2013. Ferhat Khendek. en. 2020-12-31. live. https://web.archive.org/web/20150923231930/https://www.dmtf.org/sites/default/files/SVM_2013-Khendek.pdf. 2015-09-23.