Veritas Cluster Server (rebranded as Veritas InfoScale Availability,[1][2] also known as VCS, and also sold bundled in the SFHA product) is high-availability cluster software for Unix, Linux, and Microsoft Windows computer systems, created by Veritas Technologies. It provides application cluster capabilities to systems running other applications, including databases, network file sharing, and electronic commerce websites.
High-availability clusters (HAC) improve application availability by failing applications over, or switching them over, among the systems in a group, as opposed to high-performance clusters, which improve application performance by running applications on multiple systems simultaneously.
Most Veritas Cluster Server implementations attempt to build availability into the cluster, eliminating single points of failure through redundant components such as multiple network cards and storage area network paths, in addition to the use of VCS itself.
Similar products include Fujitsu PRIMECLUSTER, IBM PowerHA SystemMirror, HP ServiceGuard, IBM Tivoli System Automation for Multiplatforms (SA MP), Linux-HA, OpenSAF, Microsoft Cluster Server (MSCS), NEC ExpressCluster, Red Hat Cluster Suite, SteelEye LifeKeeper and Sun Cluster.
VCS is mostly user-level clustering software; most of its processes are normal system processes with no special access to the operating system or kernel functions on the hosts. However, the interconnect (heartbeat) technology used with VCS is a proprietary Layer 2 Ethernet-based protocol that runs in kernel space using kernel modules.[3] The group membership protocol that runs on top of the interconnect heartbeat protocol is also implemented in the kernel.[3] In the event of a split brain, the 'fencing' module performs arbitration and data protection. Fencing is likewise implemented as a kernel module.
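On a Linux cluster node, these kernel-resident components are visible as loaded modules. The following check is a sketch; the module names shown (llt, gab, vxfen) are those commonly used by VCS on Linux, and exact output varies by platform and release:

    # lsmod | egrep 'llt|gab|vxfen'
    vxfen    ...    # I/O fencing (arbitration and data protection)
    gab      ...    # group membership and atomic broadcast
    llt      ...    # low-latency transport (heartbeat interconnect)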
The basic architecture of VCS comprises the Low-Latency Transport (LLT), Group Membership Services/Atomic Broadcast (GAB), the High Availability Daemon (HAD), and Cluster Agents.
LLT lies at the bottom of the architecture and acts as the conduit between GAB and the underlying network: it receives information from GAB and transmits it to the intended participant nodes. Although the LLT module on one node interacts with every other node in the cluster, communication is always point-to-point (1:1) between individual nodes. If information needs to reach all nodes of, say, a six-node cluster, six separate packets are sent, each addressed to an individual machine's interconnect.
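The state of each node's LLT links can be inspected with the lltstat command. The node and interface names below are hypothetical and the output layout is illustrative:

    # lltstat -nvv
    LLT node information:
    Node                 State    Link   Status   Address
    * 0 node1            OPEN
                                  eth1   UP       08:00:20:93:0E:34
                                  eth2   UP       08:00:20:93:0E:35
      1 node2            OPEN
                                  eth1   UP       08:00:20:8F:D1:F2
                                  eth2   UP       08:00:20:8F:D1:F3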
GAB determines which machines are part of the cluster and the minimum number of nodes that need to be present and working for the cluster to form (this minimum is called the seed number). GAB acts as an abstraction layer upon which other cluster services can be plugged in. Each such service must register with GAB and is assigned a predetermined, unique port name (a single letter). GAB has both a client and a server component. The client component is used to send information through the GAB layer and registers with the server component as port "a"; HAD registers with GAB as port "h". The server portion of GAB interacts with the GAB modules on other cluster nodes to maintain membership information for each port. This membership information conveys whether the cluster modules corresponding to the various ports (for example, GAB on port "a" and HAD on port "h") on the different cluster nodes are healthy and able to communicate with each other as intended.
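The seed number is typically set at boot time by the gabconfig command (commonly invoked from /etc/gabtab), and current port memberships can be listed with gabconfig -a. The generation numbers and membership strings below are illustrative:

    # /sbin/gabconfig -c -n2        # seed the cluster once two nodes are present
    # gabconfig -a
    GAB Port Memberships
    ===============================================================
    Port a gen   a36e0003 membership 01
    Port h gen   fd570002 membership 01

Here membership "01" indicates that nodes 0 and 1 are both present in the GAB (port "a") and HAD (port "h") memberships.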
The HAD layer is where high availability for applications is actually provided; it is the point at which applications plug into the high-availability framework. HAD registers with GAB on port "h". The HAD module running on one node communicates with the HAD modules running on the other cluster nodes to ensure that all nodes hold the same cluster configuration and status information.
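A summary of what the HAD instances agree on, per system and per service group, can be displayed with the hastatus command. The system and group names below are hypothetical:

    # hastatus -sum
    -- SYSTEM STATE
    -- System               State                Frozen
    A  node1                RUNNING              0
    A  node2                RUNNING              0

    -- GROUP STATE
    -- Group           System     Probed     AutoDisabled    State
    B  oragrp          node1      Y          N               ONLINE
    B  oragrp          node2      Y          N               OFFLINE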
For an application to plug into the high-availability framework, it needs cluster agent software. Cluster agents can be generic or specific to a type of application; for example, for Oracle to use the HA framework in VCS, an Oracle agent is required. VCS is, at its base, generic cluster software and does not know how different applications are started, stopped, monitored, or cleaned up; that logic must be coded into the agent. An agent can be thought of as a translator between the application and the HA framework. For example, if HAD needs to stop an Oracle database, it does not by itself know how to do so; but if the Oracle agent is running, HAD asks the agent to stop the database, and the agent issues the commands appropriate to the database version and configuration and monitors the result of the stop.
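As a sketch of how such an agent is structured, a script-based agent implements entry points named online, offline, monitor, and clean as executables in the agent's directory. The following monitor script for a hypothetical process-based application follows the VCS script-agent convention of exiting 110 when the resource is online and 100 when it is offline (the daemon name and the agent itself are assumptions for illustration):

    #!/bin/sh
    # monitor entry point for a hypothetical "MyApp" agent:
    # report online (110) if the myappd daemon is running, else offline (100)
    if pgrep -x myappd >/dev/null 2>&1; then
        exit 110    # resource online
    else
        exit 100    # resource offline
    fi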
Important files where cluster configuration information is kept (illustrative examples of several of these files follow the list):
/etc/llttab
/etc/llthosts
/etc/gabtab
/etc/VRTSvcs/conf/config/main.cf
/etc/VRTSvcs/conf/config/types.cf
/etc/VRTSvcs/conf/sysname
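For a hypothetical two-node cluster, the key files might look like the following; the node names, network interfaces, cluster ID, and service group contents are all assumptions for illustration, and exact syntax varies by platform and release:

    /etc/llttab (LLT interconnect configuration):
        set-node node1
        set-cluster 100
        link eth1 eth1 - ether - -
        link eth2 eth2 - ether - -

    /etc/llthosts (LLT node ID to host name mapping):
        0 node1
        1 node2

    /etc/gabtab (seed the cluster when two nodes are up):
        /sbin/gabconfig -c -n2

    /etc/VRTSvcs/conf/config/main.cf (cluster and service group definitions):
        include "types.cf"
        cluster democlus ( )
        system node1 ( )
        system node2 ( )
        group oragrp (
            SystemList = { node1 = 0, node2 = 1 }
            AutoStartList = { node1 }
            )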
Veritas Cluster Server for Windows is available as a standalone product and is also sold bundled with Storage Foundation as Storage Foundation HA for Windows. Veritas Cluster Server for AIX, HP-UX, Linux, and Solaris is supplied as a standalone product.
The Veritas Cluster Server product includes VCS Management Console, which is multi-cluster management software that automates disaster recovery across data centers.