Fast automatic restoration (FASTAR) is an automated fast response system developed and deployed by American Telephone & Telegraph (AT&T) in 1992 for the centralized restoration of its digital transport network.[1] FASTAR automatically reroutes circuits over spare protection capacity when a fiber-optic cable failure is detected, thereby increasing service availability and reducing the impact of outages in the network. Real-time restoration (RTR), developed and deployed by MCI to minimize the effects of a fiber cut in the MCI network, operates similarly.[2]
Restoration is a recovery technique used in computer networks and telecommunication networks such as mesh optical networks, where the backup path (the alternate path that affected traffic takes after a failure condition) and backup channel are computed in real time after a failure occurs. The technique can be broadly classified into two types: centralized restoration and distributed restoration.[3]
Centralized restoration uses a central controller that has access to complete, accurate, and up-to-date information about the network: the available resources, the resources in use, the physical topology, the service demands, and so on. When a failure is detected in any part of the network through some failure detection, identification, and notification scheme, the central controller calculates a new route around the failure based on the information in its database about the current state of the network. After this new route (the backup path) is calculated, the central controller sends commands to all the affected digital cross-connects to reconfigure their switching elements so that the new path is implemented. The FASTAR and RTR restoration systems are examples of systems that use this technique.
Distributed restoration uses no central controller, so no up-to-date database of the state of the network is needed. In this scheme, every node in the network uses a local controller that has only local information: how the node is connected to its neighboring nodes, the available and spare capacity on the links to those neighbors, and the state of the node's switching elements. When a failure occurs in any part of the network, the local controllers handle the computation and re-routing of the affected traffic. An example of an approach that uses this technique is the Self-Healing Network (SHN).
As transport networks gradually developed over the years from digital cross-connect system (DCS)-based mesh networks, to SONET ring networks, and then to optical mesh networks, so did the recovery architectures used in them. The recovery architectures used for the different transport networks are: DCS-based mesh restoration of DS3 facilities, add-drop multiplexer (ADM)-based ring protection of SONET ring networks, and finally optical cross-connect (OXC)-based mixed protection and restoration of optical mesh networks.[4]
The first restoration architecture, used in the 1980s, was the DCS-based mesh restoration of DS3 facilities. This architecture used a centralized restoration technique: every restoration event was coordinated from the network operation center (NOC). The architecture is path-based and failure-dependent, and is used after a fault occurs, for fault detection and isolation. It is capacity-efficient because of its use of stub release, but it has a slow failure recovery time (the time it takes to reestablish traffic continuity after a failure by rerouting the signals on diverse facilities), on the order of minutes.
ADM-based ring protection was implemented in the 1990s with the introduction of SONET/SDH networks, and it employs the distributed protection technique. It uses either path-based (UPSR) or span-based (BLSR) protection, and its recovery path is precomputed before a failure occurs. ADM-based ring protection is capacity-inefficient, unlike DCS-based mesh restoration, but has a much faster recovery time (50 ms).
OXC-based mixed protection and restoration is used in optical mesh networks, which were introduced in the early 2000s. It has a recovery time between tens and hundreds of milliseconds, a significant improvement over the recovery time of DCS-based mesh restoration. Unlike DCS-based mesh restoration, however, its recovery path is predetermined and pre-provisioned. This architecture also retains the capacity efficiency of the earlier DCS-based mesh restoration architecture.
FASTAR uses DCS-based mesh restoration architecture. This architecture consists of nodal equipment, central control equipment, and a data communication network interconnecting the nodes to the central controller. The figure on the right explains the architecture of FASTAR and how the different building blocks interact.
The central processor, called the Restoration and Provisioning Integrated Design (RAPID) and located at the NOC,[5] is responsible for receiving and analyzing the alarm reports generated in the event of a fiber failure. It also handles alternate (backup) route computation, re-routing of the affected traffic from the primary path to the computed backup path, and path assurance tests, and it enables the roll-back of traffic to the original path after the failure is repaired.[6] RAPID maintains up-to-date information about the state of the network and the available spare capacity.[7] The Central Access and Display System (CADS) provides a craft interface for RAPID and other related restoration management systems.
The Traffic Maintenance and Administration System (TMAS) enables RAPID to perform and control the protection switch lock-out process on protection channels being used for restoration, by sending commands to the Line Terminating Equipment (LTE).
The Restoration Network Controllers (RNCs) are located at each central office (CO) in the fiber-optic network. The alarms generated by the affected digital access and cross-connect systems (DACSs) or by the LTE are sent to the RNC, where they are aged to determine whether they result from a transient, correlated, and finally sent to RAPID via the data communication network.
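The aging and correlation step can be pictured as a hold-off filter followed by grouping. The following is a minimal sketch only: the 2-second hold time, the `Alarm` record, and the idea of correlating alarms by the span they refer to are all illustrative assumptions, since the actual interval and data model used by the RNC are not given here.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_SECONDS = 2.0  # hypothetical aging interval, not a documented FASTAR value


@dataclass
class Alarm:
    source: str                         # reporting equipment, e.g. "LTE-F"
    span: str                           # span the alarm refers to, e.g. "F-K"
    raised_at: float                    # time the alarm was raised (seconds)
    cleared_at: Optional[float] = None  # time it cleared, None if still active


def is_persistent(alarm: Alarm, now: float, hold: float = HOLD_SECONDS) -> bool:
    """True if the alarm is still active and has been for at least `hold` seconds."""
    return alarm.cleared_at is None and now - alarm.raised_at >= hold


def age_and_correlate(alarms: list[Alarm], now: float) -> dict[str, list[str]]:
    """Drop transients, then group the surviving alarm sources by span."""
    report: dict[str, list[str]] = {}
    for alarm in alarms:
        if is_persistent(alarm, now):
            report.setdefault(alarm.span, []).append(alarm.source)
    return report


alarms = [
    Alarm("LTE-F", "F-K", raised_at=0.0),                    # persists
    Alarm("LTE-K", "F-K", raised_at=0.1),                    # persists
    Alarm("DACS-F", "C-F", raised_at=0.2, cleared_at=0.5),   # transient
]
report = age_and_correlate(alarms, now=3.0)
print(report)
```

In this sketch only the two alarms on span F-K survive aging, so a single correlated report for that span would be forwarded to the central processor.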
The LTE, which is either an FT Series G digital transmission system or an add-drop multiplexer (ADM), reports any fiber failure between LTEs to the RNC and also provides RAPID with immediate access to the backup channels for re-routing of traffic or for path assurance tests.
The Restoration Test Equipment (RTE) provides RAPID with the means to perform continuity tests used in path assurance.
The DACS is responsible for reporting fiber failures and node failures that occur within the office to the RNC. In addition, the DACS enables automatic restoration by providing the central processor access to remotely perform cross-connects at the DS-3 level.
The data communication network connects the nodal equipment with the central controller. To achieve the availability this network needs, full redundancy is provided in the form of two totally diverse networks, one terrestrial and one satellite-based. In the event of a major restoration process, either of these networks can support the communication burden in the absence of the other.
FASTAR operates at the DS-3 level; it does not restore individual smaller demands.[8] FASTAR restores 90 to 95 percent of the affected DS-3 demand within two to three minutes.[9] When a fiber-optic cut occurs between the output of one DACS and the input of another, each RNC collects alarms from the affected LTEs. The RNC ages these alarms and sends them to RAPID. RAPID determines the amount of spare capacity available after the failure, identifies the affected DS-3 demands, finds a restoration route for each affected demand sequentially in order of priority, and sends commands to the appropriate DACSs to implement the re-routes, thus restoring the traffic.
In the figure on the right, a route exists between node A and node Q via nodes C, F, K, and L. In the event of a fiber-optic cable failure between nodes F and K, the LTE (FT Series G or ADM) in each of these two offices detects the failure and sends alarm reports to its RNC. Both RNCs age the alarms and send the reports to RAPID, located at the NOC. RAPID opens a time window to ensure that it receives all related alarms from the RNCs of the affected nodes and from the RNC of any other office whose traffic uses the failed F–K fiber-optic cable. When this window times out, RAPID performs route computation to establish a new backup path for the traffic between node A and node Q. Here it creates a new route through C, F, G, J, K, and L. Route computation is done sequentially, in order of priority, for all traffic between any two nodes in the network that uses the same failed fiber-optic cable. Once the backup paths for all the traffic passing between nodes F and K have been computed, RAPID checks continuity along each backup path by sending a command to the RNCs located at A and Q, which in turn use the test signals generated by their respective RTEs. When the connectivity of the backup path has been verified, the traffic between nodes A and Q is transferred to it by commanding the DACS IIIs to make the appropriate cross-connections. RAPID then performs a service verification test to confirm that the service transfer was successful; if the test fails, the transfer was unsuccessful and needs to be repeated. This transfer process is performed for all the traffic carried on the affected fiber-optic cable F–K. FASTAR restores as much of the affected traffic demand as the available protection capacity will allow.
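The route computation in this example can be sketched as a search over links that are still up and still have spare capacity, run once per affected demand in priority order. This is an illustrative sketch rather than RAPID's actual algorithm: breadth-first search is a stand-in for whatever route computation RAPID uses, the spare-capacity figures are invented, the second demand (C to L) is hypothetical, and priority 1 is assumed to be the highest. Only the node names and the detour links F-G, G-J, and J-K come from the example.

```python
from collections import deque


def link(u, v):
    """Undirected link identifier."""
    return frozenset((u, v))


# Spare DS-3 channels per link (hypothetical values).
SPARE = {
    link("A", "C"): 2, link("C", "F"): 2, link("F", "K"): 2,
    link("K", "L"): 2, link("L", "Q"): 2,
    link("F", "G"): 1, link("G", "J"): 1, link("J", "K"): 1,
}


def find_route(spare, src, dst, failed):
    """Breadth-first search over working links that have spare capacity."""
    neighbors = {}
    for ln, cap in spare.items():
        if cap > 0 and ln != failed:
            u, v = sorted(ln)
            neighbors.setdefault(u, []).append(v)
            neighbors.setdefault(v, []).append(u)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no route with spare capacity remains


def restore(spare, demands, failed):
    """Route affected demands in priority order, consuming spare capacity."""
    spare = dict(spare)  # work on a copy so the input is not modified
    results = {}
    for name, src, dst, _priority in sorted(demands, key=lambda d: d[3]):
        route = find_route(spare, src, dst, failed)
        if route is not None:
            for u, v in zip(route, route[1:]):
                spare[link(u, v)] -= 1  # one DS-3 channel per hop
        results[name] = route
    return results


demands = [("C-L", "C", "L", 2), ("A-Q", "A", "Q", 1)]  # (name, src, dst, priority)
results = restore(SPARE, demands, failed=link("F", "K"))
print(results)
```

With these invented capacities, the higher-priority A-to-Q demand takes the route A, C, F, G, J, K, L, Q and uses up the single spare channel on the detour, so the lower-priority demand is left unrestored. That mirrors the point that restoration proceeds only as far as the available protection capacity allows.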
Shared Risk Link Groups (SRLGs) refer to situations where links that connect two distinct nodes or offices in a network share a common conduit. In that configuration, links in the group have a shared risk: if one link fails, other links in the group may fail too. The majority of networks in use today contain SRLGs, because often the only access into a building or across a bridge is through a single conduit. To restore the traffic on a link between two offices or nodes that shares an SRLG with other links in the event of a conduit cut, at least one of these two offices must be FASTAR-compliant.[10]
A cut in SRLG1 would be restorable using FASTAR if FASTAR were implemented in office A, even if offices B and C were not yet FASTAR-compliant. Given a failure in SRLG2 under the same conditions, however, the DS-3 traffic on link 3 would be restored by FASTAR via a newly computed backup path, while the DS-3 traffic on link 2 would not be restored, because FASTAR is not implemented in either office B or C. To restore all three links in the event of failure of both SRLGs, FASTAR is implemented in offices A and C. A failure in SRLG1 would then cause FASTAR to automatically re-route the traffic on links 1 and 3 via two recomputed backup paths. Likewise, if a failure of SRLG2 is detected at another time, it is reported to RAPID, and the traffic on links 2 and 3 is re-routed, each over a new backup path.
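The restorability rule in this example (a link can be restored only if at least one of its end offices is FASTAR-compliant) can be checked mechanically. In the sketch below, the link endpoints (link 1 between A and B, link 2 between B and C, link 3 between A and C) and the SRLG membership (SRLG1 holding links 1 and 3, SRLG2 holding links 2 and 3) are inferred from the prose rather than taken from the figure, so they should be read as assumptions.

```python
# Link endpoints and SRLG membership are inferred from the prose; treat
# them as assumptions rather than as the actual figure.
LINKS = {1: ("A", "B"), 2: ("B", "C"), 3: ("A", "C")}
SRLGS = {"SRLG1": {1, 3}, "SRLG2": {2, 3}}


def restorable_links(srlg, fastar_offices):
    """Map each link in the failed SRLG to whether FASTAR can restore it.

    A link counts as restorable when at least one of its two end
    offices is FASTAR-compliant.
    """
    return {
        link_id: any(office in fastar_offices for office in LINKS[link_id])
        for link_id in sorted(SRLGS[srlg])
    }


only_a_srlg1 = restorable_links("SRLG1", {"A"})       # FASTAR in office A only
only_a_srlg2 = restorable_links("SRLG2", {"A"})
a_and_c_srlg2 = restorable_links("SRLG2", {"A", "C"})  # FASTAR in A and C

print(only_a_srlg1)
print(only_a_srlg2)
print(a_and_c_srlg2)
```

Under these assumptions, FASTAR in office A alone covers a cut in SRLG1 but leaves link 2 unrestored after a cut in SRLG2, whereas FASTAR in offices A and C covers both groups.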
FASTAR network management is used to integrate and analyze the different data and alarms supplied by the various system elements that make up the FASTAR architecture for centralized display, and to troubleshoot and isolate problems through fault management analysis so that corrective action can be taken. The FASTAR network management cuts across three tiers.