The Spanning Tree Protocol (STP) is a network protocol that builds a loop-free logical topology for Ethernet networks. The basic function of STP is to prevent bridge loops and the broadcast radiation that results from them. Spanning tree also allows a network design to include backup links providing fault tolerance if an active link fails.
As the name suggests, STP creates a spanning tree that characterizes the relationship of nodes within a network of connected layer-2 bridges, and disables those links that are not part of the spanning tree, leaving a single active path between any two network nodes. STP is based on an algorithm that was invented by Radia Perlman while she was working for Digital Equipment Corporation.[1] [2]
In 2001, the IEEE introduced Rapid Spanning Tree Protocol (RSTP) as 802.1w. RSTP provides significantly faster recovery in response to network changes or failures, introducing new convergence behaviors and bridge port roles to do this. RSTP was designed to be backwards-compatible with standard STP.
STP was originally standardized as IEEE 802.1D but the functionality of spanning tree (802.1D), rapid spanning tree (802.1w), and multiple spanning tree (802.1s) has since been incorporated into IEEE 802.1Q-2014.[3]
While STP is still in use today, in most modern networks its primary use is as a loop-protection mechanism rather than a fault tolerance mechanism. Link aggregation protocols such as LACP will bond two or more links to provide fault tolerance while simultaneously increasing overall link capacity.
The need for the Spanning Tree Protocol (STP) arose because switches in local area networks (LANs) are often interconnected using redundant links to improve resilience should one connection fail. However, this connection configuration creates a switching loop, resulting in broadcast radiation and MAC table instability. If redundant links are used to connect switches, then switching loops need to be avoided.
To avoid the problems associated with redundant links in a switched LAN, STP is implemented on switches to monitor the network topology. Every link between switches, and in particular every redundant link, is catalogued. The spanning-tree algorithm then blocks forwarding on redundant links by setting up one preferred link between switches in the LAN. This preferred link is used for all Ethernet frames unless it fails, in which case a non-preferred redundant link is enabled. When implemented in a network, STP designates one layer-2 switch as root bridge. All switches then select their best connection towards the root bridge for forwarding and block other redundant links. All switches constantly communicate with their neighbors in the LAN using bridge protocol data units (BPDUs).
Provided there is more than one link between two switches, the STP root bridge calculates the cost of each path based on bandwidth. STP will select the path with the lowest cost, that is the highest bandwidth, as the preferred link. STP will enable this preferred link as the only path to be used for Ethernet frames between the two switches, and disable all other possible links by designating the switch ports that connect the preferred path as root port.[4]
After STP-enabled switches in a LAN have elected the root bridge, all non-root bridges assign one of their ports as root port. This is either the port that connects the switch to the root bridge, or if there are several paths, the port with the preferred path as calculated by the root bridge. Because not all switches are directly connected to the root bridge, they communicate with each other using STP BPDUs. Each switch adds the cost of its own path to the cost received from the neighboring switches to determine the total cost of a given path to the root bridge. Once the costs of all possible paths to the root bridge have been added up, each switch assigns as root port the port that connects to the path with the lowest cost, or highest bandwidth, that will eventually lead to the root bridge.[4]
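The cost accumulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topology, the switch names and the `root_path_costs` helper are hypothetical, and tie-breaking by bridge ID or port ID is ignored. Each pass of the loop stands in for one round of BPDU exchange between neighbors.

```python
# Hypothetical topology: (switch, neighbor) -> cost of the link between them.
LINKS = {
    ("root", "A"): 4,    # 1 Gbit/s link (802.1D-1998 cost)
    ("root", "B"): 19,   # 100 Mbit/s link
    ("A", "B"): 4,       # 1 Gbit/s link
}

def root_path_costs(links, root):
    """Return {switch: (total cost to root, neighbor behind the root port)}."""
    neighbors = {}
    for (a, b), cost in links.items():
        neighbors.setdefault(a, {})[b] = cost
        neighbors.setdefault(b, {})[a] = cost

    best = {root: (0, None)}          # the root advertises a cost of 0
    changed = True
    while changed:                    # repeat until no switch improves its path
        changed = False
        for switch, adjacent in neighbors.items():
            if switch == root:
                continue
            for neighbor, link_cost in adjacent.items():
                if neighbor not in best:
                    continue          # neighbor has not heard from the root yet
                # Own link cost plus the cost advertised by the neighbor.
                total = best[neighbor][0] + link_cost
                if switch not in best or total < best[switch][0]:
                    best[switch] = (total, neighbor)
                    changed = True
    return best

costs = root_path_costs(LINKS, "root")
for switch, (total, via) in costs.items():
    print(switch, total, via)
```

In this made-up topology, switch B prefers the two-hop path through A (total cost 8) over its direct 100 Mbit/s link to the root (cost 19), so its root port is the one facing A.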
Data rate (link bandwidth) | Original STP cost (802.1D-1998) | RSTP/MSTP cost (recommended value)
---|---|---
4 Mbit/s | 250 | 5,000,000
10 Mbit/s | 100 | 2,000,000
16 Mbit/s | 62 | 1,250,000
100 Mbit/s | 19 | 200,000
1 Gbit/s | 4 | 20,000
2 Gbit/s | 3 | 10,000
10 Gbit/s | 2 | 2,000
100 Gbit/s | N/A | 200
1 Tbit/s | N/A | 20
The STP path cost default was originally calculated by the formula $\frac{1\ \text{Gbit/s}}{\text{bandwidth}}$. When faster speeds became available, the default values were adjusted, as otherwise speeds above 1 Gbit/s would have been indistinguishable by STP. Its successor RSTP uses a similar formula with a larger numerator: $\frac{20\ \text{Tbit/s}}{\text{bandwidth}}$. These formulas lead to the sample values in the table.[5]
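As a quick check of the table, the following sketch computes both defaults from the link bandwidth. It assumes numerators of 1 Gbit/s for the original formula and 20 Tbit/s for RSTP, which is what the table values imply; the function names are illustrative and not taken from any standard or library.

```python
ORIGINAL_STP_NUMERATOR = 10**9       # 1 Gbit/s, in bit/s (assumed from the table)
RSTP_NUMERATOR = 20 * 10**12         # 20 Tbit/s, in bit/s (assumed from the table)

MBIT = 10**6
GBIT = 10**9

def original_stp_cost(bandwidth_bps):
    """Original default cost: 1 Gbit/s divided by the bandwidth, rounded down."""
    return ORIGINAL_STP_NUMERATOR // bandwidth_bps

def rstp_cost(bandwidth_bps):
    """Recommended RSTP/MSTP cost: 20 Tbit/s divided by the bandwidth."""
    return RSTP_NUMERATOR // bandwidth_bps

for rate in (4 * MBIT, 10 * MBIT, 16 * MBIT, 1 * GBIT, 10 * GBIT):
    print(rate, original_stp_cost(rate), rstp_cost(rate))
```

With the original numerator, every link faster than 1 Gbit/s rounds down to 0, which illustrates why the 802.1D-1998 column uses adjusted values (19, 4, 3, 2) for the faster rates rather than the raw formula.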
All switch ports in the LAN where STP is enabled are categorized.[4]
When a device is first attached to a switch port, it will not immediately start to forward data. It will instead go through a number of states while it processes BPDUs and determines the topology of the network. The port attached to a host such as a computer, printer or server always goes into the forwarding state, albeit after a delay of about 30 seconds while it goes through the listening and learning states. The time spent in the listening and learning states is determined by a value known as the forward delay (default 15 seconds and set by the root bridge). If another switch is connected, the port may remain in blocking mode if it is determined that it would cause a loop in the network. Topology change notification (TCN) BPDUs are used to inform other switches of port changes. TCNs are injected into the network by a non-root switch and propagated to the root. Upon receipt of the TCN, the root switch will set the topology change flag in its normal BPDUs. This flag is propagated to all other switches and instructs them to rapidly age out their forwarding table entries.
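The delay of about 30 seconds follows directly from the forward delay being applied twice, once for listening and once for learning. A minimal sketch of that arithmetic, with a hypothetical `forwarding_timeline` helper and the default forward delay of 15 seconds:

```python
DEFAULT_FORWARD_DELAY = 15  # seconds, default value set by the root bridge

def forwarding_timeline(forward_delay=DEFAULT_FORWARD_DELAY):
    """Return (seconds since link-up, state entered) for a port that ends up forwarding."""
    timeline = []
    elapsed = 0
    # The port spends one forward delay in each of these two states.
    for state in ("listening", "learning"):
        timeline.append((elapsed, state))
        elapsed += forward_delay
    timeline.append((elapsed, "forwarding"))
    return timeline

for seconds, state in forwarding_timeline():
    print(f"t={seconds:>2}s -> {state}")
```

Passing a different `forward_delay` shows how the time before a host port forwards scales with the value configured on the root bridge.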
Before configuring STP, the network topology should be carefully planned.[6] Basic configuration requires that STP be enabled on all switches in the LAN and that the same version of STP be chosen on each. The administrator may determine which switch will be the root bridge and configure the switches appropriately. If the root bridge goes down, the protocol will automatically assign a new root bridge based on bridge ID. If all switches have the same bridge priority, such as the default priority, and the root bridge goes down, a tie arises and the protocol will assign one switch as root bridge based on the switch MAC addresses. Once the switches have been assigned a bridge ID and the protocol has chosen the root bridge, the best path to the root bridge is calculated based on port cost, path cost and port priority.[7] Ultimately STP calculates the path cost on the basis of the bandwidth of a link; however, links between switches may have the same bandwidth. Administrators can influence the protocol's choice of the preferred path by configuring the port cost: the lower the port cost, the more likely it is that the protocol will choose the connected link as root port for the preferred path.[8] How other switches in the topology choose their root port, or the least-cost path to the root bridge, can be influenced by the port priority. A numerically higher priority value makes the path less preferred. If all ports of a switch have the same priority, the port with the lowest number is chosen to forward frames.[9]
The root bridge of the spanning tree is the bridge with the smallest (lowest) bridge ID. Each bridge has a configurable priority number and a MAC address; the bridge ID is the concatenation of the bridge priority and the MAC address. For example, the ID of a bridge with priority 32,768 is 32,768 followed by the bridge's MAC address. The bridge priority default is 32,768 and can be configured only in multiples of 4096. When comparing two bridge IDs, the priority portions are compared first and the MAC addresses are compared only if the priorities are equal. The switch with the lowest priority of all the switches will be the root; if there is a tie, then the switch with the lowest priority and lowest MAC address will be the root. For example, if switches A and B both have a priority of 32,768 and A has the lower MAC address, then switch A will be selected as the root bridge. If the network administrators would like switch B to become the root bridge, they must set its priority to be less than 32,768.
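The comparison rule maps naturally onto tuple ordering: compare priority first, then MAC address. The sketch below is illustrative only; the MAC addresses are made-up example values, and `bridge_id` and `elect_root` are hypothetical helper names.

```python
def bridge_id(priority, mac):
    """Build a comparable bridge ID as (priority, MAC bytes)."""
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    return (priority, bytes.fromhex(mac.replace(":", "")))

def elect_root(bridges):
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridges[name])

# Example MAC addresses are assumptions chosen for illustration.
bridges = {
    "A": bridge_id(32768, "02:00:00:00:11:11"),
    "B": bridge_id(32768, "02:00:00:00:22:22"),
}
print(elect_root(bridges))   # equal priorities, so the lower MAC decides

# Lowering B's priority below 32,768 makes it win regardless of MAC address.
bridges["B"] = bridge_id(28672, "02:00:00:00:22:22")
print(elect_root(bridges))
```

Python compares tuples element by element, so the MAC address is only consulted when the priorities are equal, matching the rule in the text.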
The sequence of events to determine the best received BPDU (which is the best path to the root) is:
The above rules describe one way of determining what spanning tree will be computed by the algorithm, but the rules as written require knowledge of the entire network. The bridges have to determine the root bridge and compute the port roles (root, designated, or blocked) with only the information that they have. To ensure that each bridge has enough information, the bridges use special data frames called bridge protocol data units (BPDUs) to exchange information about the spanning tree protocol, bridge IDs, and root path costs.
A bridge sends a BPDU frame using the unique MAC address of the port itself as the source address and the STP multicast address as the destination address,[12] or a separate multicast address for Cisco's proprietary Per-VLAN Spanning Tree.[13]
There are two types of BPDUs in the original STP specification (802.1D):[5] the configuration BPDU, used to compute the spanning tree, and the topology change notification (TCN) BPDU, used to announce changes in the network topology. The Rapid Spanning Tree Protocol (RSTP) extension uses a specific RSTP BPDU.
BPDUs are exchanged regularly (every 2 seconds by default) and enable switches to keep track of network changes and to start and stop forwarding at ports as required. To prevent the delay when connecting hosts to a switch and during some topology changes, Rapid STP was developed, which allows a switch port to rapidly transition into the forwarding state during these situations.
IEEE 802.1D and IEEE 802.1aq BPDUs have the following format:
1. Protocol ID: 2 bytes (0x0000 IEEE 802.1D)
2. Version ID: 1 byte (0x00 Config & TCN / 0x02 RST / 0x03 MST / 0x04 SPT BPDU)
3. BPDU Type: 1 byte (0x00 STP Config BPDU, 0x80 TCN BPDU, 0x02 RST/MST Config BPDU)
4. Flags: 1 byte
   - Bit 1: 0 or 1 for Topology Change
   - Bit 2: 0 (unused) or 1 for Proposal in RST/MST/SPT BPDU
   - Bits 3–4: 00 (unused), or in RST/MST/SPT BPDU: 01 for Port Role Alternate/Backup, 10 for Port Role Root, 11 for Port Role Designated
   - Bit 5: 0 (unused) or 1 for Learning in RST/MST/SPT BPDU
   - Bit 6: 0 (unused) or 1 for Forwarding in RST/MST/SPT BPDU
   - Bit 7: 0 (unused) or 1 for Agreement in RST/MST/SPT BPDU
   - Bit 8: 0 or 1 for Topology Change Acknowledgement
5. Root ID: 8 bytes (CIST Root ID in MST/SPT BPDU)
   - Bits 1–4: Root Bridge Priority
   - Bits 5–16: Root Bridge System ID Extension
   - Bits 17–64: Root Bridge MAC Address
6. Root Path Cost: 4 bytes (CIST External Path Cost in MST/SPT BPDU)
7. Bridge ID: 8 bytes (CIST Regional Root ID in MST/SPT BPDU)
   - Bits 1–4: Bridge Priority
   - Bits 5–16: Bridge System ID Extension
   - Bits 17–64: Bridge MAC Address
8. Port ID: 2 bytes
9. Message Age: 2 bytes in 1/256 secs
10. Max Age: 2 bytes in 1/256 secs
11. Hello Time: 2 bytes in 1/256 secs
12. Forward Delay: 2 bytes in 1/256 secs
13. Version 1 Length: 1 byte (0x00 no ver 1 protocol info present. RST, MST, SPT BPDU only)
14. Version 3 Length: 2 bytes (MST, SPT BPDU only)

The TCN BPDU includes fields 1–3 only.
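To make the layout concrete, here is a minimal sketch that decodes fields 1 to 12 of a classic STP configuration BPDU with Python's `struct` module. It assumes big-endian byte order, that flag bit 1 in the list is the least significant bit (0x01) and bit 8 the most significant (0x80), and that the input starts at the Protocol ID (any LLC or Ethernet header already removed). The function names and the sample values are hypothetical.

```python
import struct

# Fields 1-12: 2+1+1+1+8+4+8+2+2+2+2+2 = 35 bytes, big-endian, no padding.
CONFIG_BPDU = struct.Struct(">HBBB8sI8sHHHHH")

def split_bridge_id(raw):
    """Split an 8-byte ID into priority, system ID extension and MAC address."""
    head = int.from_bytes(raw[:2], "big")
    return {
        "priority": head & 0xF000,        # top 4 bits, in steps of 4096
        "system_id_ext": head & 0x0FFF,   # lower 12 bits
        "mac": raw[2:].hex(":"),
    }

def parse_config_bpdu(data):
    """Decode the fixed part of a configuration BPDU into a dict."""
    (protocol_id, version, bpdu_type, flags, root_id, root_path_cost,
     bridge, port_id, message_age, max_age, hello_time,
     forward_delay) = CONFIG_BPDU.unpack(data[:CONFIG_BPDU.size])
    return {
        "protocol_id": protocol_id,
        "version": version,
        "bpdu_type": bpdu_type,
        "topology_change": bool(flags & 0x01),
        "topology_change_ack": bool(flags & 0x80),
        "root": split_bridge_id(root_id),
        "root_path_cost": root_path_cost,
        "bridge": split_bridge_id(bridge),
        "port_id": port_id,
        # Timers are carried in units of 1/256 second.
        "message_age_s": message_age / 256,
        "max_age_s": max_age / 256,
        "hello_time_s": hello_time / 256,
        "forward_delay_s": forward_delay / 256,
    }

# Made-up sample: root priority 32768 with extension 10, cost 4, hello 2 s.
sample = CONFIG_BPDU.pack(
    0x0000, 0x00, 0x00, 0x01,
    bytes.fromhex("800a020000001111"), 4,
    bytes.fromhex("800a020000002222"), 0x8001,
    1 * 256, 20 * 256, 2 * 256, 15 * 256,
)
parsed = parse_config_bpdu(sample)
print(parsed["root"], parsed["hello_time_s"])
```

The sketch deliberately ignores the RST/MST-only flag bits and fields 13 and 14; a TCN BPDU would be only 4 bytes long and would need a separate, shorter format.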
The first spanning tree protocol was invented in 1985 at the Digital Equipment Corporation by Radia Perlman.[1] In 1990, the IEEE published the first standard for the protocol as 802.1D,[14] based on the algorithm designed by Perlman. Subsequent versions were published in 1998[15] and 2004,[16] incorporating various extensions. The original Perlman-inspired Spanning Tree Protocol, called DEC STP, is not a standard and differs from the IEEE version in message format as well as timer settings. Some bridges implement both the IEEE and the DEC versions of the Spanning Tree Protocol, but their interworking can create issues for the network administrator.[17]
Different implementations of a standard are not guaranteed to interoperate, due for example to differences in default timer settings. The IEEE encourages vendors to provide a Protocol Implementation Conformance Statement, declaring which capabilities and options have been implemented,[16] to help users determine whether different implementations will interoperate correctly.
Rapid Spanning Tree Protocol (RSTP) should not be confused with the Real Time Streaming Protocol (RTSP).
In 2001, the IEEE introduced Rapid Spanning Tree Protocol (RSTP) as IEEE 802.1w. RSTP was then incorporated into IEEE 802.1D-2004 making the original STP standard obsolete. RSTP was designed to be backward-compatible with standard STP.
RSTP provides significantly faster spanning tree convergence after a topology change, introducing new convergence behaviors and bridge port roles to accomplish this. While STP can take 30 to 50 seconds to respond to a topology change, RSTP is typically able to respond to changes within 3 × hello times (default: 3 × 2 seconds) or within a few milliseconds of a physical link failure. The hello time is an important and configurable time interval that is used by RSTP for several purposes; its default value is 2 seconds.[18] [19]
RSTP adds new bridge port roles in order to speed convergence following a link failure:
The number of switch port states a port can be in has been reduced to three instead of STP's original five:
RSTP operational details:
STP and RSTP do not segregate switch ports by VLAN.[20] However, in Ethernet switched environments where multiple VLANs exist, it is often desirable to create multiple spanning trees so that traffic on different VLANs uses different links.
Before the IEEE published a Spanning Tree Protocol standard for VLANs, a number of vendors who sold VLAN-capable switches developed their own VLAN-capable Spanning Tree Protocol versions. Cisco developed, implemented and published the proprietary Per-VLAN Spanning Tree (PVST) protocol, which uses Cisco's own Inter-Switch Link (ISL) for VLAN encapsulation, and PVST+, which uses 802.1Q VLAN encapsulation. Both protocols implement a separate spanning tree for every VLAN. Cisco switches now commonly implement PVST+ and can only implement spanning trees for VLANs if the other switches in the LAN implement the same VLAN STP protocol. HP provides PVST and PVST+ compatibility in some of its network switches.[20] Some devices from Force10 Networks, Alcatel-Lucent, Extreme Networks, Avaya, Brocade Communications Systems and BLADE Network Technologies support PVST+.[21] [22] [23] Extreme Networks does so with two limitations: lack of support on ports where the VLAN is untagged/native, and lack of support on the VLAN with ID 1. PVST+ can tunnel across an MSTP region.[24]
The switch vendor Juniper Networks in turn developed and implemented its VLAN Spanning Tree Protocol (VSTP) to provide compatibility with Cisco's PVST, so that the switches from both vendors can be included in one LAN.[20] The VSTP protocol is only supported by the EX and MX Series from Juniper Networks. There are two restrictions to the compatibility of VSTP:
By default, VSTP uses RSTP as its core spanning-tree protocol, but usage of STP can be forced if the network includes old bridges.[26] More information about configuring VSTP on Juniper Networks switches is available in the official documentation.[27]
Cisco also published a proprietary version of Rapid Spanning Tree Protocol. It creates a spanning tree for each VLAN, just like PVST. Cisco refers to this as Rapid Per-VLAN Spanning Tree (RPVST).
The Multiple Spanning Tree Protocol (MSTP), originally defined in IEEE 802.1s-2002 and later merged into IEEE 802.1Q-2005, defines an extension to RSTP to further develop the usefulness of VLANs.
In the standard, a spanning tree that maps one or more VLANs is called a multiple spanning tree (MST). Under MSTP, a spanning tree can be defined for individual VLANs or for groups of VLANs. Furthermore, the administrator can define alternate paths within a spanning tree. Switches are first assigned to an MST region, then VLANs are mapped against or assigned to this MST. A common spanning tree (CST) is an MST to which several VLANs are mapped; this group of VLANs is called an MST instance (MSTI). CSTs are backward compatible with the STP and RSTP standards. An MST that has only one VLAN assigned to it is an internal spanning tree (IST).[20]
Unlike some proprietary per-VLAN spanning tree implementations,[28] MSTP includes all of its spanning tree information in a single BPDU format. Not only does this reduce the number of BPDUs required to communicate spanning tree information for each VLAN, but it also ensures backward compatibility with RSTP and, in effect, classic STP too. MSTP does this by encoding an additional region of information after the standard RSTP BPDU as well as a number of MSTI messages (from 0 to 64 instances, although in practice many bridges support fewer). Each of these MSTI configuration messages conveys the spanning tree information for each instance. Each instance can be assigned a number of configured VLANs and frames assigned to these VLANs operate in this spanning tree instance whenever they are inside the MST region. In order to avoid conveying their entire VLAN to spanning tree mapping in each BPDU, bridges encode an MD5 digest of their VLAN to instance table in the MSTP BPDU. This digest is then used by other MSTP bridges, along with other administratively configured values, to determine if the neighboring bridge is in the same MST region as itself.
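The idea of comparing a digest instead of the full mapping can be sketched as follows. This is a simplified illustration, not the encoding from the standard: it hashes a 4096-entry VLAN-to-instance table with plain MD5, whereas the exact table encoding and any keying specified by the standard are not reproduced here. The region name and revision fields stand in for the "other administratively configured values" and are assumptions, as are all helper names.

```python
import hashlib

def vlan_table_digest(vlan_to_instance):
    """MD5 digest over a 4096-entry table of 2-byte instance numbers.

    VLANs missing from the mapping are assumed to belong to instance 0.
    """
    table = bytearray()
    for vlan in range(4096):
        table += vlan_to_instance.get(vlan, 0).to_bytes(2, "big")
    return hashlib.md5(bytes(table)).hexdigest()

def region_identity(name, revision, vlan_to_instance):
    """Bundle the values a bridge would advertise to identify its region."""
    return (name, revision, vlan_table_digest(vlan_to_instance))

def same_region(a, b):
    """Two bridges are in the same region only if every value matches."""
    return a == b

# Hypothetical configurations: bridge 3 maps VLAN 20 to a different instance.
bridge1 = region_identity("campus", 1, {10: 1, 20: 1, 30: 2})
bridge2 = region_identity("campus", 1, {10: 1, 20: 1, 30: 2})
bridge3 = region_identity("campus", 1, {10: 1, 20: 2, 30: 2})

print(same_region(bridge1, bridge2))
print(same_region(bridge1, bridge3))
```

Because only a fixed-size digest is exchanged, a single mismatched VLAN assignment is enough to place two bridges in different regions, which is a common source of misconfiguration.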
MSTP is fully compatible with RSTP bridges in that an MSTP BPDU can be interpreted by an RSTP bridge as an RSTP BPDU. This not only allows compatibility with RSTP bridges without configuration changes but also causes any RSTP bridges outside of an MSTP region to see the region as a single RSTP bridge regardless of the number of MSTP bridges inside the region itself. In order to further facilitate this view of an MSTP region as a single RSTP bridge, the MSTP protocol uses a variable known as remaining hops as a time to live counter instead of the message age timer used by RSTP. The message age time is only incremented once when spanning-tree information enters an MST region, and therefore RSTP bridges will see a region as only one hop in the spanning tree. Ports at the edge of an MSTP region connected to either an RSTP or STP bridge or an endpoint are known as boundary ports. As in RSTP, these ports can be configured as edge ports to facilitate rapid changes to the forwarding state when connected to endpoints.
IEEE 802.1aq, also known as Shortest Path Bridging (SPB), allows redundant links between switches to be active through multiple equal-cost paths. It supports much larger layer-2 topologies, converges faster, and improves the use of mesh topologies by allowing traffic to load-share across all paths, which increases the bandwidth available between all devices.[29] [30] SPB consolidates multiple existing functionalities, including Spanning Tree Protocol (STP), Multiple Spanning Tree Protocol (MSTP), Rapid Spanning Tree Protocol (RSTP), link aggregation, and Multiple MAC Registration Protocol (MMRP), into one link-state protocol.[31]
The bridge ID (BID) is a field inside a BPDU packet. It is eight bytes in length. The first two bytes are the bridge priority, an unsigned integer of 0–65,535. The last six bytes are a MAC address supplied by the bridge. Prior to IEEE 802.1D-2004, the first two bytes gave a 16-bit bridge priority. Since IEEE 802.1D-2004, the first four bits are a configurable priority, and the last twelve bits carry the bridge system ID extension. In the case of MST, the bridge system ID extension carries the MSTP instance number. Some vendors set the bridge system ID extension to carry a VLAN ID allowing a different spanning tree per VLAN, such as Cisco's PVST.
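The post-2004 layout (4-bit priority, 12-bit system ID extension, 48-bit MAC address) can be illustrated by packing the three parts into eight bytes. The helper name and the example values below are hypothetical; the extension value 10 stands for a VLAN ID or MSTP instance number.

```python
def encode_bridge_id(priority, system_id_ext, mac):
    """Pack priority, system ID extension and MAC address into 8 bytes."""
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    if not 0 <= system_id_ext <= 4095:
        raise ValueError("system ID extension must fit in 12 bits")
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # The priority occupies the top 4 bits, so OR-ing in the extension
    # fills the lower 12 bits of the first two bytes.
    return (priority | system_id_ext).to_bytes(2, "big") + mac_bytes

# Example MAC address is an assumption chosen for illustration.
bid = encode_bridge_id(32768, 10, "02:00:00:00:11:11")
print(bid.hex())

# Read as a legacy 16-bit priority, the first two bytes give 32768 + 10.
print(int.from_bytes(bid[:2], "big"))
```

Because the priority is restricted to multiples of 4096, its lower 12 bits are always zero, which is what leaves room for the extension without changing the overall eight-byte format.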
Spanning tree is an older protocol with a longer convergence time. Improper use or implementation can contribute to network disruptions. Blocking links is a crude approach to high availability and preventing loops. Modern networks can make use of all connected links by use of protocols that inhibit, control or suppress the natural behavior of logical or physical topology loops.
Newer, more robust protocols include the TRILL (Transparent Interconnection of Lots of Links) protocol, also created by Perlman,[32] and Shortest Path Bridging from the IEEE.
Configuring connections between network equipment as layer-3 IP links and relying on IP routing for resiliency and to prevent loops is a popular alternative.
Switch virtualization techniques like Cisco Virtual Switching System and Virtual PortChannel and HP Intelligent Resilient Framework combine multiple switches into a single logical entity. Such a multi-chassis link aggregation group works like a normal port trunk, only distributed through multiple switches. Conversely, partitioning technologies compartmentalize a single physical chassis into multiple logical entities.
On the edge of the network, loop-detection is configured to prevent accidental loops by users.