Path MTU Discovery explained

Path MTU Discovery (PMTUD) is a standardized technique in computer networking for determining the maximum transmission unit (MTU) size on the network path between two Internet Protocol (IP) hosts, usually with the goal of avoiding IP fragmentation. PMTUD was originally intended for routers in Internet Protocol Version 4 (IPv4). However, all modern operating systems use it on endpoints. In IPv6, this function has been explicitly delegated to the endpoints of a communications session. As an extension to standard Path MTU Discovery, a technique called Packetization Layer Path MTU Discovery works without support from ICMP.

Implementation

For IPv4 packets, Path MTU Discovery works by setting the Don't Fragment (DF) flag bit in the IP headers of outgoing packets. Then, any device along the path whose MTU is smaller than the packet will drop it, and send back an Internet Control Message Protocol (ICMP) Fragmentation Needed (Type 3, Code 4) message containing its MTU, allowing the source host to reduce its path MTU appropriately. The process is repeated until the MTU is small enough to traverse the entire path without fragmentation.
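
The repeat-until-it-fits loop described above can be illustrated with a small simulation. This is a sketch rather than a real network implementation: the path is modelled as a plain list of link MTUs, and the function names (send_with_df, discover_path_mtu) are hypothetical helpers introduced only for this example.

```python
from typing import List, Optional, Tuple


def send_with_df(packet_size: int, link_mtus: List[int]) -> Optional[int]:
    """Simulate sending one packet with the DF flag set along a path.

    Returns None if the packet fits every link, otherwise the MTU of the
    first link that is too small (the value a router would report in its
    ICMP Fragmentation Needed message).
    """
    for mtu in link_mtus:
        if packet_size > mtu:
            return mtu  # packet dropped here, ICMP error sent back
    return None


def discover_path_mtu(local_mtu: int, link_mtus: List[int]) -> Tuple[int, int]:
    """Return (path_mtu, packets_sent) for the simulated path."""
    estimate = local_mtu
    packets_sent = 0
    while True:
        packets_sent += 1
        reported = send_with_df(estimate, link_mtus)
        if reported is None:
            return estimate, packets_sent
        # Each reported MTU is strictly smaller than the packet that was
        # dropped, so the estimate only shrinks and the loop terminates.
        estimate = reported


if __name__ == "__main__":
    print(discover_path_mtu(1500, [1500, 1492, 1400, 1500]))
```

With the example path, the first packet is dropped at the 1492-byte link, the second at the 1400-byte link, and the third gets through, so discovery needs one round trip per successively smaller bottleneck.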

As IPv6 routers do not fragment packets, there is no Don't Fragment option in the IPv6 header. For IPv6, Path MTU Discovery works by initially assuming the path MTU is the same as the MTU on the link layer interface where the traffic originates. Then, similar to IPv4, any device along the path whose MTU is smaller than the packet will drop the packet and send back an ICMPv6 Packet Too Big (Type 2) message containing its MTU, allowing the source host to reduce its path MTU appropriately. The process is repeated until the MTU is small enough to traverse the entire path without fragmentation.
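
As an illustration of how a sender might react to such a message, the sketch below parses the fixed 8-byte ICMPv6 header of a Packet Too Big message (type, code, checksum, 32-bit MTU) and updates a path MTU estimate. The function names are hypothetical, checksum verification is deliberately omitted, and the floor of 1280 bytes is the IPv6 minimum link MTU, which is not discussed in the text above.

```python
import struct

ICMPV6_PACKET_TOO_BIG = 2   # ICMPv6 message type
IPV6_MIN_LINK_MTU = 1280    # every IPv6 link must support at least this


def parse_packet_too_big(message: bytes) -> int:
    """Return the MTU field of an ICMPv6 Packet Too Big message.

    Only the fixed header is examined; the checksum is not verified and
    the embedded copy of the offending packet is ignored.
    """
    if len(message) < 8:
        raise ValueError("message shorter than the ICMPv6 header")
    msg_type, _code, _checksum, mtu = struct.unpack("!BBHI", message[:8])
    if msg_type != ICMPV6_PACKET_TOO_BIG:
        raise ValueError("not a Packet Too Big message")
    return mtu


def update_path_mtu(current_estimate: int, message: bytes) -> int:
    """Lower the estimate to the reported MTU, never below 1280 bytes."""
    reported = parse_packet_too_big(message)
    # A Packet Too Big message may only shrink the estimate, never grow it.
    return max(IPV6_MIN_LINK_MTU, min(current_estimate, reported))


if __name__ == "__main__":
    example = struct.pack("!BBHI", 2, 0, 0, 1400) + bytes(40)
    print(update_path_mtu(1500, example))
```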

If the path MTU changes after the connection is set up and becomes lower than the previously determined path MTU, the first large packet will cause an ICMP error and the new, lower path MTU will be found. If the path changes and the new path MTU is larger, the source will not learn about the increase, because all routers along the new path will be capable of relaying all packets that the source sends using the originally determined, lower path MTU.[1] [2] [3]
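
The asymmetry between decreases and increases can be seen in a minimal model. In the sketch below, a route is again just a list of link MTUs, and estimate_after_send is a hypothetical helper that returns the sender's estimate after it transmits one packet of its currently cached size.

```python
from typing import List


def estimate_after_send(cached_pmtu: int, link_mtus: List[int]) -> int:
    """Return the sender's path MTU estimate after sending one full-size packet.

    If some link is smaller than the cached value, that hop drops the
    packet and reports its MTU, so the estimate shrinks. If every link is
    at least as large, no error is generated and nothing changes.
    """
    for mtu in link_mtus:
        if cached_pmtu > mtu:
            return mtu
    return cached_pmtu


if __name__ == "__main__":
    # Route change to a narrower path: the decrease is detected.
    print(estimate_after_send(1400, [1500, 1300, 1500]))
    # Route change to a wider path: the sender keeps using 1400.
    print(estimate_after_send(1400, [1500, 1500, 1500]))
```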

Problems

Many network security devices block all ICMP messages for perceived security benefits, including the errors that are necessary for the proper operation of PMTUD. This can result in connections that complete the TCP three-way handshake correctly but then hang when attempting to transfer data. This state is referred to as a black hole connection.

Some implementations of PMTUD attempt to circumvent this problem by inferring that large payload packets have been dropped due to MTU rather than link congestion. One such scheme is standardized under RFC 8899, Datagram Packetization Layer Path MTU Discovery (DPLPMTUD). Upon loss of connectivity, DPLPMTUD utilizes probe packets of controlled sizes to probe the MTU of the path. Acknowledgement of a probe packet indicates that the path MTU is at least the size of that packet. Usage of DPLPMTUD is standardized in QUIC. However, in order for transport layer protocols to operate most efficiently, ICMP Unreachable messages (type 3) should still be permitted.
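
The probing idea can be sketched as a search over probe sizes. The example below uses a binary search, which is an assumption made for illustration and not a strategy the text above attributes to RFC 8899. The probe_acked callback is a hypothetical stand-in for sending a probe and waiting for its acknowledgement, and the sketch assumes that any probe no larger than the path MTU is acknowledged, whereas a real implementation would have to retry probes to distinguish MTU drops from ordinary loss.

```python
from typing import Callable


def search_path_mtu(probe_acked: Callable[[int], bool],
                    base_size: int, max_size: int) -> int:
    """Return the largest size in [base_size, max_size] whose probe is acked.

    Assumes acknowledgement is monotone in size: if a probe of size n is
    acknowledged, so is every smaller probe.
    """
    if not probe_acked(base_size):
        raise RuntimeError("base-size probe was not acknowledged")
    low, high = base_size, max_size
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe_acked(mid):
            low = mid        # path MTU is at least mid
        else:
            high = mid - 1   # probe lost, assume mid is too large
    return low


if __name__ == "__main__":
    simulated_path_mtu = 1400  # illustrative value only
    print(search_path_mtu(lambda size: size <= simulated_path_mtu, 1200, 1500))
```

Because the search only needs acknowledgements from the peer, it keeps working when routers or firewalls on the path suppress ICMP errors.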

Some routing platforms, including the Linux kernel[4] and Cisco routers,[5] provide an option to reduce the maximum segment size (MSS) advertised in the TCP handshake as a workaround. This is known as MSS clamping.
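
The arithmetic behind MSS clamping can be sketched as follows. The example assumes IP and TCP headers without options (20 bytes each for IPv4 and TCP, 40 bytes for IPv6), and clamp_mss is a hypothetical helper, not the interface of any particular router.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
IPV6_HEADER_BYTES = 40  # fixed IPv6 header
TCP_HEADER_BYTES = 20   # TCP header without options


def clamp_mss(advertised_mss: int, link_mtu: int, ipv6: bool = False) -> int:
    """Return the MSS a clamping router would write into a passing SYN.

    The MSS is lowered to the largest TCP payload that fits in one packet
    on the constrained link; a smaller advertised value is left unchanged.
    """
    ip_header = IPV6_HEADER_BYTES if ipv6 else IPV4_HEADER_BYTES
    largest_payload = link_mtu - ip_header - TCP_HEADER_BYTES
    return min(advertised_mss, largest_payload)


if __name__ == "__main__":
    # A host on Ethernet advertises 1460; the router's uplink has MTU 1492.
    print(clamp_mss(1460, 1492))
```

In the example, a 1492-byte uplink MTU, as is typical for PPPoE, leaves room for 1452 bytes of TCP payload over IPv4, so both endpoints keep their segments small enough without relying on ICMP.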

Another problem arises when network administrators do not properly set the MTU between two adjacent layer 3 hops whose link is composed of multiple layer 2 segments with switches between them. Usually the MTU on the outgoing L3 interface is taken from the first L2 segment. But if the second or a later segment has a lower MTU, the switch in between will silently drop the packet without sending any ICMP message back, because only layer 3 hops can generate ICMP "Packet Too Big" messages. In this case, administrators should set the MTU of each outgoing L3 interface to the minimum MTU of the layer 2 segments traversed before the next L3 hop.

Notes and References

  1. Comer, Douglas E. (2014). Internetworking with TCP/IP, Volume 1 (6th ed.). Pearson. ISBN 0-13-608530-X. pp. 133–134.
  2. Linux source code (IPv4). https://github.com/torvalds/linux/blob/v3.15/net/ipv4/route.c#L123
  3. Davies, Joseph (2012). Understanding IPv6 (3rd ed.). Redmond: Microsoft Press. ISBN 978-0735659148. OCLC 810455372. pp. 146–147.
  4. Mangling packet headers - nftables wiki. wiki.nftables.org. 2024-07-03.
  5. Ethernet MTU and TCP MSS Adjustment Concept for PPPoE Connections. Cisco. 2024-07-03.