IPv6: The Next Generation Internet Protocol
Gary C. Kessler
An edited version of an early draft this paper appeared in the Handbook on Local Area Networks, published by Auerbach in August 1997. © Auerbach, 1997. Various additions have been made over time...
The Internet is historically linked to the ARPANET, the pioneering packet switched network built for the U.S. Department of Defense in 1969. Starting with four nodes that year, the ARPANET slowly grew to encompass many systems across the U.S., and connecting to hosts in Europe and Asia by the end of the 1970s. By the early 1980s, there were many regional and national networks across the globe that started to become interconnected, and their common communications protocols were based on the Transmission Control Protocol/Internet Protocol (TCP/IP) suite. But even by the later-1980s, the number of host systems on these primarily academic and R&D networks could be counted in the hundreds or thousands. In addition, most of the traffic was supporting simple text-based applications, such as electronic-mail, file transfers, and remote login.
By the 1990s, however, commercial users discovered the Internet and commercial use, previously prohibited or constrained on the Internet, was actively encouraged. Since the beginning of this decade, new host systems are being added to the Internet at rates of up to 10% per month, and the Internet has been doubling in size every 10-12 months for several years; by January 1997, the number of hosts on the Internet was over 16 million, ranging from PC-class systems to supercomputers, on more than 100,000 networks worldwide.
The number of connected hosts is but one measure of the Internet's growth. Another way to quantify the change, however, is in the changing applications. On today's Internet, it is common to see hypermedia, audio, video, animation, and other types of traffic that were once thought to be anathema to a packet switching environment. As the Internet provides better type-of-service support, new applications will spark even more growth and changing demographics. In addition, nomadic access has become a major issue with the increased use of laptop computers, and security concerns have growth with the increased amount of sensitive information accessible via the Internet and the number of attacks launched from the Net.
IPv6 BACKGROUND AND FEATURE OVERVIEW
The Internet Protocol was introduced in the ARPANET in the mid-1970s. The version of IP in common use today is IP version 4 (IPv4), described in Request for Comments (RFC) 791 (September 1981). Although several protocol suites (including Open System Interconnection) have been proposed over the years to replace IPv4, none have succeeded because of IPv4's large, and continually growing, installed base. Nevertheless, IPv4 was never intended for the Internet that we have today, either in terms of the number of hosts, types of applications, or security concerns.
In the early 1990s, the Internet Engineering Task Force (IETF) recognized that the only way to cope with these changes was to design a new version of IP to become the successor to IPv4. The IETF formed the IP next generation (IPng) Working Group to define this transitional protocol to ensure long-term compatibility between the current and new IP versions, and support for current and emerging IP-based applications.
Work started on IPng in 1991 and several IPng proposeals were subsequently drafted. The result of this effort was IP version 6 (IPv6), described in RFCs 1883-1886; these four RFCs were officially entered into the Internet Standards Track in December 1995.1
IPv6 is designed as an evolution from IPv4 rather than as a radical change. Useful features of IPv4 were carried over in IPv6 and less useful features were dropped. According to the IPv6 specification, the changes from IPv4 to IPv6 fall primarily into the following categories:
- Expanded Addressing Capabilities: The IP address size is increased from 32 bits to 128 bits in IPv6, supporting a much greater number of addressable nodes, more levels of addressing hierarchy, and simpler autoconfiguration of addresses for remote users. The scalability of multicast routing is improved by adding a Scope field to multicast addresses. A new type of address, called anycast, is also defined.
- Header Format Simplification: Some IPv4 header fields have been dropped or made optional to reduce the necessary amount of packet processing and to limit the bandwidth cost of the IPv6 header.
- Improved Support for Extensions and Options: IPv6 header options are encoded in such a way to allow for more efficient forwarding, less stringent limits on the length of options, and greater flexibility for introducing new options in the future. Some fields of an IPv4 header have been made optional in IPv6.
- Flow Labeling Capability: A new quality-of-service (QOS) capability has been added to enable the labeling of packets belonging to particular traffic "flows" for which the sender requests special handling, such as real-time service.
- Authentication and Privacy Capabilities: Extensions to support security options, such as authentication, data integrity, and data confidentiality, are built-in to IPv6.
IPv6 also introduces and formalizes terminology that, in the IPv4 environment, are loosely defined, ill-defined, or undefined. The new and improved terminology includes:
- Packet: An IPv6 protocol data unit (PDU), comprising a header and the associated payload. In IPv4, this would have been termed packet or datagram.
- Node: A device that implements IPv6.
- Router: An IPv6 node that forwards packets, based on the IP address, not explicitly addressed to itself. In former TCP/IP terminology, this device was often referred to as a gateway.
- Host: Any node that is not a router; these are typically end-user systems.
- Link: A medium over which nodes communicate with each other at the Data Link Layer (such as an ATM, frame relay, or SMDS wide area network, or an Ethernet or token ring LAN).
- Neighbors: Nodes attached to the same link.
IPv6 HEADER FORMAT
The format of an IPv6 header is shown in Figure 1. Note that although IPv6 addresses are four times the size of IPv4 addresses, the basic IPv6 header is only twice the size of an IPv4 header, thus decreasing the impact of the larger address fields. The fields of the IPv6 header are:
- Version: IP version number (4 bits). This field's value is 6 for IPv6 (and 4 for IPv4). Note that this field is in the same location as the Version field in the IPv4 header, making it simple for an IP node to quickly distinguish an IPv4 packet from an IPv6 packet.
- Priority: Enables a source to identify the desired delivery priority of this packet (4 bits).
- Flow Label: Used by a source to identify associated packets needing the same type of special handling, such as a real-time service between a pair of hosts (24 bits).
- Payload Length: Length of the payload (the portion of the packet following the header), in octets (16 bits). The maximum value in this field is 65,535; if this field contains zero, it means that the packet contains a payload larger than 64KB and the actual payload length value is carried in a Jumbo Payload hop-by-hop option.
- Next Header: Identifies the type of header immediately following the IPv6 header; uses the same values as the IPv4 Protocol field, where applicable (8 bits). The Next Header field can indicate an options header, higher layer protocol, or no protocol above IP. Sample values are listed in Table 1.
- Hop Limit: Specifies the maximum number of hops that a packet may take before it is discarded (8 bits). This value is set by the source and decremented by 1 by each node that forwards the packet; the packet is discarded if the Hop Limit reaches zero. The comparable field in IPv4 is the Time to Live (TTL) field; it was renamed for IPv6 because the value limits the number of hops, not the amount of time that a packet can stay in the network.
- Source Address: IPv6 address of the originator of the packet (128 bits).
- Destination Address: IPv6 address of the intended recipient(s) of the packet (128 bits).
|+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Version| Prio. | Flow Label | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Payload Length | Next Header | Hop Limit | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + + | | + Source Address + | | + + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + + | | + Destination Address + | | + + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ FIGURE 1. IPv6 Header Format (from RFC 1883).|
|Value||Contents of the next header|
|1||Internet Control Message Protocol (ICMP)|
|6||Transmission Control Protocol (TCP)|
|17||User Datagram Protocol (UDP)|
|58||Internet Control Message Protocol version 6 (ICMPv6)|
|59||nothing; this is the final header|
|60||Destination Options header|
|89||Open Shortest Path First (OSPF)|
To accommodate almost unlimited growth, and a variety of addressing formats, IPv6 addresses are 128 bits in length. This address space is probably sufficient to uniquely address every molecule in the solar system!
IPv6 defines three types of addresses. A unicast address specifies a single host. An anycast address is one that is assigned to more than a single interface, usually belonging to different IPv6 nodes, such as a set of routers belonging to an ISP. A packet sent to an anycast address is delivered to one of the routers identified by that address, usually the "closest" one as defined by the routing protocol. A multicast address also identifies a set of hosts; a packet sent to a multicast address is delivered to all of the hosts in the group. Note that there is no broadcast address in IPv6 as in IPv4, since that function is provided by multicast addresses.
IPv4 addresses are written in dotted decimal notation, where the decimal value of each of the four address bytes is separated by dots. The preferred, or regular, form of an IPv6 address is to write the hexadecimal value of the eight 16-bit blocks of the address, separated by colons (:), such as FF04:19:5:ABD4:187:2C:754:2B1. Note that leading zeros do not have to be written and that each field must have some value.
IPv6 addresses will often contain long strings of zeros because of the way in which addresses are allocated. A shorthand, or compressed, address form uses a double colon (::) to indicate multiple 16-bit blocks of zeros; for example, the address FF01:0:0:0:0:0:0:5A could be written as FF01::5A. To avoid ambiguity, the "::" can only appear once in an address.
Finally, an alternative, hybrid address format has been defined to make it more convenient to represent an IPv4 address in an IPv6 environment. In this scheme, the first 96 address bits (six groups of 16) are represented in the regular IPv6 format and the remaining 32 address bits are represented in common IPv4 dotted decimal; for example, 0:0:0:0:0:0:188.8.131.52 (or ::184.108.40.206).
|Reserved for NSAP Allocation||0000 001||1/128|
|Reserved for IPX Allocation||0000 010||1/128|
|Provider-Based Unicast Address||010||1/8|
|Reserved for Geographic-Based|
|Unassigned||1111 1110 0||1/512|
|Link Local Use Addresses||1111 1110 10||1/1024|
|Site Local Use Addresses||1111 1110 11||1/1024|
|Multicast Addresses||1111 1111||1/256|
One of the goals of the IPv6 address format is to accommodate many different types of addresses. The beginning of an address contains a three- to ten-bit Format Prefix defining the general address type (Table 2); the remaining bits contain the actual host address, in a format specific to the indicated address type.
|| 3 | 5 bits | n bits | 56-n bits | 64 bits | +---+------------+------------+--------------+------------------+ |010| RegistryID | ProviderID | SubscriberID | Intra-Subscriber | +---+------------+------------+--------------+------------------+ FIGURE 2. Provider-Based Unicast Address Format (from RFC 1884).|
For example, the Provider-Based Unicast Address is an IPv6 address that might be assigned by an Internet service provider (ISP) to a customer. This type of address contains a number of subfields, including (Figure 2):
- Format Prefix: Indicates type of address as Provider-Based Unicast. Always 3 bits, coded "010."
- Registry Identifier: Identifies the Internet address registry from which this ISP obtains addresses. A 5-bit value indicating the IANA Internet Assigned Number Authority or one of the three Regional Registries, namely the Internet Network Information Center (InterNIC), Rèseaux IP Europèens Network Coordination Center (RIPE NCC), or Asia-Pacific Network Information Center (APNIC). In the future, national registries may also be accommodated.
- Provider Identifier: Identifies the ISP; this field contains the address block assigned to this ISP by the address registry authority.
- Subscriber Identifier: Identifies the ISP's subscriber; this field contains the address assigned to this subscriber by the ISP. The ProviderID and SubscriberID fields together are 56 bits in length.
- Intra-Subscriber: Contains the portion of the address assigned and managed by the subscriber. A 64-bit value, suggested to comprise a 16-bit subnetwork identifier and a 48-bit interface identifier (such as an IEEE MAC address).
Another particularly important address type is the one that indicates an IPv4 address. With over sixteen million hosts using 32-bit addresses, the public Internet must continue to accommodate IPv4 addresses even as it slowly migrates to IPv6 and IPv6 addressing,
IPv4 addresses are carried in a 128-bit IPv6 address that begins with 80 zeros (0:0:0:0:0). The next 16-bit block contains the compatibility bits, which indicate the way in which the host/router handles IPv4 and IPv6 addresses. If the device can handle either IPv4 or IPv6 addresses, the compatibility bits are all set to zero (0) and this is termed an IPv4-compatible IPv6 address; if the address represents an IPv4-only node, the compatibility bits are all set to one (0xFFFF) and the address is termed an IPv4-mapped IPv6 address. The final 32 bits contain a 32-bit IPv4 address in dotted decimal form.
Finally, IPv6 multicast addresses provide an identifier for a group of nodes. A node may belong to any number of multicast groups. Multicast addresses may not be used as a source address in IPv6 packets or appear in any routing header.
|| 8 | 4 | 4 | 112 bits | +--------+----+----+--------------------------------------------+ |11111111|flgs|scop| group ID | +--------+----+----+--------------------------------------------+ FIGURE 3. Multicast Address Format (from RFC 1884).|
All multicast addresses, as shown in Figure 3, begin with eight ones (0xFF). The next four bits are a set of flag bits (flgs); the three high-order bits are set to zero and the fourth bit (T-bit) indicates a permanently assigned ("well-known") multicast address (T=0) or a nonpermanently assigned ("transient") multicast address (T=1). The next four bits indicate the scope of the address (scop), or the part of the network for which this multicast address is relevant; options include node-local (0x1), link-local (0x2), site-local (0x5), organization-local (0x8), or global (0xE). The remaining 112 bits are the Group Identifier, which identifies the multicast group, either permanent or transient, within the given scope. The interpretation of a permanently assigned multicast address is independent of the scope value; for example, if the "Internet video servers group" is assigned a permanent multicast address with a group identifier of 0x77, then:
- FF01:0:0:0:0:0:0:77 would refer to all video servers on the same node as the sender.
- FF02:0:0:0:0:0:0:77 would refer to all video servers on the same link as the sender.
- FF05:0:0:0:0:0:0:77 would refer to all video servers at the same site as the sender.
- FF0E:0:0:0:0:0:0:77 would refer to all video servers in the Internet.
Finally, a number of well-known multicast addresses are predefined, including:
- Reserved Multicast Addresses are reserved and are never assigned to any multicast group. These addresses have the form FF0x:0:0:0:0:0:0:0, where x is any hexadecimal digit.
- All Nodes Addresses identify the group of all IPv6 nodes within the given scope. These addresses are of the form FF0t:0:0:0:0:0:0:1, where t =1 (node-local) or 2 (link-local).
- All Routers Addresses identify the group of all IPv6 routers within the given scope. These addresses are of the form FF0t:0:0:0:0:0:0:2, where t =1 (node-local) or 2 (link-local).
- The DHCP Server/Relay-Agent address identifies the group of all IPv6 Dynamic Host Configuration Protocol (DHCP) Servers and Relay Agents with the link-local scope; this address is FF02:0:0:0:0:0:0:C.
IPv6 EXTENSION HEADERS AND OPTIONS
In IPv6, optional IP Layer information is encoded in separate extension headers that are placed between the IPv6 basic header and the higher layer protocol header. An IPv6 packet may carry zero, one, or more such extension headers, each identified by the Next Header field of the preceding header and each containing an even multiple of 64 bits (Figure 4). A fully compliant implementation of IPv6 includes support for the following extension headers and corresponding options:
- The Hop-by-Hop Options header is used to carry information that must be examined by every node along a packet's path. Three options are included in this category. The Pad1 option is used to insert a single octet of padding into the Options area of a header for 64-bit alignment, while the PadN option is used to insert two or more octets of padding. The Jumbo Payload option is used to indicate the length of the packet when the payload portion is longer than 65,535 octets (this option is employed when the Payload Length field is set to zero).
- The Routing header is used by an IPv6 source to list one or more intermediate nodes that must be visited as part of the packet's path to the destination; this option is functionally similar to IPv4's Loose and Strict Source Route options. This header contains a list of addresses and an indication of whether each address is strict or loose. If an address is marked strict, it means that this node must be a neighbor of the previously addressed node; if an address is marked loose, this node does not have to be a neighbor of the previous node.
- The Fragment header is used by an IPv6 source to send packets that are larger than the maximum transmission unit (MTU) on the path to the destination. This header will contain a packet identifier, fragment offset, and final fragment indicator. Unlike IPv4, where fragmentation information is carried in every packet header, IPv6 only carries fragmentation/reassembly information in those packets that are fragmented. In another departure from IPv4, fragmentation in IPv6 is performed only by the source and not by the routers along a packet's path. All IPv6 hosts and routers must support an MTU of 576 octets; it is recommended that path MTU discovery procedures (per RFC 1981) be invoked to discover, and take advantage of, those paths with a larger MTU.
- The Destination Options header is used to carry optional information that has to be examined only by a packet's destination node(s). The only destination options defined so far are Pad1 and PadN, as described above.
- The IP Authentication Header (AH) and IP Encapsulating Security Payload (ESP) are IPv6 security mechanisms (see IPv6 Security below).
| <--- 32 bits ---> <--- 32 bits ---> <--- 32 bits ---> +---------------+ +---------------+ +---------------+ | IPv6 header | | IPv6 header | | IPv6 header | | | | | | | | Next Header = | | Next Header = | | Next Header = | | TCP | | Routing | | Routing | +---------------+ +---------------+ +---------------+ | TCP header | |Routing header | |Routing header | | + | | | | | | data | | Next Header = | | Next Header = | | | | TCP | | Fragment | +---------------+ +---------------+ +---------------+ | TCP header | |Fragment header| | + | | | | data | | Next Header = | | | | TCP | +---------------+ +---------------+ | TCP header | | + | | data | | | +---------------+ FIGURE 4. IPv6 Extension Header examples. TCP segment encapsulated|
in IP without additional options (left); TCP segment following a Routing
header (middle); and TCP segment fragment following a Fragment header
following a Routing header (right) (inspired from RFC 1883).
With the exception of the Hop-by-Hop Option, extension headers are only examined or processed by the intended destination node(s). The contents of each extension header determine whether or not to proceed to the next header and, therefore, extension headers must be processed in the order they appear in the packet.
IPv6 QUALITY-OF-SERVICE PARAMETERS
The Priority and Flow Label fields in the IPv6 header are used by a source to identify packets needing special handling by network routers. The concept of a flow in IP is a major departure from IPv4 and most other connectionless protocols; some have called flows a form of connectionless virtual circuits since all packets with the same flow label are treated similarly and the network views them as associated entities.
Special handling for non-default quality-of-service is an important capability in order to support applications that require guaranteed throughput, end-to-end delay, and/or jitter, such as multimedia or real-time communication. These QOS parameters are an extension of IPv4's Type of Service (TOS) capability.
The Priority field allows the source to identify the desired priority of a packet. Values 0-7 are used for congestion-controlled traffic, or traffic that backs off in response to network congestion, such as TCP segments. For this type of traffic, the following priority values are recommended:
- 0) Uncharacterized traffic
1) "Filler" traffic (e.g., Netnews)
2) Unattended data transfer (e.g., e-mail)
4) Attended bulk transfer (e.g., FTP, HTTP, NFS)
6) Interactive traffic (e.g., Telnet, X)
7) Internet control traffic (e.g., routing protocols, SNMP)
Values 8-15 are defined for noncongestion-controlled traffic, or traffic that does not back off in response to network congestion, such as real-time packets being sent at a constant rate. For this type of traffic, the lowest priority value (8) should be used for packets that the sender is most willing to have discarded under congestion conditions (e.g., high-fidelity video traffic) and the highest value (15) should be used for those packets that the sender is least willing to have discarded (e.g., low-fidelity audio traffic).2
The Flow Label is used by a source to identify packets needing nondefault QOS. The nature of the special handling might be conveyed to the network routers by a control protocol, such as the Resource Reservation Protocol (RSVP), or by information within the flow packets themselves, such as a hop-by-hop option. There may be multiple active flows from a source to a destination, as well as traffic that is not associated with any flow (i.e., Flow Label = 0). A flow is uniquely identified by the combination of a source address and a nonzero flow label. This aspect of IPv6 is still in the experimental stage and future definition is expected.
In the early days of TCP/IP, the ARPANET user community was small and close, and security mechanisms were not of primary concern. As the number of TCP/IP hosts grew, and the user community became one of strangers (some nefarious) rather than friends, security became more important. As critical and sensitive data travels on today's Internet, security is of paramount concern.
Although many of today's TCP/IP applications have their own security mechanisms, many would argue that security should be implemented at the lowest possible protocol layer. IPv4 had few, if any, security mechanisms, and authentication and privacy mechanisms at lower protocol layers is largely absent. IPv6 builds two security schemes into the basic protocol.
The first mechanism is the IP Authentication Header (RFC 1826), an extension header that can provide integrity3 and authentication4 for IP packets. Although many different authentication techniques will be supported, use of the keyed Message Digest 5 (MD5, described in RFC 1321) algorithm is required to ensure interoperability. Use of this option can eliminate a large number of network attacks, such as IP address spoofing; this will also be an important addition to overcoming some of the security weaknesses of IP source routing. IPv4 provides no host authentication; all IPv4 can do is to supply the sending host's address as advertised by the sending host in the IP datagram. Placing host authentication information at the Internet Layer in IPv6 provides significant protection to higher layer protocols and services that currently lack meaningful authentication processes.
The second mechanism is the IP Encapsulating Security Payload (ESP, described in RFC 1827), an extension header that can provide integrity and confidentiality5 for IP packets. Although the ESP definition is algorithm-independent, the Data Encryption Standard using cipher block chaining mode (DES-CBC) is specified as the standard encryption scheme to ensure interoperability. The ESP mechanism can be used to encrypt an entire IP packet (tunnel-mode ESP) or just the higher layer portion of the payload (transport-mode ESP).
These features will add to the secure nature of IP traffic while actually reducing the security effort; authentication performed on an end-to-end basis during session establishment will provide more secure communications even in the absence of firewall routers. Some have suggested that the need for firewalls will be obviated by widespread use of IPv6, although there is no evidence to that effect yet.
The Internet Control Message Protocol (ICMP) provides error and information messages that are beyond the scope of IP. ICMP for IPv6 (ICMPv6) is functionally similar to ICMP for IPv4 and uses a similar message format, and forms an integral part of IPv6. ICMPv6 messages are carried in an IPv6 datagram with a Next Header field value of 58.
ICMPv6 error messages are:
- Destination Unreachable: Sent when a packet cannot be delivered to its destination address for reasons other than congestion
- Packet Too Big: Sent by a router when it has a packet that it cannot forward because the packet is larger than the MTU of the outgoing link
- Time Exceeded: Sent by a router that when the packet's Hop Limit reaches zero or if all fragments of a datagram are not received within the fragment reassembly time
- Parameter Problem: Sent by a node that finds some problem in a field in the packet header that results in an inability to process the header).
ICMPv6 informational messages are Echo Request and Echo Reply (used by IPv6 nodes for diagnostic purposes), as well as Group Membership Query, Group Membership Report, and Group Membership Reduction (all used to convey information about multicast group membership from nodes to their neighboring routers).
MIGRATION TO IPv6
The transition to IPv6 has already started even though most Internet and TCP/IP users have not yet seen new software on their local systems nor local networks. Before IPv6 can be widely deployed, the network infrastructure must be upgraded to employ software that accommodates the new protocol.
In addition, the new address format must be accommodated by every TCP/IP protocol that uses addresses. The Domain Name System (DNS), for example, has defined an AAAA resource record for IPv6 128-bit addresses (IPv4's 32-bit addresses use an A record) and the IP6.INT address domain (IPv4 uses the ARPA address domain). Other protocols that must be modified for IPv6 include DHCP, the Address Resolution Protocol (ARP) family, and IP routing protocols such as the Routing Information Protocol (RIP), Open Shortest Path First (OSPF) protocol, and the Border Gateway Protocol (BGP). Only after the routers and the backbones are upgraded will hosts start to transition to the new protocol and applications be modified to take advantage of IPv6's capabilities.
When IPv4 became the official ARPANET standard in 1983, use of previous protocols ceased and there was no planned interoperability between the old and the new. This will not can not be the case with the introduction of IPv6. Although IPv6 trials started in 1996 and initial rollout of IPv6 in the Internet backbone is expected in 1997, there is no scheduled or desired date of a flash cut from one to the other; coexistence of IPv4 and IPv6 is anticipated for many years to come. The simple fact of the sheer number of hosts using IPv4 today suggests that no other policy even begins to make sense. While IPv6 will appear in the large ISP backbones sooner rather than later, some smaller service providers and local network administrators will not make the conversion quickly unless they perceive some benefit from IPv6.
|(-----) ------------- (-----) ------------- (-----) ( IPv6 ) | IPv4/IPv6 | ( IPv4 ) | IPv4/IPv6 | ( IPv6 ) ( network )---| router |---( network )---| router |---( network ) ( ) ------------- ( ) ------------- ( ) (-----) (-----) (-----) FIGURE 5. Common short-term scenario where an IPv4 network interconnects IPv6 networks.|
The coexistence of IPv4 and IPv6 in the network means that different protocols and procedures will need to be accommodated. In one common short-term scenario, IPv6 networks will be interconnected via an IPv4 backbone (Figure 5). The boundary routers will be IPv4-compatible IPv6 nodes and the routers' interfaces will be given IPv4-compatible IPv6 addresses. The IPv6 packet is transported over the IPv4 network by encapsulating the packet in an IPv4 header; this process is called tunneling. Tunneling can also be performed when an organization has converted a part of its subnet to IPv6. This process can be used on host-host, router-router, or host-router links.
Although the introduction of IPv6 is inevitable, many of the market pressures for its development have been somewhat obviated because of parallel developments that enhance the capabilities of IPv4. The address limitations of IPv4, for example, are minimized by use of Classless Interdomain Routing (CIDR). Nomadic user address allocation can be managed by the Dynamic Host Configuration Protocol (DHCP). Quality of service management can be handled by the Resource Reservation Protocol (RSVP). And the IP Authentication Header and Encapsulating Security Payload procedures can be applied to IPv4 as well as IPv6.
This is not meant to suggest that IP vendors are waiting. IPv6 has already started to appear in real products and production networks. Support for IPv6 on several versions of UNIX have been announced by such organizations as Digital Equipment Corporation, IBM, INRIA (Institut National De Recherche En Informatique Et En Automatique, or The French National Institute for Research in Computer Science and Control), Japan's WIDE Project, Sun Microsystems, the Swedish Institute of Computer Science (SICS), and the U.S. Naval Research Laboratory. Other companies have announced support for IPv6 in other operating environments, including Apple Computer (MacOS), FTP Software (DOS/Windows), Mentat (STREAMS), Novell (NetWare), Pacific Softworks, and Siemens Nixdorf (BS2000). Major router vendors that have announced support for IPv6 include Bay Networks, Cisco Systems, Digital Equipment Corporation, Ipsilon Networks, Penril Datability Networks, and Telebit Communications.
One of the important proving grounds of IPv6 is the 6bone, a testbed network spanning North America, Europe, and Japan that began operation in 1996. The 6bone is a virtual network built on top of portions of today's IPv4-based Internet, designed specifically to route IPv6 packets. The goal of this collaborative trial is to test IPv6 implementations and to define early policies and procedures that will be necessary to support IPv6 in the future. In addition, it will demonstrate IPv6's new capabilities and will provide a basis for user confidence in the new protocol.
For most users, the transition from IPv4 to IPv6 will occur when the version of their host's operating system software is updated; in some cases, it will mean running dual-stacked systems with both versions of IP. For larger user networks, it may make sense to follow the model of the larger global Internet; in particular, pre-design the IPv6 network topology and addressing scheme, build a testbed IPv6 network with routers and a DNS, and then slowly migrate applications, users, and subnetworks to the new backbone. The lessons learned from the 6bone activity will be useful for individual networks as well as the Internet backbone.
REFERENCES AND FURTHER READING
Bradner, S. and A. Mankin. IPng: Internet Protocol Next Generation. Reading (MA): Addison-Wesley, 1996.
Deering, S. "SIP: Simple Internet Protocol." IEEE Network, May 1993.
Feit, S. TCP/IP: Architecture, Protocols, and Implementation with IPv6 and IP Security, 2nd ed. New York: McGraw-Hill, 1997.
Francis, P. "A Near-Term Architecture for Deploying Pip." IEEE Network, May 1993.
Hinden, R.M. "IP Next Generation Overview." Communications of the ACM, June 1996.
Huitema, C. IPv6: The New Internet Protocol. Upper Saddle River (NJ): Prentice-Hall, 1996.
IETF IPng Working Group Home Page. URL: http://www.ietf.org/html.charters/ipngwg-charter.html. Last accessed: 29 January 1997.
IP Next Generation Web Page. URL: http://playground.sun.com/pub/ipng/html. Last accessed: 29 January 1997.
Katz, D. and P.S. Ford. "TUBA: Replacing IP with CLNP." IEEE Network, May 1993.
Kessler, G.C. An Overview of TCP/IP Protocols and the Internet. URL: http://www.sover.net/library/tcpip.html. Last accessed: 1 February 1997.
Miller, M. "Making the Move: The path for an orderly transition from IPv4 to IPv6." Network World, January 20, 1997.
Schneier, B. Applied Cryptography, 2/e. New York: John Wiley & Sons, 1996.
6bone Web Page. URL: http://www-6bone.lbl.gov/6bone. Last accessed: 29 January 1997.
Stallings, W. "IPv6: The New Internet Protocol." IEEE Communications Magazine, July 1996. (Also: URL: http://www.ieee.org/comsoc/stallings.html. Last accessed: December 10, 1996.)
Thomas, S. IPng and the TCP/IP Protocols. New York: John Wiley & Sons, 1996.
ABOUT THE AUTHOR: At the time of this paper's publication, Gary C. Kessler was Director of Information Technology at Hill Associates, a telecommunications training and education firm located in Colchester, Vermont. He is the co-author of ISDN, 4/e (McGraw-Hill), author of more than 50 articles, and frequent speaker at industry conferences. He can be reached via e-mail at email@example.com.
APPENDIX: THE IP VERSION 6 SPECIFICATIONS
IPv6 is specified in a number of RFCs. The core description of IPv6 and related protocols can be found in:
- RFC 1883: Internet Protocol, Version 6 (IPv6) Specification
- RFC 1884: IP Version 6 Addressing Architecture
- RFC 1885: Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6)
- RFC 1886: DNS Extensions to support IP version 6
Other related RFCs include:
- RFC 1550: IP: Next Generation (IPng) White Paper Solicitation
- RFC 1726: Technical Criteria for Choosing IP: The Next Generation (IPng)
- RFC 1752: The Recommendation for the IP Next Generation Protocol
- RFC 1825: Security Architecture for the Internet Protocol
- RFC 1826: IP Authentication Header
- RFC 1827: IP Encapsulating Security Protocol (ESP)
- RFC 1828: IP Authentication uysing Keyed MD5
- RFC 1829: The ESP DES-CBC Transform
- RFC 1881: IPv6 Address Allocation Management
- RFC 1887: An Architecture for IPv6 Unicast Address Allocation
- RFC 1888: OSI NSAPs and IPv6
- RFC 1897: IPv6 Testing Address Allocation
- RFC 1970: Neighbor Discovery for IP Version 6 (IPv6)
- RFC 1971: IPv6 Stateless Address Autoconfiguration
- RFC 1972: A Method for the Transmission of IPv6 Packets over Ethernet Networks
- RFC 1981: Path MTU Discovery for IP version 6
- RFC 2002: IP Mobility Support
- RFC 2003: IP Encapsulation within IP
- RFC 2019: Transmission of IPv6 Packets Over FDDI
- RFC 2023: IP Version 6 over PPP
- RFC 2073: IPv6 Provider-Based Unicast Address Format
- RFC 2080: RIPng for IPv6
- RFC 2081: RIPng Protocol Applicability Statement
RFCs may be obtained over the Internet via anonymous FTP from ftp://ds.internic.net/rfc. For additions sites and mechanisms to obtain RFCs, send e-mail to firstname.lastname@example.org and put help: ways_to_get_rfcs in the message body.
[Back to main text]
- Version 5 of IP is assigned to the experimental Internet Stream Protocol Version 2 (ST-2), described in RFC 1819. [Back to main text]
- This example, taken from the IPv6 specification, is counter-intuitive to many people. Low-fidelity audio, such as a telephone conversation, would probably be given a high priority because data loss is audible to the users as beeps and clicks. High-fidelity audio, on the other hand, such as CD-quality stereo, would probably be assigned a lower priority because these data streams usually contain redundant information and can, therefore, tolerate some information loss. [Back to main text]
- In this context, integrity refers to the assurance that the message has not been altered in transit, and that the contents of the received message are identical to the contents of the transmitted message. [Back to main text]
- In this context, authentication refers to the mechanism that one party uses to ensure that the other party really is who they claim to be. [Back to main text]
- In this context, confidentiality, or privacy, refers to keeping the contents of a message secret from a third-party. [Back to main text]
Do Big Packets work in IPv6? | Source CGP Grey, Wikimedia Commons
The design of IPv6 represented a relatively conservative evolutionary step of the Internet protocol. Mostly, it’s just IPv4 with significantly larger address fields. Mostly, but not completely, as there were some changes. IPv6 changed the boot process to use auto-configuration and multicast to perform functions that were performed by ARP and DHCP in IPv4. IPv6 added a 20-bit Flow Identifier to the packet header. IPv6 replaced IP header options with an optional chain of extension headers. IPv6 also changed the behaviour of packet fragmentation. This is what we will look at here.
We’ve looked at IP packet fragmentation earlier, so I won’t repeat myself here, but that article provides some useful background, so it may be useful to read before embarking on this.
IPv4 did not specify a minimum packet size that had to be passed un-fragmented through the network, although by implication it makes a lot of sense to ensure that the IP header itself is not fragmented. By implication, the minimum un-fragmented size is 40 bytes as presumably you would require any IP header options to remain with the common IPv4 header that is replicated in each fragment packet. The other limit defined by IPv4 was that a host had to be able to reassemble an IP packet that was up to at least 576 bytes in total.
The change implemented in IPv6 was that the IPv6 specification effectively cemented the IPv4 “DONT FRAGMENT” bit to ON. IPv6 packets could not be fragmented on the fly during transit across the network, so each router could either forward on an IPv6 packet or discard it. An early IPv6 specification, RFC 1883, published in 1995, required that IPv6 be able to pass any IPv6 packet across an IPv6 network that was up to 576 bytes in size without triggering packet fragmentation. This is a consistent change in the IPv4 behaviour, which effectively says that IP packets of up to 576 octets have a high probability of successful delivery, as they will not trigger any structural size limitations, and all hosts had to accept packets up to 576 bytes in size. By removing fragmentation on the fly within the network, the original IPv6 specification consistently translated this to a requirement to be able to pass a packet of up to 576 bytes in size across an IPv6 network without triggering fragmentation. This size was altered during the refinement of the protocol specification, and RFC 2460, published in 1998, raised this minimum size from 576 bytes to 1,280 bytes.
This raises two questions: Why set this “Don’t Fragment” bit to always ON? Why use 1,280 bytes as the critical minimum packet size?
Why Set “Don’t Fragment”?
This was a topic of considerable debate in the late 1980s in packet networking and surfaced once more in the design of IPv6 a few years later.
Fragmentation was seen as being highly inefficient. When a fragment is lost there is no ability to signal that just the missing fragment needs to be present. Instead, the sender is required to resend the entire packet once more. Fragmentation also represents a window of opportunity in terms of exploiting potential vulnerabilities. Trailing fragments have no upper layer transport protocol header, so firewalls have a problem in determining whether or not the fragment should be admitted. Packet reassembly consumes resources at the destination, so an attacker can generate packet fragments and force the intended victim to reserve for a period reassembly resources to await the remainder of the original packet’s fragments that will never arrive.
This and more was written up in a 1987 paper, “Fragmentation Considered Harmful” by Kent and Mogul, and the conclusions from the paper are to avoid fragmentation wherever possible.
This paper recommended the use of the “Don’t Fragment” flag as a means of supporting the communications hosts discovering the path MTU. The intended functionality was that a router that could not forward a packet as it was too large for the next hop link would return the leading bytes of the packet together with the value of the MTU of the next hop link back to the packet’s sender. The sender could assemble this information and discern the path MTU for each of the destinations that the local sender communicates with.
The online digital archives of the discussions at the time appear to be relatively incomplete, and I cannot find anything in the way of a design discussion about the selection of this value for IPv6. One view is that if not 576 then surely the next logical choice would be 1,500. The rationale for 1,500 was the predominance of Ethernet as an almost ubiquitous layer 2 media framework for digital transmission systems, and even if they were not Ethernet networks, Ethernet packet framing was extremely common.
It’s possible that considerations of encapsulation and the related area of tunnelling come into play here. The use of “shim” packet headers for PPP and MPLS would tend to imply that, as a universal constant 1,492 was probably a safe choice, allowing 8 bytes for local shim headers. However, if you admit this form of packet encapsulation, then why not allow for IPv6-in-IPv4 (20 bytes), or even IPv6-in-IPv6 (40 bytes)? If you allow for this possibility, then perhaps it might also make sense to also add a further 8 bytes for a UDP header, or 20 bytes to permit a TCP header. Even in this case of encapsulating IPv6-in-TCP-in-IPv6, the resultant payload packet size is a maximum of 1,440 octets.
It appears that the minimum unfragmented size of 1,280 bytes as specified in RFC 2460 appears to be the result of considering the extremely unlikely case of a 1,500 bytes packet that carries 220 octets of encapsulation headers. Our intuition tends to the view that the Internet supports a 1,500 byte packet almost ubiquitously, and most forms of tunnelling would be happily accommodated by allowing for much less than 220 bytes of tunnel encapsulation headers per packet.
So why was the value of 1,280 chosen? I’m afraid that I simply don’t know!
IPv6 Fragmentation Behaviour
How do we cope with variable packet size limits in the IPv6 network? When a router is passed a packet that is too large to be forwarded to the next hop, then the router is supposed to extract the source IPv6 address of the packet and generate an ICMP message addressed to this source address. The ICMP message includes the MTU value of the next hop and includes as a payload the leading bytes of the original packet.
What should an IPv6 host do when receiving this ICMP message?
RFC 1981 proposed that hosts reduced the size of packets: “The node MUST reduce the size of the packets it is sending along the path.”
In the context of a TCP session, this can be achieved through an interaction between the upper-level protocol and the ICMP receiver on the sending host. The ICMP Packet Too Big message should cause the TCP session to drop its estimate of the maximum un-fragmented packet size that is supported on the path between this host and the remote end (the “PathMTU”) to the value provided in the ICMP message. This way the TCP session should not generate fragmented packets, but dynamically adjust its packetization size to match what is supported on the network path. In other words, for TCP sessions, the preferred behaviour is not to use IP fragmentation as a response, but instead to push this information into the TCP session and adjust the TCP Maximum Segment Size appropriately. When this occurs as per the plan, one should not see fragmented TCP packets at all.
If we won’t (or shouldn’t) see TCP fragments in IPv6, should we expect to see UDP fragments? As with many questions about UDP behaviour, the answer is that “it depends!” What it depends on is the behaviour of the application that is emitting the UDP packets.
Assuming that the UDP application is a stateless application that operates in a simply query/response model, such as a DNS server for example, then the application has no remembered connection state and no buffer of previously sent messages. This implies that the ICMP message is largely useless in any case!
So what value should an IPv6 host use for its local MTU?
If you set it low, such as a setting of 1,280 bytes, then attempts to send larger payloads in UDP will result in fragmentation of the payload.
If you set it higher, and 1,500 bytes is also common, then attempts to send a large payload, such as 1,400 bytes may encounter path constraints and may generate ICMP Packet Too Big ICMP messages. Assuming that the ICMP message makes it all the way back to the sender (which is by no means an assured outcome) then a TCP application can react by adjusting its session MSS. On the other hand, a UDP-based application may be simply be unable to react.
This would tend to suggest that a conservative approach is to use 1,280 bytes as a local MTU, as this would minimize, or hopefully eliminate the issues of UDP and ICMP PTB messages. However, relying on packet fragmentation, which is the inevitable consequence of using a small MTU with larger UDP payloads, may not necessarily be a good idea either. A soon-to-be Informational RFC, currently called draft-ietf-v6ops-ipv6-ehs-in-real-world by Fernando Gont and colleagues, indicated a packet drop rate of up to 55% when passing IPv6 packets with Fragmentation Extension headers through the network.
Experimenting in the DNS
Given that this issue of packet fragmentation is one that primarily concerns UDP, and the major user of UDP on the Internet today appears to be the DNS, we set up an experiment that tested the ability to pass variously sized IPv6 UDP DNS responses through the network.
We tested in three size ranges for packets: small (below 1,280 bytes); medium (between 1,280 and 1,500 bytes) and large (above 1,500 bytes). The first size is the control point, and we do not expect packets of this size to encounter any network delivery issues. The second size lies between 1280 and 1,500 bytes, and we expect to see some form of interaction between packets in this size range and network paths that generate ICMP PTB messages that have MTUs below the particular packet size. The third size is 1700 octets. The sending hosts use a local outbound MTU setting of 1,500, so the outbound packet is fragmented at the source into an initial segment and a trailing fragment. Both of these packets have a fragmentation header.
We used a modified DNS name server that ignored EDNS0 UDP buffer sizes, and did not respond to TCP connection requests. The system was configured with a local MTU of 1,500 bytes, and in this case listened exclusively on IPv6. The Linux kernel (running Debian GNU/Linux 8 (Jesse)) we used includes support for ICMPv6 Packet Too Big Messages, and will hold in its FIB a cache of the recent Path MTU values. In the case of outgoing UDP messages, the kernel will perform fragmentation down to 1,280 bytes.
We expected to see a result where all IPv6 resolvers were capable of receiving packets up to 1,280 bytes in length. It’s unclear how many resolvers sit behind network paths that do not admit 1,500-byte packets, so we are unsure what the drop rate might be for packets that are sized between 1,280 and 1,500 bytes. The interaction is a little more complex here, as while the local server will adjust its local cache of Path MTU in response to received ICMP messages, this will only be effective if the remote end uses the same interface identifier value subsequent query, and if the remote end performs a re-query within the local cache lifetime.
For packets larger than 1,500 octets, the response will be fragmented at the source, and there are three potential reasons why the network will discard the packet. The initial fragment will be 1,500 octets in length, which implies that this leading fragment will encounter the same path MTU issues as the slightly smaller packets. Secondly, firewalls may reject trailing fragments as there is no transport level port information in the trailing fragment. And finally, there is the Extension Header drop issue where other observations report a drop rate of approximately 50% when an IPv6 packet has a fragmentation Extension Header.
We would expect to see that if a resolver is able to follow the glueless delegation path for small packets then we would expect to see some level of packet drop for packets larger than 1280 bytes due to Path MTU mismatch issues where the remote end does not perform re-queries within the cache lifetime. For the packet test larger than 1,500 bytes this would be compounded by a further level of packet drop due to firewall filtering of trailing fragments, and in addition, there is the extension header packet drop problem as noted above.
When a resolver attempts to resolve a unique terminal name, the “parent” will send a response to indicate that the terminal name lies in a uniquely named delegated zone, and provides the DNS name of the authoritative name server for this delegated zone, but it deliberately omits the conventional inclusion of the IP address values of this name server (the “glue” records that are conventionally loaded into the Additional section of a DNS response). This means that the resolver will have to separately resolve this DNS name to an IP address before it can resume its task to resolve the original name. We have made this name a synthetic unique name so that the resolver cannot use a cached value, and must perform the name resolution task each time.
In our case, we have deliberately inflated the DNS response to the query for the address record of the authoritative name server, and the visible confirmation that the DNS resolver has successfully received this inflated response is in the subsequent query to the delegated domain for the original name.
What we observed from this experiment is shown in Table 1.
|Size||Count||Always Fetched||Sometime Fetched||Never Fetched|
|151||5,707||5,603 (98%)||104 (2%)||0|
|1,159||489||487 (99%)||2 (1%)||0|
|1,400||5,075||4,942 (97%)||95 (2%)||38 (1%)|
|1,425||480||467 (97%)||3 (1%)||10 (2%)|
|1,455||481||456 (94%)||3 (1%)||22 (5%)|
|1,700||5,086||3,857 (76%)||65 (1%)||1,164 (23%)|
Table 1 – UDP Fetch Rate vs Packet Size
Table 1 shows that the packet loss rates increase as the packet size increases. Of course, the reasons for the packet loss rates differ above and below the 1,500-byte size point.
Below 1,500 octets we can expect to see Path MTU issues as being the major loss factor for packets, where the packet is larger than a particular link on the path to the destination. This is commonly encountered in IP-in-IP tunnels, although in this case the common IPv6-in-IPv4 case would not impact these three packet sizes, as they all fit below the 1,480 IPv6-in-IPv6 tunnel MTU. When an outbound packet encounters such a link, the router should return a Packet Too Big ICMPv6 message. The distribution of sizes reported in these ICMP messages is shown in Table 2.
Table 2 – Distribution of MTU sizes in Packet Too Big messages per /64 prefix
While the most common constrained MTU value is 1,480 bytes, corresponding to a simple IPv6-in-IPv4 tunnel arrangement, there is also a range of smaller MTU values spread across 1,400 to 1,480 bytes. Some appear to be related to IPv6-in-IPv6 tunnels (1460 bytes), or the use of UDP in IPv6 and IPv6 as the outer tunnel wrapper (1,472 and 1,452 bytes). Other values are not as readily explained, and the peak at 1,400 bytes appears to be as contrived as 1,280 bytes! The 1,500 byte MTU is likely to be a known problem with a particular vendor’s firewall implementation, which reassembles fragments to determine whether to admit the packet, but then does not perform any subsequent fragmentation, so the reassembled packet is too large for the next hop! The 1,586 MTU is also likely to be a similar form of implementation bug. Approximately 90% of the observed IPv6 /64 prefixes appear to be reachable using a 1,500 byte Path MTU.
Above 1,500 bytes in size there are three potential problems that a UDP packet may encounter, as noted above, namely the initial 1,500 byte fragment may trigger MTU mismatch problems, the trailing fragment may encounter filtering firewall rules that will cause the destination to timeout on packet reassembly, or both the initial and trailing fragments will be dropped in transit because of the presence of the Fragmentation Extension Header. The first two cases are meant to generate ICMPv6 messages back to the sender, while the Extension Header discard is a silent discard.
We saw consistent problems in sending the 1,700 byte UDP packet to 1,164 distinct /64 IPv6 subnets. Some 331 of these failing subnets generated ICMPv6 messages indicating a timeout in fragment reassembly, and the payload indicated that the receiver had received the initial fragment and was waiting for the trailing fragment. This is consistent with a front-end firewall that filters trailing fragments. A further 61 failing subnets generated ICMP Packet Too Big messages, and evidently did not perform re-queries from the same /128 IPv6 address within the local cache lifetime window. It is unclear what is going on in the remaining 772 cases. It may be that there are additional fragmentation reassembly or path MTU issues, but that the diagnostic ICMPv6 messages are either not being generated, or are being filtered in the network themselves. It is also possible that we may be encountering the same behaviour reported by Gont et al. where parts of the IPv6 network appear to silently drop IPv6 packets that use an Extension Header, including the Fragmentation Extension Header. If this is the case, then from the limited set of IPv6 addresses that are used for DNS resolvers, the extent of this Extended Header packet drop is somewhere around 15%, a considerably lower value that the 50% level reported by Gont et al.
Do Big Packets work in IPv6?
The overall picture for IPv6, UDP and large packets does not appear, in the first instance, to be all that reassuring.
The 5% drop rate for 1,455 byte IPv6 packets is somewhat higher than one would expect from a server that appropriately responds to ICMPv6 Packet Too Big messages. To what extent this is due to the ICMP messages not being generated, or being filtered in flight, and to what extent this is due to the application not performing re-queries is an open question, but there is certainly a problem here.
The larger issue is a 23% drop rate for fragmented IPv6 packets. Because the network cannot forward-fragment the initial fragment, the initial fragment may encounter Path MTU issues and we can anticipate a certain level of packet drop from this. Secondly, the presence of filtering firewalls that discard tailing fragments is a problem here (as it is in IPv4). And there is the additional issue in IPv6 that there are reports of routers that perform a silent discard of IPv6 packets that contain an Extension Header, including the Fragmentation Extension Header. The combination of these three factors points to a consistent loss rate of one in four packets in the IPv6 network. So, no, big packets don’t work all that well in IPv6.
But is this really as bad as it sounds?
If the aim of the exercise is for an application to get the date from the server to the client, then it’s not just the IP level behaviour that we must take into account, but also the application level behaviour.
In this experiment we deliberately turned off the DNS level EDNS0 UDP buffer size behaviour in the server. We sent the same large response to all clients irrespective of the EDNS0 settings they tried to pass to our server in their query. Normally, and quite correctly, this would not happen. If the client resolver did not include an EDNS0 buffer size we should’ve constrained the response to no more than 512 bytes, and in the case of these larger responses we should’ve sent a truncated DNS response. This would trigger the DNS resolver to re-attempt the query using TCP. If the client resolver provided an EDNS0 UDP buffer size we should’ve honoured that maximum size, and truncated the response if it could not fit into the specified buffer size. Again, truncation would trigger the client resolver to reattempt the query using TCP.
The DNS client resolver also has a simple test to distinguish between path MTU issues and unresponsive servers. If it does not receive a response to its original query, it will repeat the query using an EDNS0 UDP buffer size of 512 bytes. If this query does not elicit a response then it can conclude that it is an unresponsive server, as a 512 byte response should not encounter Path MTU problems.
By ignoring EDNS0 and not truncating any responses in this experiment we deliberately set out to expose the underlying IP level failure rate and did not allow the DNS to use its conventional responses that would’ve mitigated this issue to a significant extent.
The observation is that the DNS already uses large packets, and we have no solid evidence that supports large-scale structural damage when large DNS responses are used. For example, a DNSSEC-validating resolver will at some point need to validate a name within the .org domain, and to do so it will need to query for the DNSKEY RRset of .org. This domain is served by dual stack name servers, and they will serve a 1,625 byte response to this query. In the case of an IPv4 UDP query the response is a 1,500 initial fragment and a 173 byte trailing fragment. In the case of IPv6 the servers provide a 1,280 byte initial fragment and a 449 byte trailing fragment. As far as I am aware, there are no complaints of wide-scale problems in resolving .org domain names by DNSSEC validating resolvers.
While there is an underlying issue with the viability of large IP packets and fragmentation in the IPv6 network, as long as the application is capable of resolving the distinction between network packet loss and server availability, then this need not be a significant impediment.
MTU choice: 1,280 vs 1,500?
The second question is whether it is preferable to use a 1,280 byte local MTU for IPv6, or use a 1,500 byte MTU (assuming that you are connecting to an underlying local network that supports a 1,500 byte MTU, of course).
A lower local MTU setting will avoid Path MTU issues, which will be particularly useful for TCP, and this will also ensure that TCP packets are not fragmented MTU issues. As long as all ICMPv6 messages can reach the sender, and TCP can adjust its session MSS accordingly, then TCP need not be adversely impacted by a higher MTU. But practical experience suggests that ICMP filtering is widespread and larger MTUs in TCP expose the risk of wedging the TCP session in an irretrievable manner. So a smaller MTU for TCP in IPv6 will avoid the relatively dramatic loss rates associated with IPv6 and fragmentation, while still allowing TCP to perform path MTU discovery if it so desires. So for TCP and IPv6 it makes sense to start the max segment size low and potentially probe upward if desired.
But this lower MTU setting, if applied to UDP over IPv6, will cause fragmented UDP packets to start at the lower threshold size, and the not inconsiderable fragment filtering and Fragmentation Extension Header loss factors then come into play. This, in turn, may require an increased level of application level intervention and in the case of DNS may involve higher query levels and a higher TCP query rate as a consequence. For UDP the larger MTU setting of 1,500 bytes allows packets in the range of 1,280 to 1,500 bytes to be sent without gratuitous fragmentation. As we saw in this experiment there will still be remote systems using smaller UDP MTU sizes, and the flow of ICMP Packet Too Big messages will still trigger packet fragmentation, but rather than occurring for all packets over the lower threshold, this fragmentation will occur only if the packet is larger than the local interface MTU or in response to an incoming ICMP message.
So for TCP over IPv6 an MTU choice of 1,280, or an initial MSS default value of 1,220, is a very rational design choice. For UDP over IPv6 it’s not so straightforward, but it appears that there are marginal advantages with using the larger MTU size, as it avoids gratuitous fragmentation and the associated fragmentation loss issues.
If you had to pick one, and only one, setting then I would tend towards optimizing the system for TCP over UDP.
Can we set different MTU values for TCP and UDP? This does not appear to be a common feature of most operating systems, however Apple’s MacOSX platform, and the related FreeBSD operating system both have a very interesting system configuration control (sysctl) parameter of net.inet.tcp.v6mssdflt, which appears to be a default local setting for the TCP MSS size for IPv6. In theory this setting could be used to implement these differing TCP and UDP MTU settings for IPv6. But perhaps a story about my experiences in playing with these system settings is best left to another time.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.