By: Lixia Zhang
Date: September 7, 2006
This draft has three objectives:
- to discuss the impact of multihoming on the scalability of the global routing system;
- to provide an overview of GSE, one of the early proposals by Mike O’Dell to address the multihoming scalability problem;
- to identify open issues raised by the GSE proposal, which may serve as a first step toward resolving them.
In its original design, IPv4 had a class-based address structure that divided the 2^32 address space into 2^7 large networks (Class-A), 2^14 medium size networks (Class-B), and 2^21 small networks (Class-C).
Each network is represented by a Network ID, also called a network prefix, with the length of 8 bits, 16 bits, and 24 bits for Class A, B, C networks, respectively. Global routing was performed by matching the high order bits of the packet destination address against a table indexed by network prefixes. Each prefix took one entry in the global routing table and the length of the prefix was implied by the address class.
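The class-implied prefix lookup described above can be sketched in a few lines. This is an illustrative reconstruction of the classful rule, not any actual router implementation:

```python
# Illustrative sketch of classful (pre-CIDR) address handling: the class,
# and hence the prefix length, is implied by the high-order address bits.
def classful_prefix_len(addr: int) -> int:
    """Return the implied network-prefix length of a 32-bit IPv4 address."""
    if addr >> 31 == 0b0:      # Class A: leading bit 0, 8-bit prefix
        return 8
    if addr >> 30 == 0b10:     # Class B: leading bits 10, 16-bit prefix
        return 16
    if addr >> 29 == 0b110:    # Class C: leading bits 110, 24-bit prefix
        return 24
    raise ValueError("Class D/E address: not a unicast network")

def network_id(addr: int) -> int:
    """The Network ID used to index the global routing table."""
    return addr >> (32 - classful_prefix_len(addr))
```

For example, 10.0.0.1 (0x0A000001) falls in Class A, so its Network ID is simply 10.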
The explosive growth of the Internet during the early 1990s brought serious scalability problems to the Internet routing infrastructure: there were too few Class-A address blocks to give out; Class-B blocks were nearly exhausted; and as a result a large number of Class-C blocks were assigned. Because each Class-C network has only 256 addresses, one institution might have to get multiple Class-C address blocks. Since each network ID takes one entry in the global routing table, the table started growing at an alarming rate, until Classless Interdomain Routing (CIDR) was deployed [RFC4632].
At the time it was deployed, CIDR provided an effective way to slow down the growth of the routing table in the Internet backbone, commonly referred to as the Default Free Zone (DFZ). Fifteen years after CIDR’s deployment, however, today’s global routing system is facing serious scaling problems again. A rough estimate from the weekly CIDR report shows that the IPv4 DFZ routing table size has gone up by about 36% since September 2004 and doubled since January 2001. The rate of growth also seems to be accelerating over time and, if the current acceleration rate is maintained, the DFZ routing table size would double again around early 2010. What is the main cause of the rapid routing table growth this time? The main culprits appear to be customer multihoming and traffic engineering.
The multihoming induced routing scalability problem has long been recognized, and a number of recent IETF efforts have been dedicated to the development of solutions to the problem [2,3]. This draft is intended to help the reader fully understand the importance of the problem, and to describe some alternative solutions in the design space. We first describe the relation between edge multihoming and traffic engineering practice and DFZ routing scalability. We then describe an early proposal, GSE by O’Dell from 1997, and show how it works to resolve the multihoming scalability problem. We also identify some of the open issues that must be resolved before GSE, or similar proposals, can be deployed in practice.
2. Impact of Multihoming on Routing Scalability
The basic idea behind CIDR is simple: the size of an IP address block is allowed to be 2^n, where 0 <= n <= 32. This simple idea helped slow down routing table growth in two ways. First, each organization needs only one address block of the right size, as opposed to multiple Class-C blocks in pre-CIDR days. Second, and perhaps more important, CIDR allows an Internet Service Provider (ISP) to divide an allocated address block into multiple pieces of potentially different sizes, and to assign each piece to a customer according to its need. Each IPv4 address block allocated to an ISP typically has an address prefix 8-21 bits long. The address block allocated to a customer is represented by a prefix longer than the ISP’s prefix, with the high order bits being the same as the ISP’s prefix. The ISP can announce the prefix of its allocated address block to the global routing system and receive data traffic destined to all of its customers, as long as none of the longer prefixes assigned to individual customers are announced separately. The ISP then distributes the traffic to its customers according to their individual address prefixes. Thus CIDR enables an ISP to support many customers while still announcing only one aggregated prefix to the global Internet. In an ideal CIDR case, the number of routing table entries should be around the same order of magnitude as the number of ISPs. However, in reality, the former has always been much larger than the latter, since each ISP tends to have multiple allocated address blocks, and more important, there exist a large number of provider-independent (PI) prefixes; many of these are legacy allocations that predate the introduction of CIDR.
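To make CIDR’s aggregation concrete, here is a small sketch of an ISP announcing one aggregate prefix externally while routing internally on its customers’ longer prefixes via longest-prefix match. The ISP block, customer prefixes, and names are all hypothetical:

```python
import ipaddress

# Hypothetical ISP block and customer assignments; the customers' prefixes
# share the ISP's high-order bits, so only the /16 needs to be announced
# to the global routing system.
ISP_AGGREGATE = ipaddress.ip_network("198.51.0.0/16")
CUSTOMER_ROUTES = {
    ipaddress.ip_network("198.51.4.0/22"): "customer-A",
    ipaddress.ip_network("198.51.8.0/24"): "customer-B",
}

def internal_lookup(dst: str) -> str:
    """Longest-prefix match among the ISP's customer prefixes."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in CUSTOMER_ROUTES if addr in net]
    if not matches:
        return "no route"
    best = max(matches, key=lambda net: net.prefixlen)  # longest wins
    return CUSTOMER_ROUTES[best]
```

The DFZ sees a single entry for 198.51.0.0/16; the per-customer entries exist only inside the ISP.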
PI prefixes are the address blocks allocated to customer networks directly. The important property of a PI prefix is that its owner has the freedom to switch providers without renumbering the network. Furthermore, a network with a PI prefix can connect to multiple ISPs simultaneously. This is known as multihoming, which allows the network to stay reachable through whichever providers remain functional when some part of the Internet fails. As Renesys’ measurement of the 2003 US East Coast blackout shows, well engineered multihoming can be an effective way to ensure Internet connectivity. In the absence of network failures, a multihomed site can distribute outbound traffic across multiple provider connections to maximize some locally defined goals such as cost, throughput, and/or performance. If routing policy permits, a customer may also subdivide its address allocation, that is, split its prefix into multiple longer ones that are then used for load-balancing the incoming traffic, as shown in Figure-1 below:
The aforementioned advantages of multihomed sites, however, come at the cost of one or possibly multiple entries per site in the global routing table. During the early days of CIDR deployment, the number of customer networks was relatively small, few were multihomed, and most of them got address assignments from their ISPs. Thus CIDR aggregation worked out well. However, over time more and more customer networks became multihomed for improved Internet availability and performance. Our recent measurement results indicate that today the majority of customer networks are multihomed.
Such pervasive multihoming practice has had a profound impact on the scalability of the current routing and address architecture. Being reachable through any of its providers implies that a customer network must be visible in the global routing table; that is, it must announce a PI prefix, or otherwise make its providers announce a specific prefix for it. Moreover, if a site wants to load-balance incoming traffic, it may also split its prefix into multiple longer ones and announce them to different ISPs. Consequently, both of CIDR’s advantages mentioned earlier, one address block per customer site and ISP aggregation of customer prefixes, are lost through current multihoming and traffic engineering practices.
A number of people foresaw the routing scalability problem resulting from multihoming and proposed solutions. Below we describe GSE, one of the earliest proposed solutions suggested by Mike O’Dell in 1997.
3. GSE: An Alternate Addressing Architecture for IPv6: How It Works
The proposed IPv6 address structure inherits from IPv4 the CIDR-style “Provider-based Addressing”. Recognizing CIDR’s intrinsic limitation in the presence of multihomed sites, O’Dell proposed to divide IPv6’s 16-byte address into three parts, with the lower N bytes being the End System Designator (ESD), the middle M bytes representing the site topology partition (STP) for local routing, and the top (16-M-N) bytes being the Routing Goop, or RG, to be used for routing between providers. A Routing Goop signifies where a site attaches to the global Internet, and a multihomed site will have multiple RGs, one for each of its providers. As the site changes providers, its RGs change but the remainder of the address structure does not. When a packet flow moves from one provider connection to another, the RGs in the packets’ addresses change as well. GSE therefore requires that transport and all higher level protocols use the ESD portion, instead of the whole IPv6 address, as the connection identifier.
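The three-part split can be sketched as follows. The byte widths chosen here (8-byte ESD, 2-byte STP, 6-byte RG) are illustrative assumptions, since GSE leaves M and N as parameters:

```python
# Illustrative GSE split of a 16-byte IPv6 address into RG | STP | ESD.
# The widths below are assumptions for the sketch, not fixed by the proposal.
ESD_BYTES = 8                            # End System Designator (low N bytes)
STP_BYTES = 2                            # Site Topology Partition (middle M bytes)
RG_BYTES = 16 - STP_BYTES - ESD_BYTES    # Routing Goop (top bytes)

def split_gse(addr: bytes):
    """Partition a 16-byte address into its (RG, STP, ESD) fields."""
    assert len(addr) == 16
    return (addr[:RG_BYTES],
            addr[RG_BYTES:RG_BYTES + STP_BYTES],
            addr[RG_BYTES + STP_BYTES:])

def replace_rg(addr: bytes, new_rg: bytes) -> bytes:
    """Rewrite only the Routing Goop; the STP and ESD are untouched, so a
    transport connection keyed on the ESD survives a provider change."""
    assert len(new_rg) == RG_BYTES
    return new_rg + addr[RG_BYTES:]
```

Re-homing a site amounts to calling `replace_rg` on every address, which is exactly why higher-layer protocols must key on the ESD alone.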
The fundamental novelty in the GSE design is to hide a site’s RG from its internal hosts and routers, so that they are insulated from the external topological connectivity and such changes as multihoming or re-homing (that is, changing providers). This insulation is implemented through the following steps as shown in Figure-2. (1) When generating a packet, the source host fills the destination address with a complete 16-byte IPv6 destination address, including the RG, that it receives from DNS resolution, and fills the upper (16-M-N) bytes in the source address with a special “Site-Local” prefix. (2) If the destination is not within the local site, the packet will leave the site via one of possibly several site boundary routers, which will insert a proper RG, expected to be used for returning packets of the same end-to-end communication, into the packet’s source address. (3) When the packet reaches a site boundary router of the destination network, the router will replace the RG in the destination address with the Site-Local prefix. As a result, the internal routers and hosts of a site should never see the value of its own RG.
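The three rewriting steps above can be sketched as below. The 6-byte RG width, the Site-Local prefix value, and the function names are all hypothetical, chosen only to show how the interior never sees its own RG:

```python
# Hypothetical walk-through of GSE's three rewriting steps (Figure-2).
# RG width and Site-Local value are illustrative assumptions.
RG_BYTES = 6
SITE_LOCAL = b"\xfe\xc0\x00\x00\x00\x00"  # stand-in Site-Local prefix

def host_emit(src_suffix: bytes, dst_full: bytes):
    """Step 1: the source host knows its own STP+ESD but not its RG, so it
    fills the top bytes of the source address with the Site-Local prefix."""
    return SITE_LOCAL + src_suffix, dst_full

def exit_router(src: bytes, dst: bytes, site_rg: bytes):
    """Step 2: a site boundary router overwrites the Site-Local prefix in
    the source address with an RG usable for returning packets."""
    return site_rg + src[RG_BYTES:], dst

def entry_router(src: bytes, dst: bytes):
    """Step 3: the destination's boundary router replaces the RG in the
    destination address with Site-Local, hiding it from the interior."""
    return src, SITE_LOCAL + dst[RG_BYTES:]
```

Chaining the three calls shows that interior hosts on both ends only ever observe Site-Local in the top bytes of their own site's addresses.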
This insulation provides a site with the flexibility of re-homing and multihoming. Because a site’s interior should have no knowledge about the RGs, the site administrator can change providers, and hence change the RGs, whenever needed. At the same time ISPs can also aggressively aggregate RGs as needed for routing scalability.
However, every coin has two sides. Along with its gains GSE also raised a set of new issues that must be fully understood and resolved before it can be put into deployment. In the next section we briefly describe a few of the major ones that have been identified.
Before leaving this section we would like to point out that GSE was not the only proposal in the direction of insulating edge networks from transit providers. In RFC1955, Bob Hinden proposed an ENCAPS scheme that separates providers and customers into two address spaces and uses tunnels to carry packets from source customer networks over the provider space to reach destination customer networks. Here the tunneling plays a role similar to that of the RG in the GSE design, hiding the provider space from edge networks.
4. Open Issues in GSE
Before diving into specific open issues in GSE, we would like to stress that the list of issues mentioned in this section is not complete and does not necessarily capture all the major ones. Rather, we hope that the list can serve as a starting point for future discussions; some of these issues were also mentioned in the GSE proposal. [RFC4218] and [RFC4219] provide good sources of information on general threats and considerations in the development of multihoming solutions.
4.1 RGs and DNS Servers
Since hosts learn about destination RGs from DNS lookups, DNS naturally plays a critical role in GSE. One new issue raised by GSE is which RGs to use to reach the DNS servers themselves. Even if one assumes that DNS root servers will use host routes that stay relatively stable, other DNS servers may be reachable via any one of multiple RGs. When the hosting sites change providers, the RGs used for reaching those DNS servers also change. Assuming the network hosting one of the example.com DNS servers is multihomed, which RG, or how many RGs, should be returned from a DNS server lookup for example.com?
Although GSE strives to insulate a site’s internal hosts and routers from RG changes, DNS servers are exceptions. The authoritative DNS servers of a customer site must know the RGs of the site in order to resolve the DNS names for the site, and thus they must be updated with all of the RG changes. Furthermore, whenever a site changes its RGs, all of the DNS servers in the site, both its own and others that it hosts, change their IP addresses. Hence, all of the parents of all of those servers, as well as their owners, must be properly updated.
In addition, GSE also brings up the need for supporting 2-faced DNS. That is, a DNS server must be able to tell whether a query is from a local or remote host, so that it can decide whether to put Site-Local or the site’s RG(s) in the returned address. For hosts in a multihomed site, the DNS server must also decide which of the multiple RGs to put in the addresses in DNS replies. As we will mention later, one organization may have multiple sites that are interconnected through both a private internal network and the external transit core, thereby adding additional complexity to 2-faced DNS servers.
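A minimal sketch of the 2-faced decision a GSE-era DNS server would face; the prefix strings, RG labels, and the notion of a preferred index are all hypothetical:

```python
# Sketch of "2-faced DNS" behaviour under GSE: answer with Site-Local for
# local queriers, and pick one of possibly several RGs for remote ones.
# All names and values below are illustrative assumptions.
SITE_LOCAL = "site-local-prefix"
SITE_RGS = ["rg-via-isp1", "rg-via-isp2"]  # multihomed: one RG per provider

def answer_prefix(querier_is_local: bool, preferred: int = 0) -> str:
    """Choose the address prefix to place in a DNS reply."""
    if querier_is_local:
        return SITE_LOCAL          # interior hosts must never see the RG
    # Which of several RGs to return to remote queriers is an open issue;
    # here we simply take a caller-supplied preference.
    return SITE_RGS[preferred]
```

Note that the hard part, deciding reliably whether a querier is local and which RG best serves a remote one, is precisely what GSE leaves open.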
4.2 The Border Links
As one can see from Figure-2, although GSE insulates edge networks from the transit core, there exist physical links that connect the former to the latter. Let us call them border links. On one hand, when packets exit the source site, it is possible to make the source site aware of the status of its border links and associated routers, so that outbound packets can choose exit routers that avoid any failed border link or router. On the other hand, which border link at the destination end a packet travels through is determined by the RG in the packet’s destination address. In picking a destination RG, the source site has no easy way to tell whether any of the remote border links has failed in order to avoid it. The GSE proposal suggested manually configuring all of the routers serving the same site to be aware of each other as a group; in case one of the routers loses its connectivity to the site, it can tunnel traffic to the others in the group. Such configuration not only requires close coordination between competing providers, but also must be done for tens of thousands of multihomed edge sites, which puts a big question mark over the feasibility of this proposed solution.
We would like to point out that this issue of handling border link failures is not unique to GSE; the ENCAPS proposal shares a similar problem. In fact, any approach in the direction of separating edges from the transit core will find that some special handling is needed to deal with border link failures. Those links along the isolation boundary provide connectivity between the transit core and edge networks. However, they are not covered by the routing protocol of the transit core because the edge networks at the other end of those links are now isolated from the core and are no longer routable entities.
4.3 RGs and Tunnels
IP tunneling has been widely used as a simple way to meet various special packet delivery needs. Generally speaking, an IP tunnel can be set up between any two nodes in the same address space. However, the GSE design raises new issues in tunneling due to its separation of the RG from the rest of the IP address. In GSE, it is unclear whether tunneling would still be allowed between any two IP boxes, or would have to be constrained to site border routers only. For an IP tunnel across RG boundaries, there are also questions regarding which source and destination RGs should be given to packets going into the tunnel, and how to handle packets when they come out of the tunnel and land on a different site.
In light of the extensive use of Virtual Private Networks (VPNs) that has grown up since GSE was proposed, and the use of tunnels at protocol layers below IP, tunneling operations need a thorough examination in the GSE context.
4.4 Traffic Engineering
When an edge network is multihomed, generally speaking it would like to be able to choose exit routers for outbound traffic (outbound traffic engineering) and entry routers for incoming traffic (inbound traffic engineering). In addition, a transit network may also wish to know how many different paths it has in order to reach a given destination network, so that it can send packets in certain proportion along parallel paths based on some locally defined criteria (transit traffic engineering).
GSE was proposed as a scalable way to support site multihoming, but it did not directly address the need for traffic engineering. In particular, the GSE draft mentioned only packets reaching a desired source site exit router, without elaborating on exactly how to direct outbound traffic toward potentially multiple exits. Similarly, the destination RGs are included in the DNS replies, but it is left open as to whether the DNS server, or the sending host, should decide which RG to use among multiple options for inbound traffic engineering. Transit traffic engineering is even more challenging, as a transit network would have no easy way to tell whether packets carrying different destination RGs belong to the same destination site. In short, although it may be possible to enhance GSE for achieving traffic engineering goals, the existing GSE proposal clearly does not solve this problem.
4.5 Other GSE Related Issues
GSE opened a door to decoupling an edge site’s internal addressing from its connections to the transit core, yet how to take this opportunity to build scalable and robust transit routing operations remains an open issue. In the GSE proposal, O’Dell sketched out an idea of partitioning the global Internet into a set of tree-shaped regions anchored by “Large Structures” (LSs). Flat routing is carried out between LSs and within the regions under each LS. Any two LSs may share a tangency below the top level for “cut-through” paths, but such cut-through paths were considered controlled circumventions of otherwise hierarchical paths. Measurement results suggest that, over the past 10 years, the global topology has become more densely connected, and interconnection below the top level has become the norm rather than a controlled circumvention, suggesting that the originally proposed RG structure and usage may need to be re-evaluated.
Another issue involves routing within large organizations that may have a presence in multiple locations, as well as routing packets between multiple sites of the same organization through the transit core. Each of the sites may be connected through a private internal network, as well as having its own RGs for the connections to the transit core which may also change from time to time. In a GSE setting, how to best utilize both internal and external connectivity for packet delivery between sites seems an entirely open question at this time.
Yet another important issue in GSE deployment concerns the management of the End System Designator (ESD) space in order to assure ESDs’ global uniqueness, as ESDs would be used as end-to-end connection identifiers. One must also be prepared to handle ESD collisions in case they occur.
5. A Few Ending Words
It has been nearly 10 years since the GSE proposal was published, yet the problem GSE was set forth to solve is still with us today, and can potentially get much worse when IPv6 starts seeing wide deployment. Despite the IETF’s effort in developing multihoming support with provider-allocated addresses [2, 3], regional Internet registries have been under heavy pressure from customers to allocate Provider-Independent IPv6 address blocks, a worrisome sign for IPv6’s future routing scalability.
GSE pointed out a brand-new approach to the multihoming support problem. However, because it is drastically different from existing practice, a large number of concerns were raised at the time it was proposed (some of which were captured in [8, 9]), and the original proposal was never fully explored to appreciate its advantages, understand its tradeoffs, and identify its open issues. In our search for a scalable global routing system design, it seems worthwhile to revisit the GSE proposal in full.
First of all, I would like to thank Mirjam Kühne. This article would not have been written without her encouragement and patience. I sincerely thank David Meyer, Brian Carpenter, David Thaler, and other IAB members for their comments. Special thanks go to Elwyn Davies who painstakingly went through an earlier draft and made numerous corrections.
[2] IETF Site Multihoming in IPv6 (multi6) Working Group, www.ietf.org/html.charters/multi6-charter.html
[3] IETF Site Multihoming by IPv6 Intermediation (shim6) Working Group, www.ietf.org/html.charters
[4] “Impact of the 2003 Blackouts on Internet Communications”, Renesys Corporation, www.renesys.com/tech/presentations/blackout_results, November 2003.
[5] “Observing the Evolution of Internet AS Topology”, R. Oliveira et al., submitted for publication, August 2006.
[6] “GSE – An Alternate Addressing Architecture for IPv6”, Mike O’Dell, www.watersprings.org/pub/id/draft-ietf-ipngwg-gseaddr-00.txt, February 1997.
[7] “New Scheme for Internet Routing and Addressing (ENCAPS) for IPNG”, R. Hinden, www.ietf.org/rfc/rfc1955.txt, June 1996.
[8] Minutes from the two-day IPng interim meeting, February 27-28, 1997, http://playground.sun.com/pub/ipng/html/minutes/ipng-minutes-feb97.txt
[9] “Separating Identifiers and Locators in Addresses: An Analysis of the GSE Proposal for IPv6”, M. Crawford et al., http://ietfreport.isoc.org/idref/draft-ietf-ipngwg-esd-analysis/, October 1999.
[RFC4632] “Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan”, V. Fuller and T. Li, www.ietf.org/rfc/rfc4632.txt, August 2006.