Routing

Information Routing on Content Locators

Editor’s note: This article takes a deep dive into some of the new technology developments that were demonstrated during
the IETF 90 Bits-n-Bites event in Toronto. It also offers insight into one aspect of the exciting and ground-breaking work
encompassed in the Internet Research Task Force’s Information Centric Networking Research Group. See https://irtf.org/icnrg for
more on this topic.

By: Dimitri Papadimitriou, Sahel Sahhaf, Wouter Tavernier

Date: November 1, 2014

line break image

Context

The increase in Internet traffic volume for applications, such as mobile video and cloud computing, has led to various technologies enabling content distribution that rely on caching and replication. As they are proprietary and deployed in silos, these content-distribution technologies do not allow unique and securely identified named information independent of the distribution channel. Moreover, these technologies are usually implemented as an overlay, which leads to needless inefficiency. Two seed models—the overlay model and the name-based routing model—can be identified as the root of most currently proposed approaches to information-centric networking (ICN). The latter aims at enabling data to become independent from their network/physical location, application, and storage support, as well as their transport/distribution channel in support of both user and content mobility.

The Limits of Name-based Routing

The name of a resource indicates what we seek, an address (locator) indicates where it is. Following this distinction, the two major alternatives currently under investigation consider either routing on content names (with IP addresses keeping the network locator semantic) or introducing another level of indirection (overlay) by extending the IP address semantic to include a location name from where the information can be retrieved.

The first alternative suffers from limited scaling (in terms of memory space) as approaches such as Content-Centric Networking (CCN)[1] are confronted with name spaces that have not been designed to sustain forwarding performance and scaling. Indeed, the routing information corresponding to each content object has to be maintained in the routing table, the number of content objects is very large (between 1015 and 1022[2]), and the size of the routing tables becomes a key concern. Assume for instance that routing tables would include one entry per top-level domain, a name-based routing table would have to hold 2×108 routes (compared to the 5×105 active BGP routes as of mid-2014). Moreover, the resulting size increase of the routing tables and associated processing would worsen over time as the number of domains increases by 10–15% per year (as indicated in the Verisign report of April 2013). Finally, the adoption of a new content name space is challenging, but the maintenance of a hierarchical tree-based structure (summarization) is even more difficult. The second alternative raises the same kinds of issues as any network overlay following the well-known aphorism of D. Wheeler that describes the problem of too many levels of indirection: “All problems in computer science can be solved by adding another level of indirection … but that usually will create another problem.”

If names become the fundamental structure of the information exchange process, name-based routing becomes a scaling and performance bottleneck. So, unless names get overloaded with locator semantics, there is no means by which the resolution of a name into an address can be avoided. The implication being that conventional approaches to ICN like CCN or its multiple variants (see “Networking named content”[3] and “Named Data Networking”[4]) shift this operation to a network-layer functionality. On the other hand, this concept exacerbates the memory-scaling limits of stretch-1 shortest path routing already observed in many experimental studies and theoretic investigations.

Content Locators

With both alternatives, the fundamental problem is as follows: localization and routing refer to distinct functions associated to distinct objects (address vs. route), which cannot be derived from each other using local knowledge. Moreover, the higher the level of information on which the routing decision is performed, the higher the memory cost. On the other hand, lowering the level by providing additional resolution processes increases the communication cost and latency. This observation leads to the consideration that (1) content-based forwarding requires one to keep locators (instead of names) as the information unit for routing/forwarding decisions and, (2) the locator space, say X, should also elevate the memory-scaling problem induced by stretch-1 shortest path routing. To this end, the localization function performing at x ∈ X shall, after resolving the content name into the corresponding locator, say y ∈ X, compute the distance d(x,y) and the routing function (distributed by nature) shall determine locally and independently for any destination the adjacent node along a loop-free path, such that incoming messages directed to destination y can reach it. Hence, we seek a locator space X that can be processed at end-points by the localization function and at intermediate nodes by the routing function.

It remains to be seen which locator value space would best fit this combined role. An obvious choice would be a so-called topology-dependent label space. However, such a value space is prone to renumbering, even in the case of non-local topological change; hence, it is unsuitable. Another possible choice would be the reuse of the IP address space as existing Internet routing protocols operate using such a locator space. However, IP locators have no associated distance metric; meaning, no distance computation and no selective localization of content are possible when the same content object is available at multiple locations. Ideally, the locator space should be as independent as possible from topology changes, while also providing sufficient information to compute distances where it is processed provided that different receivers compute distances following the same distance function (we will see later the implications of the model selection when detailing coordinate computation). Instead, by associating data objects with content locators from a (geo)metric space (X,d)  where, to each element of the space X corresponds a globally unique geo­metric coordinate and d is a distance function (or metric) which verifies that ” x,y ∈ X, d(x,y)=d(y,x) > 0 when x≠y otherwise d(x,y)=d(y,x)=0 together with the triangular inequality, we satisfy these requirements and prevent renumbering in the case of topology change.

Geometric Routing on Content Locators

In addition, the selection of this coordinate space must be performed hand-in-hand with the design of a routing scheme capable of addressing the memory space complexity (O(n.(log(n))) characterizing stretch-1 routing algorithms such as Border Gateway Protocol (BGP). For this purpose, we introduce in “Geometric information routing”[5] a variant of geometric routing in which distances between nodes are computed from the length of the corresponding geodesic segments drawn out of the hyperbolic plane H2. From a routing-information-distribution perspective, this routing scheme operates following a modified distance-vector algorithm enabling the construction of geodesic segments. These segments are combined by means of procedures similar to pathlet/segment routing. To reduce the number of routing entries, coordinate sets are represented by geometric areas. As geometric coordinates are associated with the locations of content objects, the underlying model is referred to as geometric information routing. Coordinates can be computed offline by different procedures: by means of distance preserving graph embedding or a variant of the Vivaldi algorithm in the hyperbolic space represented by the hyperboloid model (also referred to as the Loid model). The distortion introduced by the hyperbolic model is determined by two parameters: the distance function and the curvature, which represents the amount by which an object in the space deviates from being flat (i.e., Euclidean). Instead of selecting the hyperbolic space curvature that produces the lowest error (thus avoiding the trial-and-error of embedding functions), we make use of the fundamental formula which relates the curvature κ of the hyperbolic space to the value δ of the topology graph. In other words, the knowledge of the geometric properties of the Internet topology yields a coordinate-computation algorithm of similar computational complexity to the canonical Vivaldi algorithm with Euclidean distance. If the coordinate computation procedure could not be unified with the routing-information primitive exchange, a simple extension to the Inverse ARP protocol would enable coordinate allocation.

Compared to the map-based model proposed in “Greedy forwarding in dynamic scale-free networks embedded in hyperbolic metric spaces”[6], the proposed approach does not rely on the construction of a virtual map obtained by applying empirical rules derived from the hidden hyperbolic space underlying the observable Internet router/AS topology, instead it uses the observable topological properties. Compared to Border Gateway Protocol (BGP) with IP geolocation it relies on the exchange of content locators taken out of the IP address space (also known as a push model). However, the IP locator space has no associated distance metric preventing selective localization when the same content object is available at multiple locations.

Figure 1. Content Name Registration and Resolution

To illustrate the operation of this information routing model, assume that the requester r associated with coordinate x queries for a given content object and issues a request following the process illustrated in Figure 1. Referring to the middle of Figure 2, the requester receives locator (coordinate) y1 which is associated to the server s1. Since the path following the minimum hop-count distance dG(r,s1) (in red) does not correspond to the path with the shortest hyperbolic distance dH(r,s1) (in blue), the path-selection process depends on the distance metric. Moreover, as a copy of a given object can be available at multiple servers locations s1,s4, which is often the case in distributed information systems, the requester may receive a response including multiple locators, i.e., the coordinates set {y1,y4} (see right-hand-side of Figure 2). From this set, it can determine the hyperbolic distance dH . Hence, it can also select the `nearest’ server where a given content object is accessible by computing the minimum distance dH which is actually different from one obtained if the hop count distance dG would be used to perform that selection.  Indeed, referring to the right-hand-side of Figure 2, one can see that min{dG(r,s1),dG(r,s4)} = dG(r,s1) (in blue) whereas min{dH(r,s1),dH(r,s4)} = dH(r,s4) (in green); thus, the selection of the closest (or nearest) server differs between the metric graph XG and the topology graph V(G). The reverse operation also applies: by receiving incoming packets that include in their header the coordinate associated to the source x (i.e., the locator associated to the requesting terminal r), the destination can determine the distance d(s1,r) without involving additional resolution or translation. This technique avoids requiring discovery of a reverse forwarding path from y1 back to the requester x, as is the case when the content object name is used to forward the request towards the content host.

Figure 2. Example for Geometric Information Routing Operation

Conclusion and Next Steps

We propose an alternative routing scheme for ICN in which content names are assigned locators, and geometric routing is performed on those locators. This model—in which routing decisions are performed on content locators—avoids name-to-locator resolution by intermediate nodes, and instead considers that name resolution by end hosts provides them with the information they need to select the destination. The salient feature of this addressing and routing model comes from (1) the property of coordinate-based content locators: these coordinates can be used by the distributed routing function to perform geometric routing decisions, and (2) the fact that as both localization and routing functions are performed on content locators, the result is localization of cached content. Indeed, as there is no distinction between a server and a cache locator, if an intermediate node keeps a local copy of a content object, it can register it as any other object stored on a content server. Subsequently, a requester could receive from the name resolver a (set of) content locator(s) associated to a (set of) intermediate node(s) instead of a server.

To demonstrate the successful operation of the proposed information routing scheme—based on functionality and potential performance gains in memory, bandwidth, and sender-to-receiver delay—we developed new routing and forwarding elements to enable packet processing according to geometric routing operations. We experimentally validated this information-routing model and scheme on the iLab.t virtual wall testbed and compared it with information-centric networking based on BGP with IP geolocation.[7] Against the latter, we gain a factor n (number of nodes) in memory space required to store routing information, and a factor √n in the memory space required to store routing tables while limiting the routing path stretch increase to a small additive constant (characterizing the geometric property of the topology). The notion of distance in geometric routing and caching in intermediate nodes affects the capacity utilization and results in more-balanced traffic on the links and a significantly decreased end-to-end delay between terminals and servers. Future work will investigate the possible unification of coordinate computation procedures with routing information exchange primitives. Novel caching strategies exploiting topology properties (such as node degree and centrality) remain for further exploration. Even more importantly, investigation of information routing on content locators can also seed new research topics combining addressing and routing algorithms.

References

  1. T. Koponen, M. Chawla, B. G. Chun, A. Ermolinskiy, K. H. Kim, S. Shenker, and I. Stoica, “A data-oriented (and beyond) network architecture,” Proc. of 2007 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM SIGCOMM ’07, pp.181–192, New York, NY, USA.
  2. D. Kutscher (Ed.), “ICN research challenges,” Work in progress, February 2014.
  3. V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, and R. L. Braynard, “Networking named content,” Proc. of CoNEXT 2009, pp.1–12, Rome, Italy, December 2009.
  4. L. Zhang, A. Afanasyev, J. Burke, V. Jacobson, K. Claffy, P. Crowley, C. Papadopoulos, L. Wang, and B. Zhang, “Named Data Networking,” ACM SIGCOMM Computer Communication Review (CCR), 44(3):66–73, July 2014.
  5. D. Papadimitriou, D. Colle, P. Audenaert, and P. Demeester. “Geometric information routing,” Proc. of  IEEE International Conference on Advanced Networks and Telecommuncations Systems (ANTS), pp.1–8, Dec.2013.
  6. F. Papadopoulos, D. Krioukov, M. Bogua, and A. Vahdat, “Greedy forwarding in dynamic scale-free networks embedded in hyperbolic metric spaces,” Proc. of IEEE INFOCOM 2010, pp.1-9, 2010.
  7. S. Sahhaf, D. Papadimitriou, W. Tavernier, D. Colle, and M. Pickavet, “Experimentation of geometric information routing on content locators,” To appear in Proceedings of CNERT 2014, colocated with International Conference on Network Protocols (ICNP) 2014, October 2014.