Date: June 7, 2009
As engineers, we have to build reliable systems from unreliable parts. Cables get cut, and from time to time, routers, switches, and power systems fail. Network operators address these eventualities by installing redundant connections and equipment. Routing protocols are then able to find the shortest of the multiple available paths between any two points.
The dirty little secret of Internet routing is that even if a VoIP user and an FTP user were able to agree on which path properties are the most desirable, when given multiple options”š our routing protocols aren’t very good at selecting the best path. In reality, once the routing protocols have done their job, only a single path-or, possibly, a small number of equal-cost paths-is used to transmit packets, ignoring possible alternative paths.
The idea behind multipath TCP is to make use of the additional paths that are ignored by the routing system. Doing so can provide (1) more bandwidth and better resiliency for the user and (2) higher network utilization for network operators. During the IETF 75 meeting in Stockholm, there was a well-attended birds-of-a-feather (BoF) session that explored multipath TCP. The outcomes of the BoF are expected to lead to the formation of a multipath TCP working group within the IETF.
Adjusting to Path Properties through Congestion Control
Multipath TCP (MPTCP) involves modifying TCP to give it the capability to send a given packet over a given path. It then runs the regular NewReno congestion control algorithms separately for each path, so each path has its own congestion window that reflects the path’s available bandwidth. Therefore, MPTCP can send (1) many packets over paths with a large congestion window and/or a small round-trip-time (RTT) and (2) fewer packets over paths that have a small window and/or a large RTT. This way, a multipath TCP can automatically-and quickly-adjust to congestion in the network, moving traffic away from congested paths and toward uncongested paths. Routers can also distribute packets over multiple paths, but routers have no end-to-end congestion information. Additionally, routing protocols that change routing based on link utilization have never gained a foothold because they can cause dangerous oscillations as they move traffic to idle links, which can then become congested, move traffic back, and so on.
Another concern over having routers distribute packets that belong to the same flow (TCP session) over multiple paths is that different paths have different RTTs, which means that packets will arrive out of order. When this happens, the fast-retransmit algorithm that is part of the Reno set of congestion control algorithms will do what its name suggests. When packets n+1, n+2, and n+3 arrive but packet n is still missing (three duplicate acknowledgements), TCP concludes that packet n was lost, and it retransmits it without waiting for a timeout. It also assumes that sending was happening too fast, so it reduces its sending rate by halving the congestion window, which lowers the number of packets that can be in flight before the receiver acknowledges that it has received earlier packets. However, if packet n was sent over a path with a slightly larger RTT than the path used for sending packets n+1, n+2, and n+3, n probably was not lost, so there was no need to retransmit and reduce the sending rate. To prevent this issue, routers work hard to avoid sending-over multiple paths-packets belonging to a single flow (see RFC 2992). Multipath TCP can solve this issue by simply executing the fast-retransmit algorithm for each path separately.
Fairness and Resource Pooling
A more challenging issue is that of fairness. Suppose there are two paths between points A and B; let’s call them left (l) and right (r). Flow X takes the left path, flow Y takes the right path, and flow Z is a multipath TCP flow that uses both paths, with a subflow Zl over the left and a subflow Zr over the right. Assuming normal congestion control, Zl will compete on an even footing with X, such that both Zl and X get half the available bandwidth on the left path (in the absence of other traffic). The same is true for Zr and Y. This means that the total send rate of Z (Zl + Zr) is double that of X or Y. This is a little unfair, but not to an alarming degree. However, what if a multipath TCP thinks it has five paths, but in reality all of these paths share a bottleneck? In this case, the multipath flow will take five times as much bandwidth as other flows, which falls squarely outside the limits of what is acceptable. Presumably, paths toward the same destination will regularly share a common bottleneck, and there is no obvious fast and reliable way to detect this condition.
As a result, on one hand, multipath TCP requires a way to couple the congestion control state of the different subflows such that MPTCP is reasonably fair to unmodified TCP flows when multiple MPTCP subflows share a bottleneck. On the other hand, MPTCP must be sufficiently aggressive so that it is at least as fast as unmodified TCP on the fastest path. For instance, in the previous example, if the left path has a bandwidth of 100 Mbps, and the right path, 10 Mbps, the fact that MPTCP flow Z starts to lose packets at around 5 Mbps over the right path should not get in the way of subflow Zl’s reaching about 50 Mbps on the left path, or Z will be slower than X. Obviously, users will be reluctant to deploy multipath TCP if it performs worse than unmodified TCP!
A step beyond “first do no harm to my own performance”? and “be fair on shared bottlenecks”? is the notion of resource pooling. Suppose a network operator has deployed a number of parallel links, and several MPTCP flows run over two of those links each. With the appropriate coupling of the per-subflow congestion control for each multipath TCP flow, the packet loss rates that NewReno congestion control uses to adapt to the network bandwidth will equalize, and the set of links will start to behave as a single resource. Assume again the earlier example, but now with 100 Mbps on both paths. Perfect resource pooling would mean that X takes 67 Mbps on the left path; Y, 67 Mbps on the right path; and Z, 33 Mbps on each path for a total of 66 Mbps. This is the same as would happen on a single 200 Mbps path. Compare this to the situation today, where a network operator bundles two links. In the best-case scenario, two flows would each get 50 Mbps on a shared link, and the other flow would get the full 100 Mbps on the other link. But the situation where three flows share one link and run at 33 Mbps each while the second link remains idle is also relatively common. According to the literature, it takes about 1,000 sessions to get good utilization out of a set of bundled links.
In addition to potentially providing more bandwidth for the multipath user and resource pooling, as well as improved utilization for the network operator, MPTCP may also improve resilience; in other words, if one path fails, MPTCP can continue to work over an alternative path. On one hand, presumably, the routing system would detect such a path failure and repair it, but routing protocols take 30 (OSPF) to 90 (BGP) seconds to detect a failure. Multipath TCP, on the other hand, could detect a failure within a few round-trip times. This does require some smart retransmission algorithms. TCP must deliver data to the receiving application in the original order, so if packet n is lost, packets n+1, n+2, and so on must be buffered by the receiver until packet n is retransmitted successfully. Therefore, it is important to avoid retransmitting packets on a broken path. And waiting for a normal timeout on one path can slow down progress on other paths if the receive buffer fills up and the subflows using the other paths must stop sending.
There are currently two multipath TCP drafts.
On one hand, draft-ford-mptcp-multi-addressed-00: TCP Extensions for Multipath Operation with Multiple Addresses adds a mechanism to TCP to explicitly add new subflows to an established TCP session, where each subflow has its own source and destination addresses. For this to work, at least one end (preferably both ends) must have at least two addresses, and both ends must implement the multipath TCP extensions. Packets are sent down different paths by addressing them to the different destination addresses available for the remote system. The multiaddressed multipath TCP has a second sequence number space carried in TCP options, so that the regular seqnum and acknowledgement fields can remain compatible with existing middleboxes such as NATs (network address translations).
On the other hand, draft-van-beijnum-1e-mp-tcp-00: One-ended multipath TCP is one ended; in other words, it modifies only the sender. This means that like regular TCP, only a single source address and a single destination address are supported. This means there is no easy way to send packets down different paths. However, if the one-ended multipath TCP is deployed in content networks that have multiple links to the Internet by using provider independent address space, this could work. In one instance, a server could have two interfaces, where the packets transmitted over one interface are sent through one Internet connection, and the packets transmitted over another interface are sent over another Internet connection. This solution takes advantage of the fact that congestion control, including fast retransmit, is implemented in the TCP sender, so it is possible to use multiple paths without changing the receiver, as long as the receiver uses selective acknowledgements (SACK, RFC 2018). This solution assumes that middleboxes, such as firewalls and NATs, are deployed only at the receiving end, where the two paths have converged, so the middlebox sees normal TCP traffic.
An intriguing avenue for future development would be modifying routers such that when they need to distribute packets over multiple links or paths, they do so based on the path selection choices made by multipath TCP on the sending host. Doing so would make it possible to gain MPTCP benefits without the need for multiple addresses or multiple connections to the Internet. However, in order to do that, an MPTCP host would have to include information about the desired path in the IP or TCP header where routers can find it to base their path selection decisions on. Unfortunately, there aren’t any unused bits in the IPv4 or IPv6 headers, and TCP options have the disadvantage of not being in a fixed place, thereby making it harder for a router to inspect those. For this reason, and because requiring changes to both hosts and routers makes deployment a lot harder, it was decided to not work on involving routers in multipath TCP for the time being in the pre-BoF discussions.
As discussed earlier, using congestion control to adjust to the available bandwidth on each path is a powerful advantage of implementing the multipath capability in the transport layer. But why TCP, and not a new or separate protocol? For instance, the Stream Control Transmission Protocol (SCTP, RFC 4960) already supports using multiple addresses, thereby making it easier to use multiple paths at the same time. However, as currently defined, SCTP uses only one path at a time, and it switches to another path only after the current path fails. There has been a fair amount of academic work on multipath SCTP under the moniker CMT (concurrent multipath transmission). SCTP CMT shares the same basic issues with congestion control fairness and performance and retransmission policies as multipath TCP does, but the amount of protocol work required is minimal, since SCTP already supports multiaddressing. Unfortunately, moving existing applications from TCP to SCTP involves a number of challenges, such as making SCTP work through NATs, the need to modify applications, and lack of an easy way to negotiate SCTP versus TCP between a client and a server. In addition, SCTP uses a much more complex packet format than TCP does and a CRC32 checksum that is expensive to compute in software, making SCTP less appropriate for high-bandwidth applications. None of the issues is insurmountable, but together they make adoption of SCTP as a TCP alternative a challenge. As such, implementing multipath in TCP, which is used for about 85 percent of all Internet traffic, is a more attractive deployment strategy. See also Multipath TCP discussions.
- M. Mathis, J. Mahdavi, S. Floyd and A. Romanow; TCP Selective Acknowledgment Options; RFC 2018; October 1996.
- C. Hopps; Analysis of an Equal-Cost Multi-Path Algorithm; RFC 2992; November 2000.
- D. Wischik, M. Handley, M. Bagnulo; The Resource Pooling Principle; ACM SIGCOMM Computer Communication Review archive; October 2008.
- J. Iyengar, K. Shah, P. Amer, R. Stewart; Concurrent Multipath Transfer Using SCTP Multihoming; SPECTS 2004, San Jose, California; July 2004.
- A. Ford, Ed., C. Raiciu, M. Handley, S. Barre; TCP Extensions for Multipath Operation with Multiple Addresses; draft-ford-mptcp-multiaddressed-00; May 2009.
- I. van Beijnum; One-ended multipath TCP; draft-van-beijnum-1e-mp-tcp-00; May 2009.