I am an Associate Professor in the Electrical Engineering Department and an Affiliate in the Computer Science Department at Columbia University, where I am also a Member of the Computer Engineering Program and the Sense, Collect and Move Data Center at the Data Science Institute. I was previously the Andrew and Erna Viterbi Early Career Chair in the Computer Science Department at the University of Southern California, where I ran a networking and systems research group with my colleagues Ramesh Govindan and Wyatt Lloyd. Wyatt and I continue to collaborate with students in an initiative to specialize distributed systems and networking for the large applications deployed by today's leading content providers.

My primary interests are in networks and distributed systems. My goal is to improve the reliability and performance of Internet services. To understand the problem space, I look to the needs of operators and providers, and I conduct detailed network measurements. Based on what I learn, I design deployable systems to improve the Internet and the services that run over it. I focus on the two components needed for reliably fast Internet services: (1) the Internet must provide reliable, high-performance routes for traffic, and (2) we need to architect quality services and protocols that use these routes, taking advantage of the Internet's strengths and masking its limitations. These days, I am particularly interested in the problems that affect some of the dominant players on the Internet, including cloud providers, large content providers and content delivery networks, and mobile providers. The properties of these networks allow for tailored solutions, and I aim to understand those properties and design solutions that exploit them.

Selected Research Projects

Internet Measurement: Our Internet experience depends on the performance and availability of routes that cross multiple networks, but service providers and operators have little visibility into the other networks on which they rely. We have developed a number of techniques to give unprecedented visibility into these routes. Here are some highlights:
Towards a Rigorous Methodology for Measuring Adoption of RPKI Route Validation and Filtering (SIGCOMM CCR 2018, Best of CCR Award): The Border Gateway Protocol is responsible for establishing Internet routes, yet it does not check that routes are valid, allowing networks to hijack destinations they do not control. There are several proposals to improve routing security, and one of them, the RPKI, is standardized. Recent analysis (by others) attempted to characterize RPKI adoption with uncontrolled experiments that compare how networks treat valid and invalid routes. However, our measurements suggest that some ISPs that are not observed using invalid routes in such experiments are actually selecting different routes for (non-security) traffic engineering reasons, so uncontrolled experiments can mistake this behavior for RPKI filtering. We describe a controlled, verifiable methodology that uses our PEERING testbed to measure RPKI adoption.
Sibyl: A Practical Internet Route Oracle (NSDI 2016): Existing tools support only one query: "what is the path from here (my host) to there (any destination)?" This limited interface makes it difficult to troubleshoot problems. Sibyl supports queries such as "find routes that traverse from Sprint to Level3 in NYC but do not pass through LA." However, most vantage points can only issue measurements at a slow rate, so Sibyl may never have measured a matching path or, even if it did, the path may have changed since. To allocate its constrained measurement budget wisely, Sibyl uses previous measurements and knowledge of Internet routing to reason about which unissued measurements are likely to satisfy queries.
Reverse Traceroute (NSDI 2010, Best Paper Award): Most communication on the Internet is two-way, and most paths are asymmetric, but traceroute and other existing tools only provide the path from the user to a destination, not the path back. We addressed this key limitation of traceroute by building a system to measure the path taken by an arbitrary destination to reach the user, without control of the destination.
PoiRoot (SIGCOMM 2013): It is difficult to identify the cause of an observed routing change. A change results from the complex interplay among opaque policies of multiple autonomous networks and local decisions at many routers. A decision or reconfiguration can cause rippling changes across seemingly unconnected networks. We developed PoiRoot, the first system to definitively locate the source of a route change. PoiRoot infers routing policies from observed routes, an approach we also used to measure real-world routing policies (IMC 2015) and to predict routing decisions (NSDI 2009).

Internet Routing: Despite rapid innovation in many other areas of networking, BGP, the Internet's interdomain routing protocol, has remained nearly unchanged for decades, even though it is known to contribute to a range of problems. I work to improve Internet routing, including:
Engineering Egress with Edge Fabric (SIGCOMM 2017): BGP is not aware of capacity, congestion, or performance, and so can overload routes or select ones with poor performance. We collaborated with Facebook to design and deploy Edge Fabric, a system which now controls how Facebook routes content to its two billion users. Edge Fabric augments BGP with measurement and control mechanisms to overcome BGP's lack of congestion- or performance-awareness, allowing Facebook to make efficient use of its interconnectivity without causing congestion that degrades performance.
PEERING: Researchers usually lack easy means to conduct realistic experiments, creating a barrier to impactful routing research. To remedy this problem, we administer a BGP testbed that allows researchers to connect to real ISPs around the world and conduct experiments that exchange routes and traffic on the Internet. We continue to expand the functionality of the testbed, including peering at one of the biggest Internet exchanges in the world and adding the ability to emulate the AS topology of your choice (HotNets 2014).
Are We One Hop Away from a Better Internet? (IMC 2015): The Internet remains hamstrung by known routing problems including failures, circuitous routes, congestion, and hijacks. Proposed improvements stumble on barriers to adoption. We identified a possible foothold for deployable solutions: much of our Internet activity centers on popular content and cloud providers, and they connect directly to networks hosting most end-users. These direct paths open the possibility of solutions that sidestep the headaches of Internet-wide deployment.
LIFEGUARD (SIGCOMM 2012): Internet connectivity can be disrupted despite the existence of an underlying valid path, and our measurements show that long-lasting outages contribute significantly to unavailability. We built LIFEGUARD, a system to locate persistent Internet failures, coupled with protocol-compliant BGP techniques to force other networks to reroute around the failure.

Internet Content Delivery: Increasingly, most Internet traffic comes from a small number of content providers, content delivery networks, and cloud providers. We work on a number of projects to understand and improve these services, including:
Odin: Microsoft's Scalable Fault-Tolerant CDN Measurement System (NSDI 2018): Content delivery networks (CDNs) host services at locations around the world to try to serve clients from nearby. We worked with Microsoft to design and deploy Odin, our measurement system that supports Microsoft's CDN operations. Odin has helped improve the performance of major services like Bing search and has guided capacity planning of Microsoft's CDN. Our paper presented the first detailed study of an Internet measurement platform of this scale and complexity.
Anycast Performance (IMC 2015): CDNs can use a number of mechanisms to map a client to a nearby server. One popular mechanism is anycast. We examined the performance implications of using anycast for Bing, which uses a global CDN to deliver its latency-sensitive service. We found that anycast usually performs well, but that it directs 20% of clients to suboptimal servers. We showed that the performance of these clients can be improved using a simple prediction scheme.
Mapping Google (IMC 2013): We developed techniques to locate all Google servers, as well as the mapping between servers and clients. By serendipitous timing, we started mapping daily just as Google embarked on a major change in its serving strategy, and so our ten-month measurement campaign observed a sevenfold increase in the number of Google sites.
Peering at the Internet's Frontier (PAM 2014): While the Internet provides new opportunities in developing regions, performance lags in these regions. The performance to commonly visited destinations is dominated by the network latency, which in turn depends on the connectivity from ISPs in these regions to the locations that host popular sites and content. With collaborators at various institutions, we took a first look at ISP interconnectivity between various regions in Africa and discovered many Internet paths that should remain local but instead detour through Europe.

TCP Performance: TCP is the workhorse of the Internet, delivering most services. Perhaps surprisingly, given how much study it has received, it is still possible to modify the protocol for significant gains. We use measurements to understand TCP problems in modern settings and tailor solutions to those settings. We have a number of ongoing projects in this area. Since loss degrades TCP performance, we have developed new techniques to deal with congestion and loss in different settings:
Studying Internet traffic policing: Some ISPs actively manage high-volume video traffic with techniques like policing, which enforces a flow rate by dropping excess traffic. In collaboration with Google (SIGCOMM 2016), we found that loss rates average six times higher when a connection is policed, hurting video playback quality. We showed that alternatives to policing, like pacing and shaping, can achieve traffic management goals while avoiding the deleterious effects of policing. We then analyzed data collected over a six-year period (USC tech report), finding that the use of policers in developing nations has dropped over time as Internet infrastructure became more widely deployed. Finally, we studied T-Mobile's BingeOn service for cellular users (Workshop on Internet QoE, 2016). We found that, by default, BingeOn throttled all video traffic but only charged user data plans for video from services not participating in BingeOn; that it applied no video- or screen-specific optimizations; and that this policy can have a negative impact on user quality of experience. We also found that BingeOn is easily subverted to free-ride on T-Mobile.
Gentle Aggression (SIGCOMM 2013, IETF Applied Networking Research Prize): In collaboration with Google, we designed new TCP loss recovery mechanisms tailored towards the different stages of Google's split TCP architecture, resulting in a 23% average decrease in Google client latency.
DIBS (EuroSys 2014): In collaboration with Microsoft Research, we designed a loss avoidance mechanism for data centers. Since congestion in data centers is generally transient and localized, we propose that switches randomly detour traffic that encounters a hot spot, allowing the congestion to dissipate.
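To make the difference between policing and shaping concrete, here is a minimal sketch (not code from the studies above) that models both with a generic token bucket. It assumes packets are given as (arrival time in seconds, size in bytes) pairs in arrival order, that every packet fits within the bucket's burst size, and it ignores link serialization delay; pacing is not modeled. The names TokenBucket, police, and shape are hypothetical. The policer drops any packet that arrives when too few tokens are available, while the shaper holds the packet until enough tokens accumulate.

```python
class TokenBucket:
    """Generic token bucket: tokens (bytes) accrue at `rate` per second, capped at `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # assumption: the bucket starts full
        self.last = 0.0

    def refill(self, now):
        # Add tokens for the time elapsed since the last update, up to the cap.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now


def police(packets, rate, burst):
    """Policer: return True for each forwarded packet, False for each dropped one."""
    bucket = TokenBucket(rate, burst)
    verdicts = []
    for arrival, size in packets:
        bucket.refill(arrival)
        if bucket.tokens >= size:
            bucket.tokens -= size
            verdicts.append(True)
        else:
            verdicts.append(False)  # excess traffic is dropped, not queued
    return verdicts


def shape(packets, rate, burst):
    """Shaper: return each packet's departure time; packets are delayed, never dropped."""
    bucket = TokenBucket(rate, burst)
    departures = []
    t_free = 0.0  # departure time of the previous packet (keeps FIFO order)
    for arrival, size in packets:
        if size > burst:
            raise ValueError("packet larger than burst can never be sent")
        t = max(arrival, t_free)
        bucket.refill(t)
        if bucket.tokens < size:
            # Wait just long enough for the missing tokens to accumulate.
            t += (size - bucket.tokens) / rate
            bucket.refill(t)
        bucket.tokens -= size
        departures.append(t)
        t_free = t
    return departures


if __name__ == "__main__":
    burst_of_packets = [(0.0, 1000)] * 4  # four back-to-back 1000-byte packets
    print(police(burst_of_packets, rate=1000, burst=1500))
    print(shape(burst_of_packets, rate=1000, burst=1500))
```

With a rate of 1000 bytes/s and a 1500-byte burst, the policer forwards only the first of four back-to-back 1000-byte packets and drops the other three, whereas the shaper delivers all four, departing at 0, 0.5, 1.5, and 2.5 seconds. The dropped packets are what trigger TCP loss recovery under policing; the shaper trades that loss for queuing delay.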

Mobile Web Performance: As we continue to spend more of our time accessing richer services on the Web from mobile devices, performance from these devices becomes more important and, often, fails to meet expectations.
Making the Mobile Web Fast (with Google): Before joining USC, I worked at Google on a team dedicated to making the Web fast on mobile devices. You should try out the team's data compression proxy for Chrome for Android and iOS.
Path Inflation of Mobile Traffic (PAM 2014): In collaboration with my former team at Google, my students and I classified the causes of circuitous paths between mobile clients and Web content. We now work with the MobiPerf project on ongoing related measurements.
Investigating Proxies in Cellular Networks (PAM 2015): While it is well known that cellular network operators employ middleboxes, the details of their behavior and their impact on Web performance are poorly understood. We developed a methodology to characterize the behavior of proxies deployed in the major US cellular carriers, including their (often negative) impact on performance.

Awards

Funding and Support

I am currently funded by Google Faculty Research Awards, an M-Lab Network Research Grant, Facebook, a Comcast Innovation Fund Research Grant, an NSF CAREER Award, and by the NSF. I am very grateful for their generous support. In addition, SpeedChecker, RIPE Atlas, and M-Lab kindly support our research by providing access to their platforms for our measurement studies.

Brief Biography

In 2012, I completed my Ph.D. in the Department of Computer Science at the University of Washington, advised by Tom Anderson and Arvind Krishnamurthy. For my dissertation, I built systems that can help service providers improve Internet availability and performance and that are deployable on today's Internet. After that, I worked for half a year at Google's Seattle office, as part of a great team tasked with making the mobile web fast. I greatly enjoyed the opportunity and learned a lot. I joined USC in 2012, and I was named Andrew and Erna Viterbi Early Career Chair in 2016. In 2017 I joined Columbia University.