C500k in Action at Urban Airship
Editor’s Note: This post was compiled and created by Scott Andreas, who can be reached on twitter: @cscotta.
Building AirMail Push for Android’s Infrastructure
We’ve been working to develop AirMail Push for Android, our push platform, along with a supporting server-side infrastructure that can handle millions of concurrently connected devices. Like Push Notifications on Apple’s iOS platform, AirMail for Android maintains a persistent socket connection to our cloud, which lets us push out messages to devices in real-time. This brings with it all the complexity of handling hundreds of thousands of persistent socket connections server side, managing the connection on the client side as users pass in and out of coverage or switch between networks, and detecting and closing half-open sockets as quickly as possible.
This article describes the challenges we faced building out the server-side push infrastructure, how they led us to re-design and implement it using a hybrid threaded/evented/queue-based architecture based on Java’s NIO libraries, and a retrospective on this decision now that it’s live.
Lots and Lots of Sockets
In order to push a notification to a device, you need to have a persistent socket connection open to it. Our platform needed to be capable of serving millions of simulatenously-connected devices, which means that our server’s system stack and our software need to elegantly accept, read from, and push notifications out to a very large number of open connections per node. But how many? 10k? 50k? 100k? 500k? 1MM? More importantly, how many nodes do we need to handle this? If one node can only serve 10,000 clients, the venture is cost-prohibitive from the start – just one million devices would require a fleet of 100 servers.
We began our venture into uncharted territory with an implementation in Python based on Eventlet. Eventlet’s a fantastic library that provides support for concurrent task execution through lightweight coroutines. Its socket implementation allows developers to accept and manage an arbitrary number of connections with less difficulty and expensive context switches associated with threaded implementations. With it, the first version of our edge server “Helium” was born.
Eventlet helped us build a Helium whose code was incredibly straightforward, clean, and easy to understand. We love Python and are very grateful for the community of developers who are working to build libraries like this. Our initial implementation of Helium enabled us to accept and send application-level keepalives to roughly 37,000 connected clients before hitting the 1.7GB memory limit of a base EC2 instance. While pleased with this number, the amount of instances needed to meet our capacity requirements would have been prohibitively large and expensive. While we are constantly investing in our infrastructure to build a stable, secure, performant platform, we’re better programmers than we are sysadmins and prefer to write software that scales both efficiently and cost-effectively.
Things That Look Like Sea Monsters
Thus began our search to build a prototype of a more efficient socket server. We considered a variety of options, including a C/C++-based implementation wrapping [e]poll, a threaded or evented implementation in Java or Scala, a spike using Node.js, along with a few others. As expected, the purely-thread-based Java option fell over after a few thousand connections, and the C-based implementation proved too low-level to meet our evolving project requirements.
A quick experiment using Java’s NIO library (“new/non-blocking” IO) immediately turned heads, so we spent several hours fanning out that spike to include three versions:
- A raw Java NIO implementation
- An implementation of the same, atop the Netty library.
- And one more using Netty, but written in Scala rather than Java.
Each of these three spikes vastly surpassed our previous results both in terms of connections achieved, and memory efficiency:
Work Smarter, Not Harder
Jumping from 35,000 connections per node on an EC2 Small instance to over half a million on a single EC2 Large marked a huge win in terms of efficiency, infrastructure cost, and administration overhead — the number of connections per node jumped by nearly 1500%. (As a premature epilogue, this number held true after implementing all of our application logic, keepalives, and message passing, which is almost unheard-of in an early test).
As an interesting sidenote, the failure modes we saw in both the Java + Netty and Scala + Netty were unusual; CPU usage spiked to 100% in the Java process, with nothing output to the console. It appeared that an exception was being thrown and silently caught in a tight infinite loop.
Based on these numbers, we pressed on with a Java + Pure NIO implementation. Stay tuned – we’ll have a few posts on our NIO implementation, more info on our methodology and metrics, some things we’ve learned on the way, and all sorts of sharp corners we’ve hit our heads on that you can avoid.