Scaling Urban Airship’s Messaging Infrastructure to Light Up a Stadium in One Second

More than two years ago we embarked on a journey to bring our push infrastructure to Android and tackle the world of high scalability for mobile apps. We’ve detailed some of those achievements, such as C500k, in earlier blog entries. Since then we’ve more than quadrupled the performance reported in that post, and have moved on to address the rest of our infrastructure as we march towards supporting billions of connected devices.

We’re now focused on finishing a massive rework of our messaging infrastructure to support our exponential growth in push notification volume. The result will be released with Segments.

We deal with lots of big numbers. Big numbers are often difficult to reason about. As a metaphor, we came up with “light up a stadium in a second” as our throughput goal. Specifically, we now have the capability to send a message in one second to every fan seated in the biggest stadium in college football, Michigan Stadium.

Here’s a high-level look at what we’ll be rolling out:

  • Push Tag & Broadcast Service (codenamed Metalstorm) — Manages associations between applications, devices and tags. This new service supports extremely high push throughput and gives us the ability to perform complex tag queries for apps with hundreds of millions of users across a horizontally scalable architecture.
  • Segments Data Storage (codenamed Penelope) — A customized, distributed database optimized for querying spatial information including custom location data.
  • Message Routing Service (codenamed Gooey ButterCake) — Routing tier that uses a Sort-Merge-Join algorithm to assemble results from queries across multiple heterogeneous systems, for example application tags joined with device location information.
  • Edge Message Delivery Service (codenamed Yaw) — Handles last-mile delivery to third-party platform push providers such as the Apple Push Notification service (APNs) with high throughput and low latency. Yaw manages TLS negotiation, message TTL (time to live) and protocol compliance across hundreds of thousands of connections, all performing message delivery concurrently.
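To give a feel for the sort-merge join mentioned in the routing tier above, here is a minimal Python sketch. It is not our production code: the function name, the device IDs and the tag and location values are all hypothetical, and it assumes each input is already sorted by device ID with no duplicate IDs. The point is only to show why the technique suits results arriving from separate systems: each sorted stream is read once, in order, and nothing has to be held in memory beyond the current pair of rows.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str]  # (device_id, value); hypothetical row shape


def sort_merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Tuple[str, str, str]]:
    """Yield (device_id, left_value, right_value) for IDs present in both inputs.

    Assumes both inputs are sorted by device_id and contain no duplicate IDs.
    """
    left_iter, right_iter = iter(left), iter(right)
    l = next(left_iter, None)
    r = next(right_iter, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left_iter, None)   # left is behind, advance it
        elif l[0] > r[0]:
            r = next(right_iter, None)  # right is behind, advance it
        else:
            yield (l[0], l[1], r[1])    # IDs match, emit the joined row
            l = next(left_iter, None)
            r = next(right_iter, None)


# Hypothetical results from two separate systems, each sorted by device ID.
tagged = [("dev-001", "sports"), ("dev-003", "sports"), ("dev-007", "sports")]
located = [("dev-002", "Portland"), ("dev-003", "Portland"), ("dev-007", "Seattle")]

joined = list(sort_merge_join(tagged, located))
print(joined)
# [('dev-003', 'sports', 'Portland'), ('dev-007', 'sports', 'Seattle')]
```

Only the devices that appear in both result sets survive the join, which is the behavior you want for a query like "devices with this tag that are also in this location".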

Helium, our end-to-end message delivery platform for Android devices, is optimized to work with this improved messaging infrastructure and works on an expanding universe of devices including Amazon’s Kindle Fire, Nook, and soon Tizen.

We’re also going to leverage Helium for companies moving away from native apps to HTML5. We built a C++ Helium client library for Linux and are working on Tizen integration (more detail on this effort is here). The new Linux Helium client architecture includes a web runtime plugin that provides JavaScript bindings for easy development of HTML5 browser extensions for push notifications.

How much faster is the new infrastructure? In our initial dark launch, the new system delivered broadcast pushes at over 100,000 messages per second, with a 90th-percentile latency of two seconds to first message delivery. Tag pushes (one API call with arbitrary tags, pushing to one or more devices) reached the same levels: over 100,000 messages per second, with a 90th-percentile latency of two seconds before first push delivery.

What does this mean for apps? If your app has 100,000 users, and the app triggers a push notification, your users will receive that message in less time than it takes you to complete a yawn. If your app has 500,000 users, and the app triggers a push notification, your users will receive that message in less time than it takes you to pour a cup of coffee. More than a million users? Pour that coffee, add sugar and stir.
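If you want to plug in your own audience size, the back-of-the-envelope arithmetic looks roughly like this. It is a simplified model, not a guarantee: it assumes throughput holds steady at the 100,000 messages per second from our dark launch, treats the two-second 90th-percentile latency as a fixed delay before the first message, and ignores whatever time the platform push providers and the devices themselves add. The function and parameter names are made up for illustration.

```python
def seconds_to_deliver(audience_size: int,
                       throughput_per_second: float = 100_000,
                       first_delivery_latency: float = 2.0) -> float:
    """Rough time to push to a whole audience under a constant-throughput model.

    Defaults are the dark-launch figures quoted above; real results will vary.
    """
    return first_delivery_latency + audience_size / throughput_per_second


for users in (100_000, 500_000, 1_000_000):
    print(f"{users:>9,} users: about {seconds_to_deliver(users):.0f} seconds")
#   100,000 users: about 3 seconds
#   500,000 users: about 7 seconds
# 1,000,000 users: about 12 seconds
```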

We’re just getting started. This is the low end of what we can achieve with this architecture, and we will continue to invest in throughput and latency improvements. Look for an upcoming blog post on the implementation details or, better yet, see us talk about it live at upcoming events such as GLUE Conference and HBaseCon.

We’re still working on rollout plans, but this massive scalability for real-time push will soon be available to everyone who needs it. We’ve already moved many of our largest customers to the new architecture. If you have a need for speed right now, and think your system can handle these performance levels before we add configurable speed throttling, let us know.

We’re also interested in understanding your expectations for push notification delivery. Complete this brief survey to get an Urban Airship t-shirt.