How We Built Urban Airship Connect
Published on 6 Oct 2015
By Cory Kolbeck
Previously, Michael Richardson talked about Why We Built Urban Airship Connect. This post describes the how.
At Urban Airship, we’ve built a very powerful infrastructure to help our customers communicate with their users on smartphones. Communication is a two-way street, though. Mobile users have a lot to say. Things like: “I opened your app!” “I’m in San Diego!” “I like your product!” The problem is that mobile users’ phones can be unreliable narrators. At times, smartphones seem to speak different languages and dialects. Other times, they appear to be obfuscating information at best, lying at worst. It takes time and patience to sift through their chatter and discern their intentions, but we’ve been handling updates from billions of apps for years. We’ve seen pretty much all of their tricks before, and we understand the complex world that they inhabit. A world of OS upgrades, spotty carrier coverage, outdated apps and Daylight Saving Time. Making sense of what’s coming in seems daunting, but that’s where Urban Airship Connect comes in, translating the noise into a simple data stream, ready for integration and development.
It began at Hack Week
One of the great things about working at Urban Airship is that from time to time, engineers take a week to work on a project of their choosing to help drive innovation, efficiency, and quality. We set a theme for Hack Week which many of the projects focus on, but teams are encouraged to work on whatever will help make Urban Airship better. Not every Hack Week project ends up turning into an Urban Airship product or service, but when a project has legs, the company invests in it. Several of the major strides we’ve made as a company can be traced back to Hack Week or its predecessor, Free Friday.
Urban Airship Connect evolved out of a Hack Week project called, simply, Mobile Event Stream. The goal was straightforward: Could we create a way to get access to all of the events occurring in an app, as they were happening, and at a per-user level? Thanks to previous architecture investments and choices we had made, the answer was yes.
Designing a system
In order to provide the level of performance that our industry demands, we recognized that a streaming data service would need to be flexible, scalable and fault tolerant. We settled on a microservice architecture, which runs many services, each responsible for a small task. These services use Apache Kafka to pass messages to each other asynchronously. The belief that the messages we pass to ourselves are valuable to customers is a guiding principle of Urban Airship Connect.
At its core, Urban Airship Connect transforms and enriches these internal events and makes them available for consumption by customers. To accomplish these tasks, we wrote two services: Egon and Firehouse (we’re big Ghostbusters fans).
Egon sits between our internal streams and the streams available for direct consumption by our customers. Because each event in our internal streams carries only as much information as needed for its primary task, additional context is needed to make it useful for a customer. Egon collects this context from sources across our stack and decorates the event with it before sending it into the appropriate application stream on a dedicated Kafka cluster.
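To make the decoration step concrete, here is a minimal sketch in Python. It is not Egon's actual code: the field names, the two lookup tables and the function names are all hypothetical stand-ins for whatever internal sources supply the context.

```python
from typing import Callable, Dict, Iterable

# Hypothetical shape: a context source takes an event and returns extra fields.
Event = Dict[str, object]
ContextSource = Callable[[Event], Event]


def enrich(event: Event, sources: Iterable[ContextSource]) -> Event:
    """Return a copy of the event decorated with context from every source.

    Fields already present on the event are never overwritten.
    """
    enriched = dict(event)
    for source in sources:
        for key, value in source(event).items():
            enriched.setdefault(key, value)
    return enriched


# In-memory stand-ins for lookups against other services (assumed data).
DEVICE_PLATFORMS = {"device-1": "ios"}
APP_NAMES = {"app-9": "Example App"}


def device_context(event: Event) -> Event:
    platform = DEVICE_PLATFORMS.get(event.get("device_id"))
    return {"platform": platform} if platform else {}


def app_context(event: Event) -> Event:
    name = APP_NAMES.get(event.get("app_id"))
    return {"app_name": name} if name else {}


if __name__ == "__main__":
    raw = {"type": "OPEN", "device_id": "device-1", "app_id": "app-9"}
    print(enrich(raw, [device_context, app_context]))
```

Keeping each source as a small function means a new kind of context can be added without touching the events that do not need it.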
Firehouse provides a secure HTTPS frontend for the streams Egon produces, transforming our internal message structure into whichever format is easiest for customers to consume. Firehouse also provides powerful operations on the stream, allowing users to randomly sample from the stream, to limit their stream to certain types of events, to split their stream deterministically for consumption in parallel, or to accomplish sensible combinations of the above. Beyond exposing this streaming HTTP API directly to customers, we have partnered with a number of companies to provide simple, easy-to-use integrations. Partners consume from the API as a customer would, and implement a specialized solution that a customer can just turn on. We think that giving both our customers and our partners direct access to the same API is a powerful statement to the market, and something other companies should adopt.
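The three stream operations compose naturally. The sketch below is an illustration in Python, not Firehouse's implementation or its request format: the function names, the "type" and "device_id" fields and the event type names are assumptions. The deterministic split uses a stable hash of a key, so the same device always lands in the same shard no matter which consumer asks.

```python
import hashlib
import random
from typing import Dict, Iterable, Iterator, Optional, Set

Event = Dict[str, str]


def shard_of(key: str, shard_count: int) -> int:
    """Map a key to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def select(
    events: Iterable[Event],
    types: Optional[Set[str]] = None,
    sample_rate: float = 1.0,
    shard: Optional[int] = None,
    shard_count: int = 1,
    rng: Optional[random.Random] = None,
) -> Iterator[Event]:
    """Yield events that pass the type filter, the shard split and sampling."""
    rng = rng or random.Random()
    for event in events:
        if types is not None and event["type"] not in types:
            continue
        if shard is not None and shard_of(event["device_id"], shard_count) != shard:
            continue
        # random() is in [0, 1), so a rate of 1.0 keeps everything.
        if rng.random() >= sample_rate:
            continue
        yield event


if __name__ == "__main__":
    stream = [{"type": t, "device_id": f"d{i}"}
              for i, t in enumerate(["OPEN", "CLOSE", "SEND"] * 4)]
    # Two parallel consumers, each reading only OPEN events from its own half.
    for worker in range(2):
        mine = list(select(stream, types={"OPEN"}, shard=worker, shard_count=2))
        print(worker, [e["device_id"] for e in mine])
```

Python's built-in hash() is deliberately avoided here because it is randomized per process for strings, which would break the guarantee that parallel consumers see disjoint, complete slices.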
On the inscrutability of mobile devices
Though Urban Airship Connect is simple, the system it sits on is complex. The sophistication of our offering, the scale at which we operate, and the more than two billion app installations sending data into our system mean we encounter a variety of one-in-a-million issues thousands of times a day. We use a suite of techniques built up over the years to either correct for these issues or weed out data too suspect to be useful.
Mobile device clocks are unreliable and can be off by hours or even years. Device clocks can skip backwards and forwards as they receive signals from different towers. At least clock problems are rational, in the literal sense. End users are anything but rational when it comes to the ways they use their devices and your app. Because end users can’t be forced to upgrade, old versions of apps (and old versions of our SDK) are still out there in large enough numbers that we must accommodate them.
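One common way to cope with a skewed device clock, shown here as a sketch rather than a description of what our systems actually do, is to have the device report its own clock reading when it uploads a batch. The difference between that reading and the server's receive time gives an offset to apply to every event in the batch. The function name, parameters and the 30-day cutoff below are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed cutoff: events older than this after correction are treated as suspect.
MAX_EVENT_AGE = timedelta(days=30)


def correct_timestamp(
    event_device_time: datetime,
    batch_device_time: datetime,
    server_receive_time: datetime,
    max_age: timedelta = MAX_EVENT_AGE,
) -> Optional[datetime]:
    """Translate a device timestamp onto the server's clock.

    Returns None when the corrected time is implausible, for example when
    the device clock jumped between recording the event and sending it.
    """
    offset = server_receive_time - batch_device_time
    corrected = event_device_time + offset
    if corrected > server_receive_time:
        return None  # event claims to happen after it was uploaded
    if server_receive_time - corrected > max_age:
        return None  # too old to trust
    return corrected


if __name__ == "__main__":
    server_now = datetime(2015, 10, 6, 12, 0, tzinfo=timezone.utc)
    # A device whose clock runs a year ahead, sending an event from 5 minutes ago.
    device_now = server_now + timedelta(days=365)
    event_time = device_now - timedelta(minutes=5)
    print(correct_timestamp(event_time, device_now, server_now))
```

This only fixes a constant offset. A clock that skips between recording and sending produces a corrected time that fails one of the two plausibility checks, which is the "weed out data too suspect to be useful" case.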
And if clocks and people weren’t enough to worry about, networks can introduce a whole new layer of confusion. Devices with an unreliable network connection may think that they’ve failed to send events into our systems when they’ve actually succeeded. Machines and services can fail or lose communication with each other.
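When a device retries an upload that had in fact succeeded, the same event arrives twice. A usual defense, sketched below under the assumption that every event carries a unique ID assigned on the device, is to remember recently seen IDs and drop repeats. The class name and capacity are hypothetical, and a production system would need shared, durable storage rather than one process's memory.

```python
from collections import OrderedDict


class Deduplicator:
    """Remember the most recent event IDs and flag repeats.

    Memory is bounded: once capacity is reached, the least recently seen
    ID is forgotten, so a very late retry could slip through.
    """

    def __init__(self, capacity: int = 100_000) -> None:
        self._seen: "OrderedDict[str, None]" = OrderedDict()
        self._capacity = capacity

    def is_new(self, event_id: str) -> bool:
        if event_id in self._seen:
            self._seen.move_to_end(event_id)  # refresh its recency
            return False
        self._seen[event_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True


if __name__ == "__main__":
    dedup = Deduplicator(capacity=3)
    incoming = ["a", "b", "a", "c"]  # "a" was retried by the device
    print([event_id for event_id in incoming if dedup.is_new(event_id)])
```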
It wasn’t all invented here
These problems are difficult, but we set ourselves up for success: Every piece of the Connect system has been built with industry-standard technology and the distributed systems best practices we’ve honed over more than six years of operating at mobile scale. We’ve built our business on a deep understanding of how to have conversations with phones, and Connect allows our customers to take advantage of that understanding like never before.
We’re excited about this release for a number of reasons. Excited to see what problems it will help you solve, excited to hear from you about it and excited to see what happens when you cross the streams.
Early access documentation for the core API can be found at http://docs.urbanairship.com/topic-guides/connect-api.html.
If you’re currently an Urban Airship customer, please contact your account representative to find out more about Connect.
Not a customer yet, but interested in learning more? Contact our team.