Topics API: How Privacy Sandbox on Android connects people with ads
Topics API is how Privacy Sandbox on Android (and on Chrome) will help advertisers connect with relevant audiences. It will not surprise anyone in the mobile apps industry that one of the core components of advertising technology is figuring out how to connect people and ads. Fraudsters and low-quality ad networks don’t care who they show an ad to, or even whether a real person sees it, but ad partners worthy of the name put a huge amount of effort and painstaking care into building technology for high-quality matching between advertisers and potential users or customers.
In Privacy Sandbox for Android, Google’s proposing to take on that burden with the Topics API.
This post is part of a series on Privacy Sandbox:
- SDK Runtime
- Topics (this post, on how Google sees ad targeting working)
- FLEDGE on Android (which has been renamed to Protected Audience API)
- Attribution Reporting (where we’ll do a deep dive into how Privacy Sandbox will impact measurement)
Let’s start by taking a giant step back and thinking about how to do ad targeting in general. There are multiple ways to target people with ads, of course, but the problem is that many of them have significant privacy downsides.
| Requires tracking | Generally invasive |
| Requires metadata | Can be dangerous over time |
| Requires third-party data | Requires some level of identity resolution |
| Requires knowledge of context |
| Requires knowledge of interests |
| Requires knowledge of searches |
Every targeting mechanism can be potentially privacy-safe, but each has risks and some modes have higher risks than others.
For example, Apple’s SKAdNetwork is deterministic: each install postback is, generally speaking, a 100% guarantee that someone has installed your app. Even so, because it’s not granular per user and because it obscures significant amounts of marketing measurement data via privacy thresholds in SKAN 3 or crowd anonymity in SKAN 4, it’s low-risk for privacy violations.
Digital behavior such as website visits or app usage, generally acquired via trackers like third-party cookies or advertising identifiers (IDFA, GAID), is fairly invasive. But it could theoretically be made privacy safe through additional features such as differential privacy, grouped audiences, noise addition, and so on. (The question would be: could a high-volume advertiser break the code, get enough data, demystify the data, and identify people specifically? And, do you trust whoever is creating the audiences?)
Context is generally privacy safe: an ad relates to the content on the website or in the app that surrounds the ad, not to the person.
The logical leap that advertisers take from context, of course, is that people who are viewing information about sports, for example, are interested in sports, perhaps play sports, might like beer or party food, might be interested in buying sports memorabilia, and so on. That logical leap can be a short one, an obvious one, a long one, a clever non-obvious one, or completely wrong.
Interests and Topics API
Interests, which Google is leveraging in Topics API for the Android privacy sandbox, are interesting.
Topics are more than context. Context is good, but it’s limited to a specific page or screen. Someone could have clicked the wrong link, followed clickbait, or tapped on the wrong screen in an app. Alternatively, it could be the right screen, but they’re only on it to get a needed piece of data, not because they are deeply and passionately interested in whatever that context is.
Interest, on the other hand, should be much more global and persistent over time.
Example: I like sports. That won’t change from week to week or month to month, though the exact sports I engage with are likely to evolve over longer periods of time, or change over different seasons. And while Google’s building Topics API for fairly short-term interests, it’s still longer-term than just immediate context.
Google’s privacy sandbox defines a topic as a human-readable area of interest that someone demonstrates engagement with in the recent past. In fact, likely within just the last three weeks.
Topics are based on a user’s recent app usage and app installs, but if a user deletes an app, any topics associated with that app won’t be removed from their list of topics, “in order to avoid disclosing information about the uninstallation,” Google says.
One challenge: Google’s current vision for topics is a very limited taxonomy of between a few hundred and a few thousand topics, which Google will share with the marketing community at a later date. They will be human-curated so they won’t include sensitive topics, and are specifically intended to not be very granular:
“The Topics API intends to provide callers with coarse-grained advertising topics of interest based on the user’s app usage,” Google says.
The reason they’re not very granular: early tests with much higher levels of topic granularity proved insufficiently private. In other words, if you see enough data on what I am interested in, you’re likely to be able to triangulate my identity, or at least follow my digital trail around the internet. Clearly, Google wants to avoid too-tight targeting based on topics that adtech SDKs save and remember over time, associating with a user, in order to develop a very detailed profile of a person that could be used in privacy-threatening ways.
Or to develop a person or device graph.
Interestingly, Google says that potential topics for a user are defined by a classifier model for apps.
In other words, apps feed the taxonomy for users, and the data that trains the model uses publicly available information like app name, description, and package name. Apps can map to multiple topics, or none, but there is a limit: no matter how many they have, only 3 will be added to the user’s topic history in any given week.
There is a degree of user control, however:
“The design intends to provide users with the ability to view and remove the topics that are associated with their app usage,” Google says.
There’s no mention of people adding topics manually, which could be interesting. I can see challenges with that, like people adding totally spurious data, but it could be useful nevertheless: I could define the topics that I want to see ads about, and Android’s privacy sandbox could ensure I only (or mostly) see ads about those topics.
(And perhaps never see other topics on a different defined list.)
Targeting ads with Topics API: how it works
Every week, Google computes a user’s top 5 topics, which stay on-device. Google’s technology is figuring this out based on recent apps and app usage, but the information stays on-device and Google doesn’t know it.
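To make the weekly computation concrete, here is a minimal Python sketch of how a device might rank topics for one week. It is an illustration under stated assumptions, not Google’s implementation: the app-to-topic table stands in for the classifier model, the package names are made up, and ranking topics by how often their apps were opened is only one plausible reading of “top.” The cap of 3 topics per app and the top-5 cutoff come from the description above.

```python
from collections import Counter

MAX_TOPICS_PER_APP = 3  # per-app weekly cap described in the post
TOP_N = 5               # size of the weekly top-topics list

# Hypothetical classifier output: package name -> topics, best match first.
APP_TOPICS = {
    "com.example.scores": ["Sports", "Soccer", "Basketball", "Fantasy Sports"],
    "com.example.recipes": ["Cooking", "Food"],
    "com.example.news": ["News", "Sports"],
    "com.example.flashlight": [],  # an app may map to no topic at all
}


def weekly_top_topics(app_opens, app_topics=APP_TOPICS):
    """Return up to TOP_N topics for one week.

    app_opens is a list of package names, one entry per app open, so apps
    used more often weigh more (an assumption of this sketch).
    """
    counts = Counter()
    for app in app_opens:
        # Only the first MAX_TOPICS_PER_APP topics of an app are counted.
        for topic in app_topics.get(app, [])[:MAX_TOPICS_PER_APP]:
            counts[topic] += 1
    # most_common orders by count; ties keep first-encountered order.
    return [topic for topic, _ in counts.most_common(TOP_N)]


week = [
    "com.example.scores",
    "com.example.news",
    "com.example.recipes",
    "com.example.flashlight",
]
print(weekly_top_topics(week))
```

In this toy week, “Fantasy Sports” never enters the history because it is the fourth topic of its app, and the flashlight app contributes nothing. Everything here runs locally, which mirrors the point that the list stays on-device.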
When an app that monetizes via ads wants to fill an ad slot, the adtech SDK in the app calls the Topics API and checks if there’s a topic in the user’s list that matches a topic assigned to the app. Note: Google specifically prohibits SDKs or apps from storing that data and using it over time to build up a more detailed picture of the user’s interests and topics.
- Once a week, Android calculates a user’s top 5 topics
- Calling Topics API successfully will get you a topic randomly chosen from that list, 95% of the time
- 5% of the time, you’ll get a different topic, randomly chosen from the full taxonomy
- When you successfully call Topics API, you’ll get a maximum of 3 topics, 1 for each of the previous 3 weeks
- This means Android must store at least 15 top topics, many of which are likely to be duplicates
- Apps get different topics, thanks to the random pick and noise introduced, to ensure that app A and app B, which might be owned by the same publisher, can’t triangulate on individual users
- Successfully calling the Topics API, however, requires that apps or SDKs have observed engagement with that topic within the past 3 weeks
- It’s not 100% clear, but how I read this is that your general topic app with high usage but low-value ad slots can’t call a topic like “injury lawyer” or “auto insurance price quote” to try to capture high-value ad impressions
- In addition, Google says that if the app or SDK “did not call the API in the past for that user on an app about that topic, then the topic will not be included in the list returned by the API”
- So, while it’s not entirely clear, since either the app or SDK can view that kind of engagement, there may be a scale advantage built into Topics in Privacy Sandbox for Android, simply due to the fact that larger, more-distributed SDKs will see more
- That scale advantage will be critical, too, because it’s not just lifetime visibility of topics we’re talking about: it has to be in the last three weeks
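The rules in the list above can be sketched as a small Python simulation. Treat it as a rough model under stated assumptions rather than how Android actually implements the API: the 350-entry taxonomy, the function names, and the topic names are all hypothetical, the sketch redraws a topic on every call (the real system may well fix the pick per app and week), and it applies the “must have observed” filter to noise picks too, which the description does not confirm.

```python
import random

# Hypothetical taxonomy: a few named topics plus filler, 350 entries in total.
TAXONOMY = [
    "Sports", "Cooking", "News", "Travel", "Music", "Autos", "Finance", "Fitness",
] + [f"Topic {i}" for i in range(342)]

NOISE_RATE = 0.05    # 5% of picks come from the full taxonomy
EPOCHS_RETURNED = 3  # one topic for each of the previous 3 weeks


def pick_epoch_topic(top_topics, rng, taxonomy=TAXONOMY, noise_rate=NOISE_RATE):
    """Pick one topic for one week: usually from the top list, sometimes noise."""
    if rng.random() < noise_rate:
        return rng.choice(taxonomy)
    return rng.choice(top_topics)


def get_topics(weekly_top_lists, observed_by_caller, rng):
    """Return at most one topic per recent week, limited to topics the
    calling app or SDK has already observed for this user."""
    result = []
    for top_topics in weekly_top_lists[-EPOCHS_RETURNED:]:
        if not top_topics:
            continue  # nothing computed for that week
        topic = pick_epoch_topic(top_topics, rng)
        # Assumption: the observation filter applies to every pick.
        if topic in observed_by_caller:
            result.append(topic)
    return result


rng = random.Random(42)  # seeded only so the sketch is repeatable
weeks = [
    ["Sports", "Cooking", "News", "Travel", "Music"],
    ["Sports", "News", "Autos", "Travel", "Music"],
    ["Sports", "Finance", "News", "Fitness", "Music"],
]
sdk_observed = {"Sports", "News", "Music"}
print(get_topics(weeks, sdk_observed, rng))
```

Storing three weekly lists of five topics gives the 15 stored topics mentioned above. A caller that has observed nothing gets an empty list back, which is the “no match” gap, and a caller with a larger observed set gets more hits, which is the scale advantage in miniature.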
Note that apps can completely opt out of Topics API via their manifest and XML elements. In this case, apps don’t contribute to the weekly topic computation, Google says. Some larger apps that don’t see value in Topics or don’t want to share any information about themselves with third-party ad SDKs, and some smaller apps that are extremely privacy-sensitive, might therefore opt out entirely.
Current challenges with Topics
There are a number of potential problems with topics. Note that this is early and Google is seeking feedback, so these may all be resolved over time.
- Scale: if Google can’t convince enough publishers and adtech companies to opt in, Topics API won’t have enough data to produce meaningful results. On mobile particularly, this won’t be a problem since apps and advertisers need to opt in to get marketing measurement.
- Visibility: as Google has defined Topics API, you won’t see topics returned that you haven’t observed in the past. This could seriously limit targeting.
- Granularity of topics: with only hundreds to a few thousand topics, it’s hard to get granular. Sports is a big topic, but I may like 7 or 10 different sports, and I may follow 15 or 20 teams, and like 30 or 40 players.
- Limited number of topics for apps: apps will only be able to add 3 topics to a user’s topic history in a given period of time. That’s really limiting for a big app like the NY Times, which deals with literally hundreds if not thousands of topics.
- No location info: Topics API does not currently seem to have any knowledge of, or provision for, location information, which can be a critical component for targeting and can impact topic taxonomy significantly as well.
- Gaps when there’s no match: what happens if there’s no correlation between topics assigned to an app and topics that a user has generated? What ads do people see then?
Interest, Google’s next steps, and chiming in
There are some limitations to interest as a targeting factor, of course. Just because I like sports doesn’t mean I want to buy a jersey from my favorite team. Interest is also general and long-term, versus actionable behavior driven by things like searches, which show higher intent and are likely therefore higher value.
Also, interests are hard to correlate to purchase intent. As Shamanth Rao at Rocketship HQ says:
“The 5 ‘topics’ an SDK is accessing about a user are a very small sliver of a user’s persona. If the topic assigned to me is Europe, is that because I’m an American wanting to visit the continent? Because I’m an economist doing research? Am I reading a book about Europe? It’s going to be hard to pinpoint which of those contexts applies to me.”
It’s a step. And keeping data on-device is a major privacy win.
Still, there’s much to figure out. And it’s unclear how well it will work compared to GAID (the smart money will bet on not nearly as well). In addition, how ad networks function with Topics and how mediation works with (or ignores) the technology remains to be defined. Google doesn’t necessarily have to force everyone to use its technology, but the eventual deprecation of the GAID will force the adtech ecosystem to look at this seriously as a means and methodology of connecting the right ad to the right person at the right time.
In other words, to successfully manage ad targeting.
We have two years before Google deprecates Android’s ad identifier, but if our experience on iOS means anything, those who prepare for the change before it happens will fare best.
One bit of good news for advertisers: the Google referrer will still be active. So, just like on the web, you’ll be able to get last-click identification of the source of a click, which is helpful data to add to your overall basket of marketing measurement information.