Topics API: How Privacy Sandbox on Android connects people with ads
It will not surprise anyone in the mobile apps industry that one of the core components of advertising technology is figuring out how to connect people and ads. Fraudsters and low-quality ad networks don’t care who they show an ad to — or even that a real person sees an ad — but ad partners worthy of the name expend a huge amount of effort and painstaking care in building technology for high quality matching between advertisers and potential users or customers.
In Privacy Sandbox for Android, Google’s proposing to take on that burden with the Topics API.
This post is part of a series on Privacy Sandbox:
- SDK Runtime
- Topics (this post, on how Google sees ad targeting working)
- FLEDGE on Android (coming soon: how Privacy Sandbox will do ad retargeting)
- Attribution Reporting (coming soon: how Google is proposing ad measurement will work)
There are multiple ways to target people with ads, but many of them have privacy downsides.
|Targeting methodology||Effectiveness?||Technology requirements||Privacy safe?|
Can be dangerous over time
|Demographics||Medium||Requires third-party data; requires identity resolution|
|Context||Low||Requires knowledge of context||Yes|
|Interests||Low||Requires knowledge of interests||Potentially|
|Intent||High||Requires knowledge of searches||Potentially|
Every targeting mechanism is potentially privacy-safe, but each has risks, and some modes have higher risks.
For example, Apple’s SKAdNetwork is deterministic: each install postback is a 100% guarantee that someone has installed your app. Even so, it’s low-risk for privacy violations, because it’s not granular per user and because it obscures some of your data via privacy thresholds.
Digital behavior like website visits or app usage, generally determined via trackers like third-party cookies or advertising identifiers (IDFA, GAID), is fairly invasive. But it could theoretically be made privacy safe through additional features like differential privacy, audiences, noise addition, and so on. (The question would be: could a high-volume advertiser break the code, get enough data, demystify the data, and identify people specifically? Or, do you trust whoever is creating the audiences?)
Context is generally privacy safe: an ad relates to the content on the website or in the app that surrounds the ad, not to the person.
Interests and Topics API
Interests, which Google is leveraging in Topics API for the Android privacy sandbox, are interesting.
(I know that sounds stupid. But the fact remains: they are.)
Topics are more than context. Context is good, but it’s limited to a specific page or screen. Someone could have clicked the wrong link, followed clickbait, or tapped on the wrong screen in an app. Alternatively, it could be the right screen, but they’re only on it to get a needed piece of data, not because they are deeply and passionately interested in whatever that context is.
Interest, on the other hand, is fairly global and persistent over time.
Example: I like sports. That won’t change from week to week or month to month, though the exact sports I like are likely to evolve over longer periods of time. While Google’s building Topics API for fairly short-term interests, it’s still longer-term than immediate context.
Google’s privacy sandbox defines a topic as a human-readable area of interest that someone demonstrates engagement with in the recent past. In fact, likely within just the last three weeks.
One challenge: Google’s current vision for topics is a very limited taxonomy of between a few hundred and a few thousand topics, which Google will share with the marketing community at a later date. They will be human-curated so they won’t include sensitive topics, and are specifically intended to not be very granular:
“The Topics API intends to provide callers with coarse-grained advertising topics of interest based on the user’s app usage,” Google says.
Topics are based on a user’s recent app usage and app installs, but if a user deletes an app, any topics associated with that app won’t be removed from their list of topics, “in order to avoid disclosing information about the uninstallation,” Google says.
While the list of topics right now is fairly limited, this could be expanded over time, Google says. Clearly, Google wants to avoid too-tight targeting based on topics that adtech SDKs save and remember over time, associating with a user, in order to develop a very detailed profile of a person that could be used in privacy-threatening ways.
Interestingly, Google says that potential topics for a user are defined by a classifier model for apps.
In other words, apps feed the taxonomy for users, and the data that trains the model uses publicly available information like app name, description, and package name. Apps can map to multiple topics, or none, but there is a limit: no matter how many they have, only 3 will be added to the user’s topic history in any given week.
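To make that concrete, here is a toy sketch of how apps might feed a weekly topic list. Everything in it is an assumption for illustration: the lookup table standing in for Google’s classifier, the package and topic names, the reading of the limit of 3 as a per-app cap, and the idea of ranking topics by session count. Google has not published the actual ranking logic in anything quoted here.

```python
from collections import Counter

# Hypothetical stand-in for the on-device classifier: package name -> topics.
# These packages and topic names are made up for illustration.
TOY_CLASSIFIER = {
    "com.example.scores": ["Sports", "Soccer", "Basketball", "Fantasy Sports"],
    "com.example.news": ["News", "Sports"],
    "com.example.flashlight": [],  # an app can map to no topic at all
}

MAX_TOPICS_PER_APP = 3  # assumed reading of the weekly limit described above
TOP_N = 5               # size of the weekly top-topics list


def topics_for_app(package):
    """Return at most MAX_TOPICS_PER_APP topics for one app."""
    return TOY_CLASSIFIER.get(package, [])[:MAX_TOPICS_PER_APP]


def weekly_top_topics(usage_counts):
    """Rank topics for one week.

    usage_counts maps package name -> sessions this week. Weighting by
    sessions is an assumption, not something Google has specified.
    """
    scores = Counter()
    for package, sessions in usage_counts.items():
        for topic in topics_for_app(package):
            scores[topic] += sessions
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, _ in ranked[:TOP_N]]
```

Note how the fourth topic of the sports app never reaches the user’s history, and how the heavily used flashlight app contributes nothing because it maps to no topic.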
There is a degree of user control, however: “The design intends to provide users with the ability to view and remove the topics that are associated with their app usage,” Google says. There’s no mention of people adding topics manually, which could be interesting. I can see challenges with that, like people adding totally spurious data, but it could be useful nevertheless: I could define the topics that I want to see ads about, and Android’s privacy sandbox could ensure I only (or mostly) see ads about those topics.
(And perhaps never, for another defined list.)
Targeting ads with Topics: how it works
Every week, Android computes a user’s top 5 topics based on recent apps and app usage. The computation happens on-device and the results stay there, so Google never sees them.
When an app that monetizes via ads wants to fill an ad slot, the adtech SDK in the app calls the Topics API and checks if there’s a topic in the user’s list that matches a topic assigned to the app. Note: Google specifically prohibits SDKs or apps from storing that data and using it over time to build up a more detailed picture of the user’s interests and topics.
- Once a week, Android calculates a user’s top 5 topics
- Calling Topics API successfully will get you a topic randomly chosen from that list, 95% of the time
- 5% of the time, you’ll get a different topic, randomly chosen from the full taxonomy
- When you successfully call Topics API, you’ll get a maximum of 3 topics, 1 for each of the previous 3 weeks
- This means Android must store at least 15 top topics, many of which are likely to be duplicates
- Apps get different topics, thanks to the random pick and noise introduced, to ensure that app A and app B, which might be owned by the same publisher, can’t triangulate on individual users
- Successfully calling the Topics API, however, requires that apps or SDKs must have observed engagement with that topic within the past 3 weeks.
- It’s not 100% clear, but how I read this is that your general topic app with high usage but low-value ad slots can’t call a topic like “injury lawyer” or “auto insurance price quote” to try to capture high-value ad impressions
- In addition, Google says that if the app or SDK “did not call the API in the past for that user on an app about that topic, then the topic will not be included in the list returned by the API”
- So, while it’s not entirely clear, since either the app or SDK can view that kind of engagement, there may be a scale advantage built into Topics in Privacy Sandbox for Android, simply due to the fact that larger, more-distributed SDKs will see more
- That scale advantage will be critical, too, because it’s not just lifetime visibility of topics we’re talking about: it has to be in the last three weeks
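The steps above can be sketched as a small simulation. This is a toy model in Python, not the Android API: the function names, the placeholder taxonomy of 300 topics, and the data shapes are all hypothetical. Whether the random 5% topic is also subject to the "must have observed it" filter is not stated above; this sketch filters it too, which is a simplifying assumption.

```python
import random

# Placeholder taxonomy; the real list has not been published in this post.
TAXONOMY = [f"topic_{i}" for i in range(300)]
NOISE_RATE = 0.05  # 5% of the time, return a random topic from the taxonomy
EPOCHS = 3         # one topic for each of the previous 3 weeks


def pick_topic_for_epoch(top_five, rng):
    """Pick one topic for a week: usually from the top 5, sometimes noise."""
    if rng.random() < NOISE_RATE:
        return rng.choice(TAXONOMY)
    return rng.choice(top_five)


def get_topics(weekly_top_fives, observed_by_caller, rng):
    """Return up to 3 topics, one per week, filtered by what the caller saw.

    weekly_top_fives: one top-5 list per week, most recent first.
    observed_by_caller: topics this app or SDK has observed in the
    last 3 weeks (assumed to be tracked elsewhere).
    """
    result = []
    for top_five in weekly_top_fives[:EPOCHS]:
        topic = pick_topic_for_epoch(top_five, rng)
        # Drop topics the caller has not observed, and skip duplicates.
        if topic in observed_by_caller and topic not in result:
            result.append(topic)
    return result
```

A caller that has observed nothing always gets an empty list back, which is the visibility limit in action. A caller that has observed everything gets between 1 and 3 topics, depending on how many of the weekly picks are duplicates.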
Note that apps can completely opt out of Topics API via their manifest and XML elements. In this case, apps don’t contribute to the weekly topic computation, Google says. Some larger apps that don’t see value in Topics or don’t want to share any information about themselves with third-party ad SDKs, and some smaller apps that are extremely privacy-sensitive, might therefore opt out entirely.
Current challenges with Topics
There are a number of potential problems with topics. Note that this is early and Google is seeking feedback, so these may all be resolved over time.
- Scale: if Google can’t convince enough publishers and adtech companies to opt in, Topics API won’t have enough data to produce meaningful results.
- Visibility: as Google has defined Topics API, you won’t see topics returned that you haven’t observed in the past. This seriously limits targeting, seemingly.
- Call me twice to get me once: as an advertising app or SDK, you can only see what you have previously called: Google says that a topic is visible to an app or ad SDK in an app “if the caller made a Topics API request from an app associated with this topic.”
- Granularity of topics: with only hundreds to a few thousand topics, it’s hard to get granular. Sports is a big topic, but I may like 7 or 10 different sports, and I may follow 15 or 20 teams, and like 30 or 40 players.
- Limited number of topics for apps: apps will only be able to add 3 topics to a user’s topic history in a given period of time. That’s really limiting for a big app like the NY Times, which deals with literally hundreds if not thousands of topics.
- No location info: Topics API does not currently seem to have any knowledge or provision for location information, which can be a critical component for targeting and can impact topic taxonomy significantly as well.
- Gaps when there’s no match: what happens if there’s no correlation between topics assigned to an app and topics that a user has generated? What ads do people see then?
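The scale and visibility challenges compound each other, and a rough sketch shows why. The model below is my own simplification, not Google’s logic: it assumes an SDK "observes" every topic of every app it is embedded in (that is, it called the API in each of them within the window), and all app and topic names are hypothetical.

```python
def observed_topics(sdk_apps, app_topics):
    """Topics an SDK could have observed: the union across its host apps."""
    seen = set()
    for app in sdk_apps:
        seen.update(app_topics.get(app, []))
    return seen


def visible_share(top_topics, sdk_apps, app_topics):
    """Fraction of a user's top topics that this SDK is eligible to receive."""
    if not top_topics:
        return 0.0
    seen = observed_topics(sdk_apps, app_topics)
    return sum(1 for topic in top_topics if topic in seen) / len(top_topics)


# Hypothetical user: five apps, each mapped to one topic.
APP_TOPICS = {
    "scores": ["Sports"],
    "recipes": ["Cooking"],
    "maps": ["Travel"],
    "bank": ["Finance"],
    "radio": ["Music"],
}
USER_TOP_TOPICS = ["Sports", "Cooking", "Travel", "Finance", "Music"]

small_sdk = {"scores"}                            # embedded in one app
large_sdk = {"scores", "recipes", "maps", "bank"}  # embedded in four apps
```

In this toy setup, `visible_share(USER_TOP_TOPICS, small_sdk, APP_TOPICS)` comes out at 0.2 while the same call for `large_sdk` comes out at 0.8: the SDK with the wider footprint is eligible for four times as many of the user’s topics.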
Interest, Google’s next steps, and chiming in
There are some limitations to interest as a targeting factor, of course. Just because I like sports doesn’t mean I want to buy a jersey from my favorite team. Interest is also general and long-term, versus actionable behavior driven by things like searches, which show higher intent and are likely therefore higher value.
Also, interests are hard to correlate to purchase intent. As Shamanth Rao at Rocketship HQ says:
“The 5 ‘topics’ an SDK is accessing about a user are a very small sliver of a user’s persona. If the topic assigned to me is Europe, is that because I’m an American wanting to visit the continent? Because I’m an economist doing research? Am I reading a book about Europe? It’s going to be hard to pinpoint which of those contexts applies to me.”
It’s a step. And keeping data on-device is a major privacy win.
Still, there’s much to figure out. It’s unclear how well Topics will work compared to GAID (the smart money will bet on not as well). In addition, how ad networks function with Topics, and how mediation works with the technology or ignores it, remains to be defined. Google doesn’t necessarily have to force everyone to use its technology, but the eventual deprecation of the GAID will force the adtech ecosystem to look at this seriously as a means and methodology of connecting the right ad to the right person at the right time.
In other words, to successfully manage ad targeting.
We have two years before Google deprecates Android’s ad identifier, but if our experience on iOS means anything, those who prepare for the change before it happens will fare best.
For more on Privacy Sandbox for Android, see our livestream about the introduction of the new technology, including a deep dive on attribution.