October 18, 2018
Using attribution data to calculate mobile ads LTV

Eric Benjamin Seufert is the owner of Mobile Dev Memo, a popular mobile advertising trade blog. He also runs Platform and Publishing efforts at N3TWORK, a mobile gaming company based in San Francisco, and published Freemium Economics, a book about the freemium business model. You can follow Eric on Twitter.

Note: if you’re looking for ad monetization with perhaps less effort than Eric’s method below, talk to your Singular customer service representative (and stay tuned for additional announcements).

Various macro market forces have aligned over the past two years to create a commercial opportunity for app developers to generate significant revenue from in-app advertising. New genres like hypercasual games, as well as legacy gaming genres and non-gaming categories, have built large businesses out of serving rich media video and playable ads to their users, constructing deep, sophisticated monetization loops that enrich the user experience and produce far less usability friction than some in-app purchases.

But unfortunately, while talented, analytical product designers are able to increase ad revenues with in-game data by deconstructing player behavior and optimizing the placement of ads, user acquisition managers have far less data at their disposal when optimizing the acquisition funnel for this type of monetization. Building an acquisition pipeline around in-app ads monetization is challenging because many of the inputs needed to create an LTV model for in-app ads are unavailable or obfuscated. This is evidenced by the fact that a Google search for “mobile app LTV model” yields hundreds of results across a broad range of statistical rigor, while a search for “mobile app ads LTV model” yields almost nothing helpful.

Why is mobile ads LTV so difficult to calculate?

For one, the immediate revenue impact of an ad click within an app isn’t knowable to the developer and is largely outside of their control. Developers get eCPM data from their ad network partners on a monthly basis when they are paid, but they can’t really know what any given click is worth because of the way eCPMs are derived (ad networks usually get paid for app installs, not for impressions, so eCPM is a synthetic metric).
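To illustrate why eCPM is a synthetic, backed-into metric, here is a toy calculation; the CPI, install count, and impression count below are entirely hypothetical:

```python
# eCPM is derived from install revenue, not observed per impression.
# All numbers below are hypothetical.
installs = 200
cpi = 2.50             # the network is paid $2.50 per install it drives
impressions = 400_000  # impressions served while generating those installs

revenue = installs * cpi             # $500.00 of network revenue
ecpm = revenue / impressions * 1000  # revenue per thousand impressions
print( round( ecpm, 2 ) )            # 1.25
```

The developer sees only the derived $1.25 eCPM, not the install economics that produced it, which is why a single click’s value is unknowable.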

Secondly, app developers can’t track ad clicks within their apps, only impressions. So while a developer might understand which users see the most ads in their app and can aggregate that data into average ad views per day (potentially split by source), ad view counts alone don’t contribute to an understanding of ads LTV, since most ad revenue is driven by the installs that happen after a user clicks on an ad.

Thirdly, for most developers, to borrow conceptually from IAP monetization, there are multiple “stores” from which ad-viewing (and hopefully, clicking) users can “purchase”: each of the networks an app developer is running ads from, versus the single App Store or Google Play Store from which the developer gathers IAP information. So not only is it more onerous to consolidate revenue data for ads, it also further muddies the monetization waters: even if CPMs for various networks can be cast forward to impute revenue, there’s no certainty around what the impression makeup will look like in an app in a given country on a go-forward basis (in other words: just because Network X served 50% of my ads in the US this month, I have no idea whether it will serve 50% of my ads in the US next month).

For digging into problems that contain multiple unknown, variable inputs, I often start from the standpoint of: if I knew everything, how would I solve this? For building an ads LTV model, a very broad, conceptual calculation might look like:

Lifetime Ad Revenue (Channel A, Platform B, Geo C) = Σ over months of [ Monthly Ad Views (A, B, C) × Blended CPM (A, B, C) ÷ 1,000 ]

What this means is: for a given user who was acquired via Channel A, is using Platform B, and lives in Geography C, the lifetime ad revenue they are expected to generate is the sum of the Monthly Ad Views we estimate for users of that profile (eg. Channel A, Platform B, Geography C; call this Part A) times the monthly blended CPM of ad impressions served to users of that profile (Part B), divided by 1,000 since CPM is priced per thousand impressions.
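As a quick numeric sanity check on that calculation, here is a sketch in Python; all figures are hypothetical:

```python
# Hypothetical inputs for one user profile (Channel A, Platform B, Geo C)
monthly_ad_views = [ 300, 240, 180 ]  # expected ad views in months 1-3
blended_cpm = 5.00                    # dollars per thousand impressions

# Each month contributes (views / 1,000) * CPM; summing gives the ads LTV
ads_ltv = sum( views / 1000 * blended_cpm for views in monthly_ad_views )
print( round( ads_ltv, 2 ) )          # 3.6
```

A user expected to see 720 ads over three months at a $5.00 blended CPM is worth about $3.60 in ad revenue over that window.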

In this equation, using user attribution data of the form that Singular provides alongside internal behavioral data, we can come up with Lifetime Ad Views broken down by acquisition channel, platform, and geography pretty easily: this is more or less a simple dimensionalized cumulative ad views curve over time that’d be derived in the same way as a cumulative IAP revenue curve.
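A minimal sketch of how such a dimensionalized cumulative ad views curve might be derived; the input frame and its column names are assumptions for illustration, not Singular’s actual schema:

```python
import pandas as pd

# Hypothetical per-user daily ad view logs, already joined to attribution data
logs = pd.DataFrame( {
    'channel':  [ 'Facebook', 'Facebook', 'Unity', 'Unity', 'Facebook' ],
    'platform': [ 'iOS', 'iOS', 'iOS', 'iOS', 'iOS' ],
    'geo':      [ 'US', 'US', 'US', 'US', 'US' ],
    'day':      [ 1, 2, 1, 2, 3 ],
    'ad_views': [ 10, 7, 12, 9, 5 ],
} )

# Average ad views per day for each profile, then accumulate over days to get
# a cumulative ad views curve per (channel, platform, geo) profile
curve = ( logs.groupby( [ 'channel', 'platform', 'geo', 'day' ] )[ 'ad_views' ]
              .mean()
              .groupby( level=[ 'channel', 'platform', 'geo' ] )
              .cumsum() )
print( curve )
```

This mirrors how a cumulative IAP revenue curve would be built, just with ad views as the accumulated quantity.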

But the Blended CPM component of this equation is very messy. This is because:

  • Ad networks don’t communicate CPMs by user, only at the geo level; [Editorial note: there is some significant change happening here; we will keep you posted on new developments.]
  • Most developers run many networks in their mediation mix, and that mix changes month-over-month;
  • Impression, click, and video completion counts can be calculated at the user level via mediation services like Tapdaq and ironSource, but as of now those counts don’t come with revenue data.

Note that in the medium-term future, many of the above issues with data availability and transparency will be ameliorated by in-app header bidding (for a good read on that topic, see this article by Dom Bracher of Tapdaq). In the meantime, there are some steps we can take to back into reasonable estimates of blended CPMs for the level of granularity that our attribution data gives us and which is valuable for the purposes of user acquisition (read: provides an LTV that can be bid against on user acquisition channels).

But until that manifests, user acquisition managers are left with some gaps in the data they can use to construct ads LTV estimates. The first glaring gap is the network composition of the impression pool: assuming a diverse mediation pool, there’s no way to know which networks will be filling what percentage of overall impressions in the next month. And the second is the CPMs that will be achieved across those networks on a forward-looking basis, since that’s almost entirely dependent on whether users install apps from the ads they view.

The only way to get around these two gaps is to lean on historical data as a hint at what the future will look like (which violates a key rule of value investing but is nonetheless helpful in forming a view of what’s to come). In this case, we want to look at past CPM performance and past network impression composition for guidance on what to expect in any given future month.

Estimating mobile ads LTV in Python

To showcase how to do that, we can build a simple script in Python, starting with the generation of some random sample data. This data considers an app that is only serving ads to users from Facebook, Unity, and Applovin in the US, Canada, and UK:

import pandas as pd
import numpy as np
from itertools import product

geos = [ 'US', 'CA', 'UK' ]
platforms = [ 'iOS', 'Android' ]
networks = [ 'Facebook', 'Unity', 'Applovin' ]

def create_historical_ad_network_data( geos, platforms, networks ):
    history = pd.DataFrame( list( product( geos, platforms, networks ) ),
        columns=[ 'geo', 'platform', 'network' ] )

    # three months of historical CPMs, impression counts, and impression shares
    for i in range( 1, 4 ):
        history[ 'cpm-' + str( i ) ] = np.random.randint( 1, 10, size=len( history ) )
        history[ 'imp-' + str( i ) ] = np.random.randint( 100, 1000, size=len( history ) )
        history[ 'imp-share-' + str( i ) ] = history[ 'imp-' + str( i ) ] / history[ 'imp-' + str( i ) ].sum()

    return history

history = create_historical_ad_network_data( geos, platforms, networks )

Running this code generates a Pandas DataFrame that looks something like this (your numbers will vary as they’re randomly generated):

geo platform network cpm-1 imp-1 imp-share-1 cpm-2 imp-2 \
0 US iOS Facebook 2 729 0.070374 9 549 
1 US iOS Unity 7 914 0.088232 3 203 
2 US iOS Applovin 7 826 0.079737 4 100 
3 US Android Facebook 2 271 0.026161 2 128 
4 US Android Unity 5 121 0.011681 9 240 
5 US Android Applovin 6 922 0.089005 9 784 
6 CA iOS Facebook 2 831 0.080220 9 889 
7 CA iOS Unity 8 483 0.046626 5 876 
8 CA iOS Applovin 7 236 0.022782 9 642 
9 CA Android Facebook 8 486 0.046916 4 523 
10 CA Android Unity 1 371 0.035814 5 639 
11 CA Android Applovin 8 588 0.056762 7 339 
12 UK iOS Facebook 2 850 0.082054 8 680 
13 UK iOS Unity 7 409 0.039483 3 310 
14 UK iOS Applovin 1 291 0.028092 5 471 
15 UK Android Facebook 7 370 0.035718 6 381 
16 UK Android Unity 3 707 0.068250 6 117 
17 UK Android Applovin 3 954 0.092094 3 581

imp-share-2 cpm-3 imp-3 imp-share-3 
0 0.064955 8 980 0.104433 
1 0.024018 4 417 0.044437 
2 0.011832 3 157 0.016731 
3 0.015144 7 686 0.073103 
4 0.028396 3 550 0.058610 
5 0.092759 8 103 0.010976 
6 0.105182 1 539 0.057438 
7 0.103644 6 679 0.072357 
8 0.075958 5 883 0.094096 
9 0.061879 1 212 0.022592 
10 0.075603 8 775 0.082587 
11 0.040109 6 378 0.040281 
12 0.080454 6 622 0.066283 
13 0.036678 8 402 0.042839 
14 0.055726 7 182 0.019395 
15 0.045078 2 623 0.066390 
16 0.013843 2 842 0.089727 
17 0.068741 1 354 0.037724

One thing to consider at this point is that we have to assume, on a month-to-month basis, that any user in any given country will be exposed to the same network composition as any other user on the same platform (that is, the ratio of Applovin ads being served to users in the US on iOS is the same for all users of an app in a given month). This almost certainly isn’t strictly true, as, for any given impression, the type of device a user is on (eg. iPhone XS Max vs. iPhone 6) and other user-specific information will influence which network fills an impression. But in general, this assumption is probably safe enough to employ in the model.

Another thing to point out is that retention is captured in the Monthly Ad Views estimate that is tied to source channel. One common confusion in building an Ads LTV model is that there are ad networks involved in both sides of the funnel: the network a user is acquired from and the network a user monetizes with via ads served in the app. In the construction of our model, we capture “user quality” in the Monthly Ad Views component from Part A, which encompasses retention in the same way that a traditional IAP-based LTV curve does. So there’s no reason to include “user quality” in the Part B of the equation, since it’s already used to inform Part A.
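To illustrate how retention folds into the Monthly Ad Views component (Part A), here is a sketch with made-up retention and engagement figures:

```python
# Hypothetical day-level retention rates and ad views per active user per day.
# Both figures are invented for illustration.
retention = [ 1.00, 0.45, 0.38, 0.33, 0.30 ]  # days 0-4 of a cohort curve
ad_views_per_active_day = 8.0                 # average views by an active user

# Expected ad views per installed user over these five days: a user who has
# churned contributes zero, so retention scales the engagement directly
expected_views = sum( r * ad_views_per_active_day for r in retention )
print( round( expected_views, 2 ) )           # 19.68
```

Because churned users simply stop generating ad views, "user quality" is already baked into Part A and doesn’t need to reappear in the CPM side of the equation.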

Given this, the next step in approximating Part B is to get a historical share of each network, aggregated at the level of the Geo and Platform. Once we have this, we can generate a blended CPM value at the level of Geo and Platform to multiply against the formulation in Part A (again, since we assume all users see the same network blend of ads, we don’t have to further aggregate the network share by the user’s source network).

In the below code, the trailing three-month impressions are calculated as a share of the total at the level of Geo and Platform. Then, each network’s CPM is averaged over the trailing three months and the sumproduct is returned:

history[ 'trailing-3-month-imp' ] = history[ 'imp-1' ] + history[ 'imp-2' ] + history[ 'imp-3' ]

history[ 'trailing-3-month-imp-share' ] = history[ 'trailing-3-month-imp' ] / history.groupby( [ 'geo', 'platform' ] )[ 'trailing-3-month-imp' ].transform( 'sum' )

history[ 'trailing-3-month-cpm' ] = history[ [ 'cpm-1', 'cpm-2', 'cpm-3' ] ].mean( axis=1 )

# sumproduct of per-network impression share and trailing CPM, rolled up to geo/platform
blended_cpms = ( history[ [ 'trailing-3-month-imp-share', 'trailing-3-month-cpm' ] ].prod( axis=1 )
    .groupby( [ history[ 'geo' ], history[ 'platform' ] ] ).sum( ).reset_index( ) )

blended_cpms.rename( columns={ blended_cpms.columns[ -1 ]: 'CPM' }, inplace=True )

print( blended_cpms )

Running this snippet of code should output a DataFrame that looks something like this (again, the numbers will be different):

geo platform CPM
0 CA Android 5.406508
1 CA iOS 4.883667
2 UK Android 4.590680
3 UK iOS 5.265561
4 US Android 4.289083
5 US iOS 4.103224

So now what do we have? We have a matrix of blended CPMs broken out at the level of Geo and Platform (eg. the blended CPM that US, iOS users generate across all networks in the mediation mix) — this is Part B from the equation above. Part A from that equation — the average number of ad views in a given month that we expect from users matching various profile characteristics pertaining to their source channel, geography, and platform — would be taken from internal attribution data mixed with internal app data, but we can generate some random data to match what it might look like with this function:

def create_historical_one_month_ad_views( geos, platforms, networks ):
    ad_views = pd.DataFrame( list( product( geos, platforms, networks ) ),
        columns=[ 'geo', 'platform', 'source_channel' ] )
    ad_views[ 'ad_views' ] = np.random.randint( 50, 500, size=len( ad_views ) )
    return ad_views

month_1_ad_views = create_historical_one_month_ad_views( geos, platforms, networks )
print( month_1_ad_views )

Running the above snippet should output something like the following:

geo platform source_channel ad_views
0 US iOS Facebook 73
1 US iOS Unity 463
2 US iOS Applovin 52
3 US Android Facebook 60
4 US Android Unity 442
5 US Android Applovin 349
6 CA iOS Facebook 279
7 CA iOS Unity 478
8 CA iOS Applovin 77
9 CA Android Facebook 479
10 CA Android Unity 120
11 CA Android Applovin 417
12 UK iOS Facebook 243
13 UK iOS Unity 306
14 UK iOS Applovin 52
15 UK Android Facebook 243
16 UK Android Unity 106
17 UK Android Applovin 195

We can now match the performance data from our user base (gleaned using attribution data) with our projected CPM data to get an estimate of ad revenue for the given month with this code:

combined = pd.merge( month_1_ad_views, blended_cpms, on=[ 'geo', 'platform' ] )
combined[ 'month_1_ARPU' ] = combined[ 'CPM' ] * ( combined[ 'ad_views' ] / 1000 )

print( combined )

Running the above snippet should output something like the following:

geo platform source_channel ad_views CPM month_1_ARPU
0 US iOS Facebook 73 5.832458 0.425769
1 US iOS Unity 463 5.832458 2.700428
2 US iOS Applovin 52 5.832458 0.303288
3 US Android Facebook 60 5.327445 0.319647
4 US Android Unity 442 5.327445 2.354731
5 US Android Applovin 349 5.327445 1.859278
6 CA iOS Facebook 279 6.547197 1.826668
7 CA iOS Unity 478 6.547197 3.129560
8 CA iOS Applovin 77 6.547197 0.504134
9 CA Android Facebook 479 4.108413 1.967930
10 CA Android Unity 120 4.108413 0.493010
11 CA Android Applovin 417 4.108413 1.713208
12 UK iOS Facebook 243 4.626163 1.124158
13 UK iOS Unity 306 4.626163 1.415606
14 UK iOS Applovin 52 4.626163 0.240560
15 UK Android Facebook 243 5.584462 1.357024
16 UK Android Unity 106 5.584462 0.591953
17 UK Android Applovin 195 5.584462 1.088970

That last column — month_1_ARPU — is the amount of ad revenue you might expect from users in their first month, matched to their source channel, their geography, and their platform. In other words, it is their 30-day LTV.

Putting it all together

Hopefully this article has showcased the fact that, while it’s messy and somewhat convoluted, there does exist a reasonable approach to estimating ads LTV using attribution and ads performance data. Taking this approach further, one might string together more months of ad view performance data to extend the limit of the Ads LTV estimate (to month two, three, four, etc.) and then use historical CPM fluctuations to get a more realistic estimate of where CPMs will be on any given point in the future (for example, using a historical blended average doesn’t make sense in the run-up to Christmas, when CPMs spike).
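A sketch of what stringing together more months might look like; the per-month ad view figures and the seasonal CPM multipliers below are made up for illustration:

```python
import pandas as pd

# Hypothetical monthly ad views per profile and a blended CPM table
ad_views = pd.DataFrame( {
    'geo': [ 'US', 'US' ], 'platform': [ 'iOS', 'Android' ],
    'month_1_views': [ 300, 250 ],
    'month_2_views': [ 220, 190 ],
    'month_3_views': [ 170, 150 ],
} )
cpms = pd.DataFrame( { 'geo': [ 'US', 'US' ], 'platform': [ 'iOS', 'Android' ],
    'CPM': [ 5.0, 4.0 ] } )

# Made-up seasonal multipliers on the blended CPM (eg. a Q4 run-up)
seasonality = { 'month_1': 1.0, 'month_2': 1.1, 'month_3': 1.3 }

combined = pd.merge( ad_views, cpms, on=[ 'geo', 'platform' ] )
for month, multiplier in seasonality.items():
    combined[ month + '_ARPU' ] = combined[ month + '_views' ] / 1000 * combined[ 'CPM' ] * multiplier

combined[ '90_day_LTV' ] = combined[ [ m + '_ARPU' for m in seasonality ] ].sum( axis=1 )
print( combined[ [ 'geo', 'platform', '90_day_LTV' ] ] )
```

Summing the seasonally adjusted monthly ARPU columns extends the 30-day estimate from the article into a 90-day ads LTV per profile.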

The opportunities and possibilities for making money via rich ads at this point of the mobile cycle are exciting, but they don’t come without new challenges. In general, with the way the mobile advertising ecosystem is progressing towards algorithm-driven and programmatic campaign management, user acquisition teams need to empower themselves with analytical creativity to find novel ways to scale their apps profitably.

. . .

Next: Get the full No-BS Guide to Mobile Attribution, for free, today.