Fixing a $13B problem: How Singular is killing app install fraud
You probably saw the news that we released last week: deterministic Android app install validation. This, along with a number of other improvements we’ve recently made, is a massive industry breakthrough that is completely game-changing for many of our clients.
Some of them are now saving massive amounts of money:
“Singular’s updated Fraud Prevention suite is the most powerful mobile app install fraud prevention I’ve seen,” says Channy Lim, Head of BI Department at Com2uS, maker of the hit mobile game Summoners War. “This will save us literally hundreds of thousands of dollars every month, and lead us to make more effective marketing decisions.”
The news is exciting, but I wanted to dive a little deeper.
I would like to share a little more detail about how app install fraud works, the problems with existing methods of finding it, and what we doing differently at Singular.
How app install fraud works
One of the ways fraudsters steal billions of advertisers’ dollars annually is app install fraud. Or, to put it another way: fake installs.
App install fraud is a collection of fraud methods that create fake mobile users and app installs. As opposed to attribution manipulation fraud, which steals credit for existing legitimate app installs, app install fraudsters take matters into their own hands and create app installs out of thin air.
There are multiple ways to perform fake installs fraud, and naturally, some are better than others.
The simplest and most low-tech way is a device farm. You get a bunch of devices, click a lot of tracking links, install a lot of apps, then open them, delete them, and reset each device’s Advertising ID (Android) or IDFA (iOS). Rinse and repeat regularly, and you’re collecting ad dollars.
But there are far more complex and advanced ways to perform fake installs that generate a lot more money far quicker.
One of the other ways fraudsters scale up their device farm operation is to use emulators and bots instead of real devices and real human beings who use the devices. This can be done in the cloud, and potentially on multiple servers in multiple locations, to try to look authentic.
One of the most notable techniques leveraged by smarter fraudsters is SDK spoofing.
Mobile marketers place software (an SDK) from a Mobile Measurement Partner (MMP) in their apps to monitor and measure the results of their marketing. In SDK spoofing, no app is ever actually installed … but an install is being reported to the MMP and potentially other analytics providers by faking the SDK’s traffic. This can be done by technically advanced fraudsters who understand how communication with the measurement service works and how to emulate that communication.
This is far more scalable than running a device farm, because once they have done the initial work, they can create a script to run on servers around the globe. That creates fake installs on fake devices. Alternatively, they can write code that can run on legitimate users’ devices anywhere, reporting installations of apps that have never been installed: fake installs on real devices.
Another example comes in the form of malware, where malicious apps install and run legitimate apps on real users’ devices. This happened for example with the Viking Horde malware. In such cases the user is real and the app is real but the install itself is fraudulent.
As fraudsters become more advanced they tap more and more into the power of the high-tech fake install techniques, and for good reasons. These attacks are highly scalable and hard to find, therefore netting the fraudsters huge amounts of money.
Detecting and preventing fake installs is hard
There are multiple ways to detect fake installs. The problem is that many are unreliable, inaccurate, and most importantly, ineffective.
SDK Message Hashing
Since SDK spoofing aims to fake an MMP’s SDK traffic, MMPs (including Singular) protect each message sent from the SDK. That’s typically done via hashing: taking the data from the message, a secret key that is different for each app, and combining them to create a blob of data that can be verified on the MMP’s backend.
The problem is that the secret is not so secret, as apps that run on users’ devices can create these hashes, so SDK fraudsters can extract the secret and algorithm from the publicly available app binary. At times they don’t even need to reverse engineer the algorithm since the SDK is open source.
Abnormal numbers of new devices
One interesting statistical technique to fight fake install fraud is to look for a high percentage of brand-new or never-before-seen devices coming from specific ad networks or publishers. When you see abnormally high ratios, it’s generally clear that something fishy is happening.
The problem however, is that fraudsters sometimes leverage existing devices or mingle their fake traffic with traffic from real devices, making it harder to spot anomalies.
Abnormal retention rate or other KPIs
Marketers can sometimes identify fraud by seeing abnormal rates of retention, in-app purchases, or other KPIs. For example, if your average retention is 15% on D14, but installs from a particular campaign, publisher, or network show a 1% retention rate, it’s clear that there’s something that deserves further investigation.
But Singular research shows that fraudsters have learned to fake retention and post install events/purchases.
For example, Singular uncovered a case of extremely sophisticated SDK spoofing campaign on iOS that fools most fraud prevention solutions in the industry. The fraudsters not only generated seemingly legitimate app installs but they also continued to send post-install events, in essence faking real users’ activity. They have even tried reporting in-app purchases, and while doing so reported revenue receipts for these fake purchases.
Sensor data and user behavioral analysis
Sensor data based solutions take post-install fake user detection one step further. These solutions try to detect abnormal devices or users by looking at non-marketing data points such as device movements (via a smartphone’s accelerometer and/or gyroscope), battery data, and user-screen interaction.
Simple: sensor data for real devices should look different than simulators that don’t move.
The challenge is that this can be faked as well as shown in the huge “We Purchase Apps” scandal revealed in October 2018. In this massive ad fraud campaign, the perpetrators bought real apps, studied the usage patterns of their real users, and then created fake users coming from those same apps.
One of the biggest targets of this campaign was none other than Google itself, the company who has probably put the most effort into profiling real user activities and protecting advertisers from fake user emulation.
And more …
There are multiple other methods, each of which has its strengths and weaknesses.
The problem with post-install fraud determination
While post-install methods do an important job of raising the bar against fraud they have some inherent caveats that stop them from being effective fraud prevention tools.
1: Statistical (in)significance
Post-install methods are statistical tools that work by looking at groups of installs and checking if one or more of these groups exhibit anomalous activities. Usually these groups would be installs coming from the same publisher. For example, when looking for new devices it’s unsurprising to see a legitimate user with a new device, as new devices are constantly being sold to consumers.
However, for a publisher driving thousands of installs, seeing 95% of those installs from new devices should be highly suspicious. Fraudsters have figured out that they can’t be so blatant, and so they take action and hide. Some drive their traffic from many different publisher IDs and even networks to keep numbers low; some mix their fraudulent installs with legitimate installs to make the anomaly less apparent.
Utilizing such techniques allows fraudsters to avoid detection by making the anomalies statistically less significant, making it a lot harder to distinguish legitimates traffic from fake traffic and so making it harder to stop the fraudulent activities without incurring high false positives.
2) Post postback friction
As the name suggests, post install methods only come into effect after an install has happened, and might be processed days or weeks after the install. That also means that they are evaluated after an install postback is sent to the media source, which means after conversion and billing notification in CPI campaigns.
The result is that the media source will charge for the now-known-to-be fraudulent conversion … unless a process of reconciliation is done. This process is often manual, messy, and a cause of great friction between ad networks and advertisers.
3) Non-optimized optimization
Ad networks often perform real-time optimizations based on initial success analytics: evidence of conversions such as app installs. Now, however, those optimizations will be skewed by fraudulent activities.
In effect, having been rewarded by fraud, they will now optimize for MORE fraud.
As an example, if publisher A drives more installs than publisher B for some advertisers, the network might prefer to prioritize publisher A over publisher B and send more ads its way. Now imagine publisher A is actually driving fake installs which are not prevented in real time (as happens in post-install detection). The network will funnel more budget to A over B.
Even if those fraudulent installs are detected post-install and reimbursed, the damage has already been done and goals will not be met because of the optimization changes and budget shift.
Singular’s solution: deterministic pre-attribution fraud decisions
Singular strives to have no false positives. We want to clearly identify fraud at a granular level. So Singular’s fraud results apply to actual individual installs, devices, and users, not blanket-level sources or publishers (although we can – and do – block those too).
We also want to find fraud as conversions or installs happen.
Anything less will suffer from the problems outlined above.
When we took time out earlier this year to consider everything, it was clear that we needed a different approach here. We needed something that would work in real time — install-time — and have an extremely low false-positive rate while still maintaining effectiveness.
To meet these requirements, we decided to disregard everything we thought we knew about ad fraud and look for something new. As we reported publicly last week, after an exhaustive search we found what we believed would be a high-quality deterministic fake install detection method that works at install time.
The new method we discovered depends on signals from the install device that allow us to verify that a user exists, they truly installed the app from the store, and they haven’t installed the app an unreasonable number of times (sorry-not-sorry, fraudsters who “install” an app on a phone hundreds or thousands of times).
Of course, once we found this method, we knew we needed to validate that it works as expected at scale, in the real world, on thousands of ad networks. To do so we tested with some of the most successful mobile publishers on the planet. And we validated our results against post-install metrics.
The actual implementation of our new fraud prevention method proved to have a tremendous effect on some of our customers, eliminating their fake install problem. (Find more about it in our report.)
In a later blog post we will share some more details about our findings, but it’s safe to say that we were blown away by the scale of the fraudulent activity we’ve found, and as more and more customers utilize the feature, the numbers are only going to grow.