Marketing ETL: why marketers need to give a damn about authentic ETL

By John Koetsier February 17, 2023

I get it. I really, really get it. You want insights, you want actionable data, you want next best action, right now. You don’t really care about marketing ETL, or frankly, about any particular how. You know the why (growth) and you need the what (increase this, decrease that, change something else), but the mechanics of how are sort of someone else’s problem.

Engineers.

Analysts.

Data scientists (who sometimes feel like data janitors), and all the other truly technical people on whom we rely. 

The purpose of this post is simple: to convince you that the how matters, that continuous marketing ETL is the how you want, and that choosing the right how is immensely, massively, hugely in your favor.

As in: makes your life better. Gets you the atta-boys. Saves your company money. Provides better-fresher-cleaner-more-accurate data. Saves you from mistakes. Spares you embarrassment. Wins you the promotion. And — probably even more importantly — makes your developers and data scientists not hate you. (This is good because it sucks to be hated, but it’s also good because people help people they don’t hate. And you need their help. You really, really do.)

Marketing ETL: 30 seconds on why it matters

As much as you’ve tried to avoid learning it, you sort of know by osmosis that ETL stands for Extract, Transform, and Load. It’s about taking data out of one place, doing something unsexy but apparently tremendously important to it, and putting it in another place.
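
To make the three steps concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the sample rows, the field names, and the in-memory `warehouse` list simply stand in for a real source and a real destination.

```python
# Minimal ETL sketch. The rows, field names, and "warehouse" are
# illustrative assumptions, not any vendor's real schema.

def extract():
    # Extract: pretend these rows came from an ad partner's report.
    return [
        {"campaign": "Spring Sale", "spend": "12.50", "clicks": "40"},
        {"campaign": "Retargeting", "spend": "7.25", "clicks": "15"},
    ]

def transform(rows):
    # Transform: convert strings to numbers and add a derived metric.
    out = []
    for row in rows:
        spend = float(row["spend"])
        clicks = int(row["clicks"])
        out.append({
            "campaign": row["campaign"],
            "spend": spend,
            "clicks": clicks,
            "cost_per_click": round(spend / clicks, 4) if clicks else None,
        })
    return out

def load(rows, warehouse):
    # Load: append the cleaned rows to the destination.
    warehouse.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract()), warehouse)
```

The unglamorous middle step is where most of the real work hides: typing, cleaning, and deriving the metrics you actually report on.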

Kinda simple, right?

Sure. 

To get the best possible marketing results, you need a lot of data. Specifically, you need:

  1. Input data: costs, campaigns
  2. Delivery data: impressions, clicks, engagement
  3. Conversion data: installs, conversions, and more
    1. User level where legal/ethical/available
    2. Aggregated data from multiple sources

That means you’re engaging with many different systems, including your own analytics and live ops solutions, SKAdNetwork, Privacy Sandbox on Android in the future, but especially ad and marketing partners: the ad networks you are spending money on. 

And that’s not as easy as it might sound.

Why?

Here’s just one example that marketing ETL systems have to deal with: the tables and fields associated with Facebook Ads. In pointy-head language, it’s an ERD, an Entity Relationship Diagram. You don’t have to know a lot about data to see that there are a lot of entities here with a lot of relationships (thanks Fivetran for the image):

[Image: Facebook Ads entity relationship diagram]

Guess what?

That’s just one of your dozens of marketing partners. There’s also Google. There’s Twitter, TikTok, Snap, ironSource, Liftoff, Moloco, AppLovin, and thousands of other potential ad partners you might want to try. However, it’s not just about the number of marketing partners. In many cases, the schemas aren’t well-defined or documented. In others, you can’t even access them without special permission.

Getting all this data in your BI systems is hard. If you want all the raw data you can ingest to generate all the best insights on what’s working (or not), you need to centralize it, make it consumable, make it actionable, make it accessible.

Obviously, getting this data from a marketing analytics provider like Singular is much preferable to getting it all on your own. Because Singular has done all the work and made it easy. Because you need all of this even if you just want to fully test a new partner.

So ETL is the answer?

Sure … but not just any ETL. Because marketing data is not just any data.

Marketing data isn’t like other data, and traditional solutions won’t cut it. Here’s why …

Marketing data is hard. Really hard.

To get all the data you really need into your BI system via APIs or exports — even from Singular — your developers and data scientists must:

  • Build pipelines that use multiple endpoints to get aggregated data
  • Generate an endpoint and ingestion protocol for user-level data

Most marketing analytics providers, including mobile measurement partners, support daily exports, and most MMPs also make data available via real-time postbacks.

Exports are easiest, but if you choose exports, there’s always a chance they could be late or partial, because ad partners have issues (everyone has issues). If you choose postbacks, you have to ingest them. Postbacks are good because they are real-time, but they’re bad because if you have an issue (repeat: everyone has issues) and your systems hiccup and your endpoint goes down, you lose data and it never comes back. Oh, and by the way, building real-time solutions is expensive and hard. You need massive scale or the ability to dynamically scale, because one day nothing is happening and the next day Apple features you, your big paid campaign hits, and a TikTok influencer hypes you up.

But your systems have infinite scale and never go down, right?

Since developers and data scientists hate losing data — and so do you, because your job performance depends on accurate, timely, and complete data — they end up writing code for both exports and postbacks.
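
One common reason to keep both paths is reconciliation: the daily export can reveal days when the postback endpoint dropped events. Here is a rough sketch of that check. The per-day install counts and the 2% tolerance are made-up assumptions, and `find_gaps` is a hypothetical helper, not part of any vendor's tooling.

```python
# Reconciliation sketch: compare install counts received via real-time
# postbacks with counts in the daily export. The numbers and the 2%
# tolerance are illustrative assumptions.

def find_gaps(postback_counts, export_counts, tolerance=0.02):
    """Return days where postbacks fall short of the export by more than tolerance."""
    gaps = {}
    for day, exported in export_counts.items():
        received = postback_counts.get(day, 0)
        if exported == 0:
            continue
        shortfall = (exported - received) / exported
        if shortfall > tolerance:
            gaps[day] = {"exported": exported, "received": received}
    return gaps

postbacks = {"2023-02-14": 1000, "2023-02-15": 640, "2023-02-16": 1195}
exports = {"2023-02-14": 1000, "2023-02-15": 980, "2023-02-16": 1200}

gaps = find_gaps(postbacks, exports)
```

In this example only February 15 is flagged, which is the kind of day on which an endpoint outage would have quietly eaten a third of your events.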

That sucks because it’s extra work.

For exports, you’ve got some manual set-up to start, after which you hope that it’s set-and-forget. But you’re still on the hook for loading the data, transforming it, and ensuring reliability. 

And just to make APIs work, you have to:

  • Set up automated processes
  • Access data points
  • Download data
  • Load data
  • Transform data into your preferred formats
  • Save it locally

[Image: API vs exports vs ETL]

Here’s the bad news: all of that is the easy part. 

Postbacks are harder, not only because they require real-time processing and 100% uptime, but also because there are so many new metrics and new APIs all the time, especially recently thanks to SKAdNetwork and — now in public beta — Privacy Sandbox on Android.

Every time something changes, data breaks for you as a marketer. Your BI breaks. Your models break. Your decision-making capability breaks. And now you have to go back to your developers and data scientists, get their attention, take them off whatever they’re working on right now, and beg them to change API calls, adjust ingestion code, rewrite transformation code.

Sorry to be the bearer of (even more) bad news, but one of the key reasons why marketing data is not like other data is that each adtech vendor’s schema is extremely prone to change:

  • New versions
  • New hierarchies
  • New metrics
  • New ad types

You know that mobile and adtech are two of the fastest-changing spaces. Put them together, and the impact is exponential. In a very real sense, you’re not just navigating a data maze, you’re a real-life Maze Runner navigating a moving puzzle. And making changes to a system based on APIs/exports/postbacks requires significant ongoing changes to code, new ways to JOIN data, and more developer time.
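
A small guard that teams often add for this is a schema drift check, which compares the fields a partner actually sent with the fields the pipeline expects. The sketch below is one way to do that; `EXPECTED_FIELDS` and the sample row are hypothetical and do not reflect any real vendor's schema.

```python
# Schema drift check. EXPECTED_FIELDS is a hypothetical contract for one
# partner's report; real field names differ per vendor.
EXPECTED_FIELDS = {"date", "campaign_id", "adset_id", "impressions", "clicks", "spend"}

def schema_drift(rows):
    """Compare the fields actually received with the fields the pipeline expects."""
    received = set()
    for row in rows:
        received.update(row.keys())
    return {
        "missing": sorted(EXPECTED_FIELDS - received),
        "unexpected": sorted(received - EXPECTED_FIELDS),
    }

# The partner renamed adset_id to adgroup_id and added a new metric.
new_rows = [
    {"date": "2023-02-16", "campaign_id": "c1", "adgroup_id": "g9",
     "impressions": 5000, "clicks": 120, "spend": 42.0, "video_views": 300},
]

drift = schema_drift(new_rows)
```

A check like this only tells you that something moved. Someone still has to work out what the new field means and update the transformation code.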

After you’ve done all this work and your data scientists and developers have quit in disgust but you’re happy because you have the data you need, you start to wonder: should I use the aggregated data in the exports, or the user-level data in the postbacks? Or should I combine them both in some fancy way to shine a light on the darker corners of my apps’ performance? 

(And guess what: there’s no easy or straightforward way to make this happen, and no guarantee that your real-time data connects easily with your export data. For example, joining SKAdNetwork data with marketing campaign data is non-trivial at best, extremely hard at worst.)

At this point, you may start questioning your life choices along with your data team.

And … we haven’t even talked about cost data yet, or campaign delivery data, or in-app events and engagements, all of which you really do need along with your aggregated clicks and conversion data and whatever user-level data is available from your partners.

So … what you really need is a fully managed marketing ETL that understands all the schemas from all your partners out of the box. One in which you can simply select the new fields, alter the destination table, and (hit the Easy button) you’re done. This hugely simplifies your initial set-up tasks and massively reduces your ongoing maintenance costs.

But you need to know: traditional ETL simply won’t cut it.

Because there’s another reason marketing data isn’t like other data …

But wait, it gets better (or, why you need marketing ETL)

All this time and for all this massive (and ongoing) scope of work, you’ve probably been making a completely understandable but also completely devastating mistake: assuming the data your API calls and exports are getting from your 37 different ad partners is correct.

Oops.

Much of the time, the data is not correct.

That’s not really your ad partners’ fault, and actually, not even really their problem. It’s a function of the real world.

If you build and manage your own API integration for getting data, you will need to pick a time to trigger each call for the data. Not shockingly, you’d like to pick a time just before you will use it, because you want the freshest and most complete data. That all makes sense on your side, but it bears no relationship whatsoever to your partners’ plans, needs, technologies, and timetables. 

There are many variables involved:

  • When the underlying data is populated
  • When each partner says August 31st turns into September 1st (is it at 9 PM your local time, 3 PM, or 1 AM for you?)
  • Is it US or U.S. or USA?
  • Is “adset” the same as “adgroup” … or not?
  • When Singular has finished pulling the data
  • When Singular has finished processing the data
  • When Singular has finished enriching the data
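
To give a feel for the naming and day-boundary variables above, here is a minimal normalization sketch. The alias tables and the fixed UTC offset are illustrative assumptions. A real mapping is far longer and differs per partner.

```python
from datetime import datetime, timedelta, timezone

# Normalization sketch. The alias tables and UTC offsets below are
# illustrative assumptions, not a complete or vendor-accurate mapping.
COUNTRY_ALIASES = {"us": "US", "u.s.": "US", "usa": "US", "united states": "US"}
FIELD_ALIASES = {"adset": "ad_group", "adgroup": "ad_group", "ad_group": "ad_group"}

def normalize_country(value):
    key = value.strip().lower()
    return COUNTRY_ALIASES.get(key, value.strip().upper())

def normalize_fields(row):
    # Rename known aliases to one canonical field name.
    return {FIELD_ALIASES.get(name.lower(), name.lower()): v for name, v in row.items()}

def utc_date(local_timestamp, utc_offset_hours):
    # Convert a partner-local timestamp to the UTC calendar date.
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.fromisoformat(local_timestamp).replace(tzinfo=tz)
    return local.astimezone(timezone.utc).date().isoformat()

row = normalize_fields({"AdSet": "Lookalike 1%", "Country": "U.S."})
row["country"] = normalize_country(row["country"])
reporting_day = utc_date("2023-08-31T21:00:00", -7)
```

Note the last line: 9 PM on August 31st at UTC-7 is already September 1st in UTC, so the same event can land on different reporting days depending on whose clock you trust.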

And even that is just the beginning. It’s also possible that:

  • There’s an error in the partner data
  • There are fraud penalties and/or rebates that will eventually change the data
  • There are data delays from one or more regions which will be updated tomorrow
  • There are additional fees for services after the fact from one or more of your partners

Ad networks are going to continuously update their data based on their ongoing best sense of what reality is, and that means fraudulent publishers might be kicked out of a supply side platform, impacting clicks and conversions for days or weeks in the past. Or bugs might be discovered, impacting data reliability for weeks or months into the past. (That’s never happened before, right?) If they change something retroactively, you’ve got a problem. The data that you pulled was up-to-date and presumed accurate at the time you pulled it, but now reality looks different. 

Not shockingly, coordination and remediation of all this is a massive and messy ongoing process and challenge.
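
One way to see the problem is to re-pull a past day and diff it against what you stored earlier. The sketch below assumes a hypothetical row key of (date, partner, campaign) and made-up numbers.

```python
# Sketch: detect retroactive changes by re-pulling a past day and diffing
# it against what was stored earlier. Keys and numbers are hypothetical.

def diff_snapshots(stored, repulled):
    """Return rows whose metrics changed, appeared, or vanished since the last pull."""
    changes = []
    for key in sorted(set(stored) | set(repulled)):
        before, after = stored.get(key), repulled.get(key)
        if before != after:
            changes.append({"key": key, "before": before, "after": after})
    return changes

# Key: (date, partner, campaign). Value: metrics as pulled at that time.
stored = {
    ("2023-02-01", "network_a", "c1"): {"clicks": 900, "spend": 310.0},
    ("2023-02-01", "network_b", "c7"): {"clicks": 450, "spend": 120.0},
}
# A later pull: network_a removed fraudulent clicks and issued a rebate.
repulled = {
    ("2023-02-01", "network_a", "c1"): {"clicks": 780, "spend": 268.5},
    ("2023-02-01", "network_b", "c7"): {"clicks": 450, "spend": 120.0},
}

changes = diff_snapshots(stored, repulled)
```

Here one row out of two changed after the fact. Any report built on the first pull is now quietly wrong.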

Why you need continuous ETL, not just ETL

There is, of course, a way to fix this mess, avoid the hassle, and experience a degree of sanity and zen calm in your working life. (A degree. Don’t get greedy.) A way of getting all the data you need, ensuring it’s usable, and ensuring it gets where it’s needed.

Getting the data you need from a marketing analytics provider is good and important. Getting it via ETL is good and important. 

But basic ETL and traditional ETL — even from a marketing analytics vendor or MMP — is insufficient. 

What you really need is continuous ETL built specifically for marketing professionals, built specifically with adtech vendors in mind.

There are 6 “alls” here that matter. 

ETL from Singular gets:

  1. All your user-level data
  2. All your aggregated data
  3. All your cost data
  4. From all your marketing partners
  5. Plus all your in-app conversion and engagement insights for enrichment …
  6. And — very importantly — all updates to historical data

Singular’s marketing ETL then effortlessly deposits all of this into your BI systems, ready for you to use. But Singular ETL doesn’t just load all of this data as if it’s an old-school CSV export: get all new data, enrich all new data, load all new data.

Marketing data pipeline tasks (Singular Marketing ETL vs. legacy ETL):

  • Get the data
    • Implement all the APIs
    • Ingest all the data (even by scraping if needed)
  • Normalize the data
    • Normalize and standardize all the data from all the ad networks, including aggregate data, user-level data, SKAN data, and much more …
  • Manage timing
    • When to pull the data from each source
    • When to push data to desired destinations
  • Enrich and correct
    • Monitor, correct, and enrich data from all your adtech sources in near real time
  • Update retroactive data
    • Ensure changed data from up to 30 days in the past gets updated seamlessly and comprehensively

Singular’s continuous ETL keeps updating older data too. This is critically important: if something in your costs or conversions changes from 3 days ago or 3 weeks ago, that change is reflected in your most up-to-date data. Even when you’re regularly getting data throughout the day, continuous ETL ensures you’re always getting the most up-to-date data for today and yesterday (and up to 30 days into the past). Even better, Singular is also meticulous about ETL visibility: communicating the status of the ETL data. That means you always know how up-to-date your data is, whether there are any issues, whether any components are missing, and so on.

Singular takes care of this automatically.

In other words: it just works.

Even if historical data needs to change (due to bugs or changes to marketing event definitions), continuous ETL can re-propagate that through your systems. This is a huge time-saver and headache eraser. It is a load off your team’s mind, and therefore a load off yours.
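
For the curious, the general idea behind updating historical data can be sketched as a lookback upsert: on every run, re-pull a trailing window and overwrite any rows that changed. This is a generic illustration, not Singular's implementation. The table layout, the `pull_partner_data` stub, and the use of SQLite are all assumptions, with the 30-day window borrowed from the text above.

```python
import sqlite3
from datetime import date, timedelta

# Lookback upsert sketch. The table layout, the 30-day window, and
# `pull_partner_data` are assumptions for illustration only.
LOOKBACK_DAYS = 30

def pull_partner_data(start, end):
    # Hypothetical stand-in for re-pulling a date range from a partner.
    return [
        ("2023-02-10", "network_a", "c1", 268.5),   # corrected historical row
        ("2023-02-16", "network_a", "c1", 95.0),    # new row for today
    ]

def refresh(conn, today):
    start = today - timedelta(days=LOOKBACK_DAYS)
    rows = pull_partner_data(start, today)
    # INSERT OR REPLACE overwrites any row with the same primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO costs (day, partner, campaign, spend) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE costs (day TEXT, partner TEXT, campaign TEXT, spend REAL, "
    "PRIMARY KEY (day, partner, campaign))"
)
conn.execute("INSERT INTO costs VALUES ('2023-02-10', 'network_a', 'c1', 310.0)")

refresh(conn, date(2023, 2, 16))
result = conn.execute("SELECT day, spend FROM costs ORDER BY day").fetchall()
```

After the refresh, the February 10 row holds the corrected spend rather than the stale figure, and today's row sits alongside it.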

If you don’t have continuous ETL, you literally will have bad data. 

Your “single source of truth” isn’t actually true. And the decisions you make based on that bad data will be less optimal than they could be if you knew the truth. Meaning, of course, you’ll waste money. You’ll overspend or underspend. You’ll misallocate budget between partners.

Ultimately, you won’t act as intelligently as you probably should have.

Marketing ETL from Singular will literally save your team

So … we’ve established that if you want all your marketing data in your marketing BI, you need ETL. And that you specifically need marketing ETL. And you really need continuous ETL to keep everything up to date.

Getting this will literally save your team.

And maybe you too.

Remember that Entity Relationship Diagram showing the tables and fields in the Facebook Ads architecture? Here’s another one for Apple Search Ads.

[Image: Apple Search Ads entity relationship diagram]

If you get an off-the-shelf ETL that is not a marketing ETL specifically, you have to think about each of these ERDs from each of your partners.

(You may think: actually, not me: I’m a marketer. But someone on your team does. Someone in development and data science. And every moment they spend implementing this — and updating it when, inevitably, a partner changes a few fields — is a moment they are not spending helping you build a better app or improve your growth insights and tactics.)

And they’ll still come to you, because they’re engineers or data scientists, and they probably don’t know what each field in Snap or Twitter or Moloco data means, and how they correlate with each other, so they’ll ask you. This is incredibly specialized knowledge, different for each vendor, and only understood by a few who combine a performance marketer’s knowledge with a developer’s mindset.

Remember: most ETL products that are not specifically marketing ETL products are off the shelf. Prêt-à-porter. One-size-fits-all. They support only a few sources: maybe 200 to 300, with only about 50 of those in martech and adtech. (Singular supports literally thousands.) They don’t normalize across what Meta and Google and Snap and TikTok and LinkedIn report so you can compare apples to apples. They don’t standardize naming and tables and fields so that everything fits and just makes sense.

All the burden of getting it right is on you. 

(And your data analyst. And your engineers.)

One of the results is that while generic ETL tools are generally very good at the E and the L (the Extract and the Load), the Transform part is kind of missing in action. And guess who gets to supplement the tools to add that critical functionality back in?

You and your team.

(Often at extra cost with extra tools.)

Plus, typical ETL tools are incentivized and priced to push as much raw data to you as possible. And the cloud vendors are perfectly aligned with that: more data, more rows, more storage … and a lot more transformation, normalization, and standardization on your own dime and your own time. 

All of which …

  1. Shouldn’t be anyone on your team’s day job, and so
  2. Inevitably gets fouled up in one way or another, and
  3. Requires maintenance on an ongoing basis
  4. Impairs your marketing data and therefore your growth capacity
  5. Distracts from your core mission

Continuous marketing ETL from a company specifically focused on marketing analytics is the answer

Simply by virtue of doing what it has done over the past decade, Singular has developed a unique and differentiated functionality: marketing ETL.

Marketers are our customers. We’re already integrated with thousands of ad partners and marketing platforms, and we already ingest all the data they produce when running your campaigns. We already extract it all to present it to marketers, and we already transform all of it in the Singular pipeline to ensure that all the currencies, the dates, the formats, the structures, the naming conventions, the enrichment, and the modeling are standardized across all of them.
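
As a toy illustration of what standardizing currencies and dates involves (a generic sketch, not Singular's pipeline), consider two partners that report the same day in different date formats and currencies. The exchange rates, formats, and partner names below are made up.

```python
from datetime import datetime

# Standardization sketch. The exchange rates and date formats are made-up
# examples; a real pipeline would use dated rates from a trusted source.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1, "JPY": 0.0075}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")

def to_usd(amount, currency):
    return round(amount * RATES_TO_USD[currency.upper()], 2)

def to_iso_date(text):
    # Try each known format until one parses.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

raw = [
    {"partner": "network_a", "date": "02/16/2023", "spend": 100.0, "currency": "usd"},
    {"partner": "network_b", "date": "16.02.2023", "spend": 200.0, "currency": "EUR"},
]
standardized = [
    {"partner": r["partner"], "date": to_iso_date(r["date"]),
     "spend_usd": to_usd(r["spend"], r["currency"])}
    for r in raw
]
```

Guessing formats like this is fragile. A string such as 03/04/2023 is ambiguous, so in practice the format has to be known for each partner, which is exactly the kind of vendor-specific knowledge a generic tool lacks.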

Singular doesn’t offer a generic ETL:

  • Get data from X location
  • Do Y transformation to it
  • Deposit the data in Z location

Singular offers marketing ETL, with exactly what you need, in exactly the way you need it. With all that data from dozens of tables and hundreds of fields multiplied by thousands of partners neatly packaged, simplified, and ready for your BI systems. And with all updates from every one of the partners managed seamlessly. And with progression built into all your data loads, ensuring that at every given moment you have the absolute best possible data to feed your growth models.

Your engineers, developers, and data scientists will get this.

But you get this too, right?

As a marketer, you depend on your team to get your job done. You depend on the insights they deliver: the data that informs your decisions.

The more time they can afford to spend on productive work and the less time they have to waste on data janitor duties, the better they can do their jobs.

And that means you can do yours better too.

How to get marketing ETL from Singular

Talk to us.

We have dozens of massive clients already using our marketing ETL product and saving time and money with it. We’d be happy to chat with you, learn your needs, your processes, and your goals, and share how marketing ETL from Singular can help.

Book some time today.
