Mobile Data Primer: Demographic Data

February 23, 2018

Of all the data sets we’ve talked about in our Mobile Data Primer Series, demographic data (or demo data) is probably the easiest for marketers to understand. Incorporating basic descriptors like age, gender, and language, it is the foundation of all targeting and tends to be the first kind of data people use when they are looking to go beyond broadly targeted campaigns. There is a lot of market hype about machine learning and algorithmic targeting, but basic segmentation around quality demographic data can drive a large percentage of the performance lift of a campaign.

However, that lift can turn into a drag if the data isn’t accurate. And good, accurate demographics can be one of the trickiest data sets to find, especially outside of the walled gardens of Google and Facebook (and maybe even within them, as you’ll see). Here we take a look at what demo data is, how it is collected and verified, and how mobile marketers can use it.

The Demo Data Basics

Demographic data is general information about population groups based on certain attributes. While age and gender are the most universally used for segmentation and targeting, demographic data can span a range of attributes and socio-economic characteristics such as income, ethnicity, language, education level, residence, occupation or employment, and more.

Once this data is collected, it is connected to identifying information such as a street address, phone number, email address, cookie, or Mobile Advertising ID: whatever a marketer has that they can use to deliver advertising to a person. And voilà, there you have a basic profile of an individual: where to reach them, how old they are, their gender, their socio-economic status, and other factors that can be used to predict behavior.

This process presents two points where marketers need to focus on accuracy—when the data is collected, and when it is mapped to an identifier.

So Where Does Demo Data Come From?

Demographic data has a long history. Governments have been collecting this information on their populations for thousands of years; a glance at a family history site can show you census data like age, occupation, marital status, and veteran status from a hundred years ago and longer.

The earliest direct marketers used census data like this and supplemented it with information from sources like surveys and product registrations to draw assumptions about households and the people who lived in them. Note the word “assumption”—a lot of this work was done through modeling of things like the typical ethnicity, religion, and household income of particular neighborhoods. In the digital world, this dichotomy still exists: there is a pool of demographic data that is reported by users (which we call “deterministic data”), and a much, much larger pool of demographic data that is arrived at through modeling, algorithms, and guessing (“probabilistic data”). As any woman with a typically male name can tell you, this modeling has its limits.

So for a marketer in search of accurate demographic data, the data source is the first place to consider key questions:

Is the data deterministic, probabilistic, or a mix?
If the data is deterministic, was there an incentive for providing accurate information? This doesn’t need to be monetary; sometimes an app or website requires a user to accurately report this information to make it function correctly, for example.
If the data is probabilistic, what kind of assumptions are used? How are they verified? How often are the results verified against objective truth sets?
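
To make the last question concrete, here is a minimal sketch of what checking modeled data against a truth set could look like. The IDs, values, and the function name accuracy_against_truth_set are all hypothetical, and a real verification process would involve far larger samples and attention to panel bias.

```python
def accuracy_against_truth_set(modeled, truth_set):
    """Compare modeled values to a truth set on the IDs both contain.

    Returns (accuracy, overlap_size); accuracy is None when nothing overlaps.
    """
    overlap = [uid for uid in modeled if uid in truth_set]
    if not overlap:
        return None, 0
    correct = sum(1 for uid in overlap if modeled[uid] == truth_set[uid])
    return correct / len(overlap), len(overlap)


# Hypothetical data: modeled gender per identifier vs. a verified panel.
modeled_gender = {"id-1": "F", "id-2": "M", "id-3": "F", "id-4": "M"}
truth_gender = {"id-1": "F", "id-2": "F", "id-3": "F", "id-9": "M"}

accuracy, overlap_size = accuracy_against_truth_set(modeled_gender, truth_gender)
print(f"{accuracy:.0%} correct on {overlap_size} overlapping IDs")
```

Note that accuracy is only measured on the identifiers that appear in both sets, so a small overlap should make you cautious about the number.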

From Households to People: Mapping Demo Data To Identifiers

Direct marketers were the first advertisers to use demographic data at scale and started by mapping it to personally identifiable attributes such as name, address and phone number to drive direct mail and telesales campaigns.

As technology has evolved, so has the ability to move from these household-level identifiers to ones that are more specific to an individual and yet more private. Thirty years ago, if you purchased a mailing list from a nonprofit, you would have enough information to show up at a person’s house, if you so desired, although you might find that they moved two years ago. These days, identifiers like cookies and Mobile Ad IDs give a marketer a better chance of directing an ad right to the person they intend, but since they are just an anonymized string of letters and numbers, the user’s privacy is preserved.

As we said above, the mapping of demographic data to identifier is another opportunity for inaccuracy to creep into a data set. So here’s a look at how this mapping is done and what you should look for to make sure your data is as accurate as possible:

Household vs. People-Based Mapping

Some identifiers (street address and home phone) are more strongly linked to households, while some (email address, cell phone Mobile Ad ID, and cell phone number) can be reliably assumed to point to an individual. And some sit in the middle; a tablet, for example, is often shared by a family, so a cookie or a Mobile Ad ID linked to it might direct your ad to the person you’re targeting or it might direct it to their spouse or child instead. There are some data variables where this may matter less for targeting effectively, such as household income or auto ownership. However, if you need highly accurate age and gender demo data, people-based data linkages are a must. This is important because many times modelers build scale by taking reported household-level data and attempting to map it to individuals, which can compromise accuracy. So make sure to ask your demo data provider how your data was mapped.

Mapping Demographic Data to Cookies

There are two main ways that data companies map demo data to cookies. First, they can natively map it via web forms, such as an online registration page where you provide your age and gender. When the data is entered by a user, a cookie is set on their browser with the linkage between the form data and the cookie. The second way to map data to cookies is to leverage an ID graph. Demo data that has been mapped to an email address can be onboarded to cookies by leveraging an identity graph that has large-scale mappings from emails to cookies. If demo data can be mapped to an email and that email can separately be mapped to a cookie, the demo data can be connected to cookies. Note that there are a lot of onboarding solutions in the market that leverage probabilistic models which may result in limited data accuracy, so this is another thing to check on.
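
The ID graph approach described above is essentially a two-step lookup. The sketch below is an illustration only, assuming a deterministic graph held in plain dictionaries; the email addresses, cookie IDs, and the function name onboard_to_cookies are made up, and production identity graphs are much larger and usually work on hashed identifiers.

```python
# Hypothetical inputs: demo data keyed by email, and an identity graph
# that maps each email to the cookies it has been seen with.
demo_by_email = {
    "a@example.com": {"age_range": "25-34", "gender": "F"},
    "b@example.com": {"age_range": "45-54", "gender": "M"},
}
email_to_cookies = {
    "a@example.com": ["cookie_1", "cookie_2"],
    "c@example.com": ["cookie_3"],
}


def onboard_to_cookies(demo_by_email, email_to_cookies):
    """Attach demo data to every cookie linked to the same email."""
    demo_by_cookie = {}
    for email, demo in demo_by_email.items():
        # Emails with no entry in the graph cannot be onboarded.
        for cookie_id in email_to_cookies.get(email, []):
            demo_by_cookie[cookie_id] = demo
    return demo_by_cookie


demo_by_cookie = onboard_to_cookies(demo_by_email, email_to_cookies)
print(sorted(demo_by_cookie))
```

In this example only the first email reaches any cookies: the second has demo data but no graph entry, and the third has a graph entry but no demo data. That drop-off is the price of a deterministic match, and it is part of why probabilistic onboarding is tempting when scale matters.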

Mapping Demo Data to Mobile

Consumers are spending more time in-app, which means marketers increasingly need ways to target mobile users, and their Mobile Ad IDs, accurately. One way is to map the data from desktop using cross-device algorithms. These algorithms look for connections between mobile and desktop devices to create linkages between them. If the demographic data was poorly mapped to a cookie, further linking it to mobile devices using algorithms can lead to completely erroneous targeting. A more accurate, deterministic approach is to map the data directly from in-app data collection. When a user registers with a new app, for example, the app has their demo data linked to their Mobile Ad ID.
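
A rough way to see why chained mappings are risky is to multiply the accuracy of each step. The sketch below assumes that errors at each step are independent and that a wrong link never lands on the right person by luck; the accuracy figures are invented for illustration and are not measurements of any real data set.

```python
def chained_accuracy(*step_accuracies):
    """Multiply per-step accuracies to estimate end-to-end accuracy."""
    result = 1.0
    for accuracy in step_accuracies:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("each accuracy must be between 0 and 1")
        result *= accuracy
    return result


# Hypothetical figures: demo-to-cookie mapping is right 90% of the time,
# and the cross-device link from cookie to Mobile Ad ID is right 80% of the time.
combined = chained_accuracy(0.90, 0.80)
print(f"Estimated end-to-end accuracy: {combined:.0%}")
```

Under these assumptions, two steps that each look respectable on their own leave you with a noticeably weaker result, and every additional probabilistic hop lowers it further.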

Bias Awareness: How Demo Data is Verified

Given the lack of standards for how to map data to advertising, cookies or mobile, verification companies have emerged to provide a gauge of data quality. The big two are comScore and Nielsen, and they work to measure the accuracy of digital targets for different demographic segments. Why? Because brands do not want to spend ad dollars on inaccurate targets, and often will refuse to pay for demographic ad targeting that misses the mark.

These verification companies collect research-grade, highly surveyed panels of users. For these panels, they have multiple addressable touch points including name, address, phone, email, cookie and Mobile Ad IDs. The goal is to verify with the greatest confidence that a user is who they say they are, that survey and demo data is accurate, and that the addressable identifiers deterministically map to those users. But every panel has a bias, and despite focusing on accuracy and similar statistical methodologies, there are often stark differences between comScore and Nielsen ratings on the same set of data.

In addition, there are some inherent weaknesses in demo data targeting. First, and most obvious, is that demographic data may not be the best predictor of whether someone will respond as desired to an advertisement. There are many other ways to target users on mobile devices that could be more effective, such as App Install or Location Data. And second, demographic data is not always accurate or real, even if it has been reported by the user. Have you ever lied when you reported your age or gender in a web or mobile form? You’re not alone. Facebook, for example, has more American teenagers registered as users than actually exist. Simply because data is deterministic does not mean it is correct.

To get beyond these biases, there are a number of things a data company can do. The first and most important is to focus on high-quality data collection. Another is to collect data from a wide variety of sources and throw out data that conflicts. This multisourcing results in data that is far more accurate and reliable. And finally, the data can be verified using quality truth sets. All three of these things are very hard to do at scale—something to keep in mind if you see companies promising demo data on hundreds of millions of mobile users.
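
As an illustration of the multisourcing idea, the sketch below keeps a value only when more than one source reports it and all sources agree. The two-source minimum is our own assumption for the example (the rule described above is simply to discard conflicts), and the source names, IDs, and the function name consensus_values are hypothetical.

```python
from collections import defaultdict


def consensus_values(records, min_sources=2):
    """Keep a device's value only if enough sources report it and all agree.

    records is an iterable of (source, device_id, value) tuples. If one
    source reports a device twice, its later value replaces the earlier one.
    """
    values_by_device = defaultdict(dict)
    for source, device_id, value in records:
        values_by_device[device_id][source] = value

    agreed = {}
    for device_id, by_source in values_by_device.items():
        distinct = set(by_source.values())
        if len(by_source) >= min_sources and len(distinct) == 1:
            agreed[device_id] = distinct.pop()
    return agreed


# Hypothetical gender reports from two apps.
records = [
    ("app_a", "id-1", "F"),
    ("app_b", "id-1", "F"),  # both sources agree: kept
    ("app_a", "id-2", "M"),
    ("app_b", "id-2", "F"),  # sources conflict: thrown out
    ("app_a", "id-3", "M"),  # only one source: not corroborated
]
agreed = consensus_values(records)
print(agreed)
```

Only one of the three devices survives the filter here, which hints at why accuracy and scale pull in opposite directions.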

Using Demographic Data

Targeting with demographic data is foundational and powerful, and we could fill 10 pages with examples; most marketers have been doing it from the time of their very first job. Now ad inventory is largely moving to mobile apps, making accurate in-app demographic targeting a big need in mobile marketing. With increased demand comes innovation, but also higher expectations. Marketers today will pay a premium for good demographic targeting, but they also tie payment to success rates. Targets missed can equal dollars lost for digital ad companies.

Machine learning and AI are a newly prominent use of demographic data, and with these, the need for accuracy is even more acute. A good algorithm requires good, clean data as a foundation. So whether you’re using demographic data for traditional targeting or to feed machine learning, use this guide to help make sure you are getting the best data you can.

Vetted and verified data is what Twine is all about. Click here to read our TrueData Manifesto.