The Path From Data to Wisdom

Now that the television industry has begun to embrace big
data for everything from tracking social media before, during and after
broadcasts, to ad targeting, there is an assumption that collecting set-top box
data is pretty much analogous to collecting website visit data using Web logs. Nothing
could be further from the truth.

Coming from the online space and having dealt with large
data sets about online visitor behavior, I have had to learn some challenging
lessons about the complexities of dealing with data about TV in general, and
anonymous set-top box data specifically. Making any data actionable and useful has been
a long-standing challenge. For the past 40 years, researchers in knowledge management have
proposed and debated frameworks that describe the continuum from data to
information to knowledge to wisdom in order to support
better decision making. I think of transforming set-top box data into
actions as the path from data to wisdom.

A set-top box is a device that enables a television to
receive and display television signals. It also enables multichannel video
program distributors (MVPDs) to collect information about set-top box
activity. While many MVPDs and set-top
box technology companies have begun to productize anonymous set-top box
data and make it available to third parties, there are still no standards. A set-top box
event can represent an action (a channel change) or an activity (viewing a
channel for a period of time). A set-top box log can cover a single day's
events, or all available events for a given set-top box over a period of time. Time
zones, channels, call signs, networks and regions differ across set-top box data
providers. Without the addition of program schedule data and ad run logs,
there is no way to know which programs or commercials aired while the set-top
box was tuned to a given channel. On its own, without translation and
transformation, set-top box data truly helps us "know nothing."

Set-top box data must be translated and transformed before
it is informative. Designing a data model that provides a universal view of disparate
datasets is an important first step. Determining appropriate cap and edit rules
and data mapping requirements comes next. Finally, data transformation
processes must be developed to align event times, interpret viewer actions, map
channels to networks and programs, reshape the data and evaluate data quality
for hundreds of millions of events every day. This effort gives you a view into
what set-top box data can tell you about millions of anonymized TV viewers.
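A few of these transformation steps can be sketched in a handful of lines. The example below is illustrative only: it assumes each raw row carries a fixed UTC offset (real pipelines would apply full time-zone rules, including daylight saving time), uses a hypothetical 4-hour cap and a simple "drop reversed intervals" edit rule, and maps channels through a hypothetical per-provider lineup table. All names are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

# Illustrative cap rule: credit at most this much viewing to a single
# tuning interval. The 4-hour value is an assumption, not a standard.
MAX_INTERVAL = timedelta(hours=4)


@dataclass
class RawEvent:
    provider: str          # which MVPD supplied the row
    box_id: str
    local_start: datetime  # naive local time, as delivered
    local_end: datetime
    utc_offset_hours: int  # hypothetical field; ignores daylight saving time
    channel: int           # provider-specific channel number


@dataclass
class CleanEvent:
    box_id: str
    network: Optional[str]  # None when the channel could not be mapped
    start_utc: datetime
    end_utc: datetime


def transform(
    raw: List[RawEvent], lineup: Dict[Tuple[str, int], str]
) -> Tuple[List[CleanEvent], Dict[str, float]]:
    """Align times to UTC, apply edit and cap rules, and map channels to networks.

    `lineup` maps (provider, channel number) to a network name. Returns the
    cleaned events plus simple data-quality rates over the input batch.
    """
    clean: List[CleanEvent] = []
    invalid = unmapped = 0
    for ev in raw:
        tz = timezone(timedelta(hours=ev.utc_offset_hours))
        start = ev.local_start.replace(tzinfo=tz).astimezone(timezone.utc)
        end = ev.local_end.replace(tzinfo=tz).astimezone(timezone.utc)
        if end <= start:  # edit rule: drop zero-length or reversed intervals
            invalid += 1
            continue
        end = min(end, start + MAX_INTERVAL)  # cap rule
        network = lineup.get((ev.provider, ev.channel))
        if network is None:
            unmapped += 1
        clean.append(CleanEvent(ev.box_id, network, start, end))
    n = len(raw) or 1  # avoid division by zero on an empty batch
    quality = {"invalid_rate": invalid / n, "unmapped_rate": unmapped / n}
    return clean, quality
```

Returning quality rates alongside the cleaned rows is one simple way to make data-quality evaluation part of every daily run rather than a separate audit.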

Gaining knowledge from set-top box data requires
understanding how each data set relates to the others. Cross-data validation
ensures that the transformation results are valid and helps you understand how
viewing characteristics differ across MVPDs, regions, networks and dayparts. At
Simulmedia, we developed methods to ensure that the Simulmedia Audience Network footprint
is visible within the set-top box data and to convert device-based
campaign measures into Nielsen national household measures, which enable us to
plan and measure audience-targeted television campaigns.
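One simple form of cross-data validation is to compare each network's share of viewing minutes within each MVPD against its share in the pooled data, and flag large gaps for review. This is a generic sketch, not Simulmedia's method; the input layout, function names and the tolerance value are assumptions. A flagged gap may reflect a mapping error or a genuine regional difference in viewing, so flags are prompts for investigation rather than verdicts.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Minutes = Dict[Tuple[str, str], float]  # (provider, network) -> viewing minutes
POOLED = "ALL"  # reserved key for the pooled totals; assumes no provider uses it


def shares(minutes: Minutes) -> Dict[str, Dict[str, float]]:
    """Each network's share of total viewing minutes, per provider and pooled."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for (provider, network), m in minutes.items():
        totals[provider][network] += m
        totals[POOLED][network] += m
    result: Dict[str, Dict[str, float]] = {}
    for group, nets in totals.items():
        total = sum(nets.values())
        result[group] = {n: m / total for n, m in nets.items()} if total else {}
    return result


def flag_outliers(minutes: Minutes, tolerance: float = 0.05) -> List[Tuple[str, str, float]]:
    """(provider, network, share difference) where a provider's share differs
    from the pooled share by more than `tolerance` (absolute)."""
    s = shares(minutes)
    pooled = s.pop(POOLED, {})
    flags = []
    for provider, nets in sorted(s.items()):
        for network, pooled_share in sorted(pooled.items()):
            diff = nets.get(network, 0.0) - pooled_share
            if abs(diff) > tolerance:
                flags.append((provider, network, round(diff, 4)))
    return flags
```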

Now, as an emerging industry, we are focused on developing techniques
that make us wiser: techniques to better forecast reach, optimize
campaigns and measure small, unrated networks and dayparts, among
other interesting experiments. My most valuable lesson on how to turn set-top
box data into actionable insights is that a universal data model, data
management processes, data mapping, event interpretation, data validation,
and statistical and machine learning techniques are all needed.
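Forecasting reach starts with measuring it on historical data. The sketch below computes the standard reach and average-frequency figures from a list of ad exposures, where reach is the share of the measured universe exposed at least once and average frequency is total exposures divided by the number of boxes reached. It works at the device level; converting these figures to household measures is a separate step not shown here. The function name and input layout are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def reach_frequency(exposures: Iterable[str], universe: int, k: int = 3) -> Dict[str, float]:
    """Device-level reach and frequency from ad exposures.

    `exposures` holds one box ID per ad exposure; `universe` is the number of
    boxes in the measured population. `k` sets the effective-reach threshold.
    """
    if universe <= 0:
        raise ValueError("universe must be positive")
    counts = Counter(exposures)  # box ID -> number of exposures
    reached = len(counts)
    return {
        "reach": reached / universe,  # share exposed at least once
        "avg_frequency": sum(counts.values()) / reached if reached else 0.0,
        f"reach_{k}_plus": sum(1 for c in counts.values() if c >= k) / universe,
    }
```

A forecasting model would estimate these same quantities for a proposed schedule before it airs; measuring them on past campaigns provides the ground truth for checking such forecasts.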

Even though transforming set-top box data is quite different from transforming
Web log data, the potential impact on the advertising industry is very
similar. Just as Web log data gave advertisers the ability to move beyond
contextual targeting to visitor-based behavioral targeting on the Web, set-top
box data is giving advertisers the ability to find their desired, often hard-to-reach,
audience on unexpected, undervalued networks and dayparts on television. Just
as Web log data gave advertisers new, powerful insights about the effectiveness
of their online campaigns, set-top box data is powering new insights about
television viewing, television audiences and commercial viewing.

Simulmedia Inc., a TV ad company based in New York, operates the
Simulmedia Audience Network, the world's first data-driven audience network for
television. The company's targeting platform leverages predictive technologies
and anonymous viewing data from more than 30 million U.S. TV viewers to help
national advertisers and their agencies better reach their target audiences,
and better measure the results.