Date: 2012-11-13

Now that the television industry has begun to embrace big data for everything from tracking social media before, during and after broadcasts to ad targeting, there is an assumption that collecting set-top box data is pretty much analogous to collecting website visit data using Web logs. Nothing could be further from the truth.





Coming from the online space and having dealt with large data sets about online visitor behavior, I have had to learn some challenging lessons about the complexities of dealing with data about TV in general, and anonymous set-top box data specifically. Understanding how to make any data actionable and useful has been a long-standing challenge. For the past 40 years, researchers in knowledge management have been proposing and discussing frameworks that describe the continuum from data to information to knowledge to wisdom in order to support better decision making. I think about transforming set-top box data into actions as the path from data to wisdom.





A set-top box is a device that enables televisions to receive and display television signals. It also enables multichannel video programming distributors (MVPDs) to collect information about set-top box activities. While many MVPDs and set-top box technology companies have begun to productize the anonymous set-top box data available to third parties, there are still no standards for that data. A set-top box event can represent an action (a channel change) or an activity (viewing a channel for a period of time). Set-top box logs can represent a single day's events or all available events for a period of time for a given set-top box. Time zones, channels, call signs, networks and regions differ across set-top box data providers. Without the addition of program scheduling data and ad run logs, there is no way to know what program or commercials aired at the time the set-top box was tuned to a given channel. On its own, without translation and transformation, set-top box data truly helps us "know nothing."
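To make the dependence on scheduling data concrete, here is a minimal sketch in Python of how a tuning event might be matched to the programs that aired while the box was tuned in. The record layouts (`TuningEvent`, `Airing`), the channel identifiers and the sample schedule are all assumptions made for illustration; real provider logs and schedule feeds differ and, as noted above, follow no standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TuningEvent:
    """Hypothetical set-top box activity: a box tuned to a channel for a span of time."""
    device_id: str
    channel: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Airing:
    """Hypothetical program schedule entry for one channel."""
    channel: str
    program: str
    start: datetime
    end: datetime


def programs_viewed(event, schedule):
    """Return (program, seconds_viewed) for every airing the event overlaps."""
    viewed = []
    for airing in schedule:
        if airing.channel != event.channel:
            continue
        # The viewed portion is the intersection of the two time intervals.
        overlap_start = max(event.start, airing.start)
        overlap_end = min(event.end, airing.end)
        seconds = (overlap_end - overlap_start).total_seconds()
        if seconds > 0:
            viewed.append((airing.program, int(seconds)))
    return viewed


def utc(hour, minute):
    """Helper for the illustrative data below (all times on one invented day)."""
    return datetime(2012, 11, 13, hour, minute, tzinfo=timezone.utc)


# Invented sample data, for illustration only.
schedule = [
    Airing("ch-7", "Evening News", utc(18, 0), utc(18, 30)),
    Airing("ch-7", "Game Show", utc(18, 30), utc(19, 0)),
    Airing("ch-9", "Movie", utc(18, 0), utc(20, 0)),
]
event = TuningEvent("box-001", "ch-7", utc(18, 20), utc(18, 45))

print(programs_viewed(event, schedule))
# [('Evening News', 600), ('Game Show', 900)]
```

Without the `schedule` list, the event says only that a box sat on "ch-7" for 25 minutes. The same interval-overlap approach would apply to ad run logs, with commercials in place of programs.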





Set-top box data must be translated and transformed before it is informative. Designing the data model to provide a universal view of disparate data sets is an important first step. Determining appropriate cap and edit rules (such as limits on how long a tuning session is counted when a box is left on) and data mapping requirements comes next. Finally, the data transformation processes must be developed to align event times, interpret viewer actions, map channels to networks and programs, reformat the data and evaluate data quality for hundreds of millions of events every day. This effort gives you a view into what set-top box data can tell you about millions of anonymized TV viewers.
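As a rough illustration of three of these steps (aligning event times, applying a cap rule and mapping channels to networks), here is a minimal Python sketch. Everything specific in it is an assumption: the provider names, the fixed UTC offsets, the four-hour cap and the channel-to-network table are invented for the example and do not describe any provider's data or any production rule set.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical: each provider reports local wall-clock times at a fixed UTC offset.
# Fixed offsets ignore daylight saving time; a real pipeline would need full
# time zone rules.
PROVIDER_UTC_OFFSET_HOURS = {"provider_a": -5, "provider_b": -8}

# Hypothetical cap rule: count at most four hours of any single tuning session.
MAX_SESSION = timedelta(hours=4)

# Hypothetical mapping from (provider, channel number) to a common network code.
CHANNEL_TO_NETWORK = {
    ("provider_a", "206"): "NET_A",
    ("provider_b", "35"): "NET_A",
}


def normalize(provider, channel, local_start, local_end):
    """Convert one raw event to UTC, apply the cap and resolve the network."""
    tz = timezone(timedelta(hours=PROVIDER_UTC_OFFSET_HOURS[provider]))
    start = local_start.replace(tzinfo=tz).astimezone(timezone.utc)
    end = local_end.replace(tzinfo=tz).astimezone(timezone.utc)

    # Cap rule: truncate sessions that run longer than the allowed maximum.
    capped = (end - start) > MAX_SESSION
    if capped:
        end = start + MAX_SESSION

    return {
        "network": CHANNEL_TO_NETWORK.get((provider, channel)),  # None if unmapped
        "start_utc": start,
        "end_utc": end,
        "capped": capped,
    }


# Two invented events from different providers that land on the same network.
a = normalize("provider_a", "206",
              datetime(2012, 11, 13, 20, 0), datetime(2012, 11, 14, 3, 0))
b = normalize("provider_b", "35",
              datetime(2012, 11, 13, 17, 0), datetime(2012, 11, 13, 18, 0))

print(a["network"], a["start_utc"].isoformat(), a["end_utc"].isoformat(), a["capped"])
print(b["network"], b["start_utc"].isoformat(), b["end_utc"].isoformat(), b["capped"])
```

After normalization, both events start at the same UTC instant and carry the same network code, so they can be compared directly even though the raw logs used different clocks and channel numbers. The seven-hour session from `provider_a` is truncated to four hours and flagged as capped.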





Gaining knowledge from set-top box data requires understanding how each data set relates to the others. Cross-data validation ensures that the transformation results are valid and helps you understand how viewing characteristics differ across MVPDs, regions, networks and dayparts. At Simulmedia, we developed methods to ensure that the Simulmedia Audience Network footprint is visible within the set-top box data and to convert device-based campaign measures into Nielsen national household measures, which enable us to plan and measure audience-targeted television campaigns.





Now, as an emerging industry, we are focused on developing techniques that make us wiser: techniques to better forecast reach, to better optimize campaigns and to better measure small, unrated networks and dayparts, among other interesting experiments. My most valuable lesson on how to turn set-top box data into actionable insights is that a universal data model, data management processes, data mapping, event interpretation, data validation, and statistical and machine learning techniques are all needed.





Even though transforming set-top box data is quite different from transforming Web log data, the potential to impact the advertising industry is very similar. Just as Web log data gave advertisers the ability to move beyond contextual targeting to visitor-based behavioral targeting on the Web, set-top box data is giving advertisers the ability to find their desired, often hard-to-reach, audience on unexpected, undervalued networks and dayparts on television. Just as Web log data gave advertisers new, powerful insights about the effectiveness of their online campaigns, set-top box data is powering new insights about television viewing, television audiences and commercial viewing.



Simulmedia Inc., a TV ad company based in New York, operates the Simulmedia Audience Network, the world's first data-driven audience network for television. The company's targeting platform leverages predictive technologies and anonymous viewing data from more than 30 million U.S. TV viewers to help national advertisers and their agencies better reach their target audiences and better measure the results.