DataOps: A Modest Proposal for Rethinking Enterprise Data Management

Providing DataOps is better than being a Data Barista. Even if cookies are involved.

In 2013, Gene Kim, Kevin Behr, and George Spafford published The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win. It chronicled the story of Bill Palmer, an IT leader at Parts Unlimited, a company that makes…parts. Palmer finds himself in a role he doesn’t want, working with people who don’t want to work with him, and fighting seemingly inextinguishable fires.

The Phoenix Project—truly the War and Peace of IT Operations novels—unabashedly borrows its conceit from Eliyahu Goldratt’s 1984 book The Goal: A Process of Ongoing Improvement. (In graduate school, The Goal was required spring-break reading. I even chronicled the experience for my alumni magazine.) The Goal preached a message of manufacturing optimization. The Phoenix Project analogously introduces the concept of DevOps—a management framework that applies principles of agile management (think Toyota, lean, Kanban, theory of constraints, and lots and lots of Post-it notes)—to enterprise operations and development.

The Phoenix Project is a seminal work, not just for its real-world-like DevOps examples, but for its revolutionary suggestion that developers and ops folks should…wait for it…collaborate. Getting IT and development on the same page is a challenge for any C-suite. But Kim, Behr, and Spafford—having endured operational pain themselves—offer a compelling, constructive, and creative approach to solving that challenge.

Is it time for The Phoenix Project 2? (The Goal 3?)

In the enterprise data world, we face a similar challenge. Line-of-business data consumers seek detailed analysis, data-access immediacy, flexible data structures, data accuracy, “trustable” data, easy data “digestibility”, extensibility, and the power to change their minds at any time. I could go on. (Toph-the-marketing-boy says “I want it all, I want it now, and I might change my mind later. Oh, and don’t make me compromise. Or have to think too hard.”)

To serve needy data consumers like my former self, data-tech vendors have produced powerful self-service solutions—data prep, data integration, BI, even machine learning.

But delivering on the ideals of data self-service can come with costs borne by upstream data managers. It can introduce complexity for those tasked with ensuring data-governance compliance, putting them in the service role of "data barista." (Hey IT! Toph-the-marketing-boy wants his venti-half-caff-two-pump-pumpkin-spice-soy-data latte, or else he's going to Café Prétentieux for his caffeinated data needs.)

Balancing restrictive data-governance mandates with the ever-evolving demands of LOB data consumers is analogous to the DevOps challenge IT management faced five years ago. Tempted by the convenience of on-demand virtual machines, developers often "went rogue" and provisioned AWS VMs outside the purview of IT leadership.

Today, “rogue” data consumers like that former self of mine grow impatient with slow-turn query requests to centralized IT resources: Submit the request, wait three weeks because “they’re fixing the upgrade from last Friday that didn’t work”, realize the request has gone out of date in the interim, get the results, ignore them. (Then mutter some choice words, submit a new request, and lather, rinse, repeat.) Instead, maybe I’ll just massage my own data, the data I got from that one spreadsheet, you know the one, from that email that went around when was it, last month?

It’s time for a new approach. An approach that borrows—again, unabashedly—from the fundamental principles of manufacturing best practices and the DevOps framework. An approach that addresses the basic enterprise challenge of balancing self-service data empowerment with the control required to enforce enterprise data-governance mandates. This is DataOps.

I’m not exactly coining a term:

- Another research company sought to define “data ops” as a centralized hub for IT leaders to parcel out data, with the overarching mission to “control” consumer access to systems of record.
- In 2014, consultant and InformationWeek contributor Lenny Liebman more constructively defined DataOps as “the set of best practices that improve coordination between data science and operations.”
- More recently, Tamr CEO Andy Palmer called for a DataOps approach to foster collaboration among the Data Engineering, Data Integration, Data Quality, and Data Security/Privacy functions. (Read his 2015 introduction to DataOps here. And read my August 2016 interview with Palmer here.)

Liebman and Palmer laid the DataOps groundwork, and continue to lobby for its implementation. My work on the subject builds upon theirs.

The New DataOps Framework: Ideals, Metrics, Enablers, Challenges

Borrowing (unabashedly!) from Liebman, Palmer, and Messrs. Kim, Spafford, and Behr, here’s my modest proposal:

DataOps is an enterprise collaboration framework that aligns data-management objectives with data-consumption ideals to maximize data-derived value.

Enterprise DataOps serves five fundamental enterprise objectives (all of which should be measurable), is enabled by two technology evolutions, and seeks to overcome six key hurdles.

DataOps aims to deliver upon five fundamental enterprise ideals:

1. Maximize data-derived value.
The most important objective: Without tangible, measurable outcome targets, a DataOps initiative will not succeed. (Here’s my recent take on how to measure data initiative value, along with some examples of enterprises doing data monetization right.)
2. Empower data consumers with democratized data.
Data consumers demand flexibility, extensibility, and accessibility. Meeting those expectations of freedom is easy to say and hard to do, but it is essential to maximizing enterprise value.
3. Ensure data security, data-governance compliance, data accuracy, data integrity, and data lineage.
Consumers must be able to trust their data; managers must be able to deliver data within corporate guidelines. (A minimal lineage-tracking sketch follows this list.)
4. Anticipate the unanticipatable.
It used to be that data was only as valuable as what an end user could ascertain from it. Not any more. Successful DataOps environments exploit machine learning, AI, predictive and prescriptive analytics, and extensible connectivity technologies to make data delivery smart enough to deal with the unexpected.
5. Achieve data immediacy.
Whatever insight it may provide, for whatever use it may be intended, from whatever source it may originate, and however it may reach its destination, data must be available as fast as its end users can act upon it, or ideally, faster.
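To make ideal #3 a bit more concrete, here is a minimal sketch (plain Python, purely illustrative; the Dataset and apply_step names are mine, not drawn from any particular product) of what carrying lineage metadata alongside every self-service transformation might look like:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class Dataset:
    """A dataset plus the lineage records explaining how it was produced."""
    name: str
    rows: List[dict]
    lineage: List[dict] = field(default_factory=list)


def apply_step(source: Dataset, step_name: str,
               transform: Callable[[List[dict]], List[dict]]) -> Dataset:
    """Apply a transformation and append a lineage record describing it."""
    return Dataset(
        name=f"{source.name}/{step_name}",
        rows=transform(source.rows),
        lineage=source.lineage + [{
            "step": step_name,
            "input": source.name,
            "rows_in": len(source.rows),
            "applied_at": datetime.now(timezone.utc).isoformat(),
        }],
    )


# A self-service consumer filters leads; the lineage travels with the result.
raw = Dataset("crm_leads_export",
              rows=[{"region": "EMEA", "score": 72},
                    {"region": "APAC", "score": 41}])
qualified = apply_step(raw, "filter_high_score",
                       lambda rows: [r for r in rows if r["score"] >= 60])
print(qualified.lineage)  # audit/governance can see exactly how the data was derived
```

The mechanics matter less than the principle: if every transformation a data consumer performs automatically appends a lineage record, governance teams can trust self-served results without slowing the consumer down.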

Setting goals is great, but pointless if progress toward them cannot be easily measured. DataOps leaders must maximize key value metrics (“profit!”) while minimizing cost and pain metrics (“cost!”); a sketch of how one such metric might be computed follows the two lists:

Maximize:
- Value delivered
- Data immediacy
- Flexibility
- Trust
Minimize:
- Data-flow bottlenecks
- Data cycle time from acquisition to consumption
- Non-critical-path work (e.g., unplanned work, maintenance)
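None of these metrics requires heavyweight tooling to get started. As a minimal sketch (Python, with hypothetical event fields; a real implementation would pull timestamps from pipeline or catalog logs), data cycle time and bottleneck candidates might be computed like this:

```python
from datetime import datetime
from statistics import median

# Hypothetical pipeline events: when each dataset was acquired and first consumed.
events = [
    {"dataset": "orders",   "acquired": "2016-11-01T08:00", "consumed": "2016-11-03T09:30"},
    {"dataset": "leads",    "acquired": "2016-11-01T08:00", "consumed": "2016-11-15T16:00"},
    {"dataset": "web_logs", "acquired": "2016-11-02T00:00", "consumed": "2016-11-02T06:00"},
]


def cycle_time_hours(event: dict) -> float:
    """Hours elapsed between data acquisition and first consumption."""
    acquired = datetime.fromisoformat(event["acquired"])
    consumed = datetime.fromisoformat(event["consumed"])
    return (consumed - acquired).total_seconds() / 3600


times = {e["dataset"]: cycle_time_hours(e) for e in events}
typical = median(times.values())

print(f"median cycle time: {typical:.1f} hours")
for dataset, hours in sorted(times.items(), key=lambda kv: -kv[1]):
    if hours > 2 * typical:  # crude bottleneck flag: well above typical turnaround
        print(f"bottleneck candidate: {dataset} ({hours:.1f} hours)")
```

Even a crude report like this gives DataOps leaders a baseline to improve against, which is the point of measuring in the first place.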

In his Tamr blog post, Palmer cites two drivers of DataOps:

1. Self-service technologies (like Alteryx or Trifacta, which Tamr’s data-unification solution complements)
2. Specialized, data-specific database technologies (such as Vertica, VoltDB, etc.)

I agree with Palmer’s assessment (and unabashedly borrow it), though I consider these enabling technologies rather than drivers. Semantics aside, self-service data-prep, data-integration, and BI solutions abstract admin work away from data consumers and enable enterprise data democratization in the true sense of the word. And the evolution of data-storage technologies (including the post-cluster ideals of data lakes) expands data-delivery methods in the enterprise.

Finally, to succeed, DataOps practitioners will have to overcome operational hurdles. They must:

1. Establish open communication channels between functional stakeholders.
2. Align operational and business data goals (specifically, the DataOps objectives above) across the organization.
3. Maximize efficiency by focusing on the “right” kinds of work. (Less time “munging” data, more time on actual analysis! Think theory of constraints.)
4. Balance data governance mandates with self-service delivery.
5. Manage DataOps scope.
6. Accommodate change.

DataOps: Beyond a Bulleted List of Conceptual Ideals

DataOps is a mindset. It requires a commitment to open communication, and to aligning operations and line-of-business priorities to deliver what’s right for the enterprise. It starts at the end (data-derived value) and bolsters an enterprise data value chain designed to maximize the delivery of that value.

Some might suggest that DataOps is about operationalizing data. I suggest it’s about building an operation around your data…an operation aimed at delivering the most value for your enterprise. (And then tweaking the hell out of it, continuous improvement and all.) Delivering on the promise of self-service data analytics without sacrificing data-governance compliance is something everyone in the data-driven enterprise can get behind. (Even Toph-the-marketing-boy.)

What do you think? Are you applying a DataOps approach to data management and consumption in your organization? DM or email me and tell me about it. And look for my upcoming Blue Hill Research report on the subject.

About Toph Whitmore

Toph Whitmore is a Blue Hill Research principal analyst covering the Big Data, analytics, marketing automation, and business operations technology spaces. His research interests include technology adoption criteria, data-driven decision-making in the enterprise, customer-journey analytics, and enterprise data-integration models. Before joining Blue Hill Research, Toph spent four years providing management consulting services to Microsoft, delivering strategic project management leadership. More recently, he served as a marketing executive with cloud infrastructure and Big Data software technology firms. A former journalist, Toph has written for GigaOM, DevOps Angle, and The Huffington Post, among other media. Toph resides in North Vancouver, British Columbia, Canada, where he is active in the local tech startup community as an angel investor and corporate advisor.
Posted on November 28, 2016 by Toph Whitmore
