In 2013, Gene Kim, Kevin Behr, and George Spafford published The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win. It chronicled the story of Bill Palmer, an IT leader at Parts Unlimited, a company that makes…parts. Palmer finds himself in a role he doesn’t want, working with people who don’t want to work with him, and fighting seemingly inextinguishable fires.
The Phoenix Project—truly the War and Peace of IT Operations novels—unabashedly borrows its conceit from Eliyahu Goldratt’s 1984 book The Goal: A Process of Ongoing Improvement. (In graduate school, The Goal was required spring-break reading. I even chronicled the experience for my alumni magazine.) The Goal preached a message of manufacturing optimization. The Phoenix Project analogously introduces the concept of DevOps—a management framework that applies principles of agile management (think Toyota, lean, Kanban, theory of constraints, and lots and lots of Post-it notes)—to enterprise operations and development.
The Phoenix Project is a seminal work, not just for its real-world-like DevOps examples, but for its revolutionary suggestion that developers and ops folks should…wait for it…collaborate. Getting IT and development on the same page is a challenge for any C-suite. But Kim, Behr, and Spafford—having endured operational pain themselves—offer a compelling, constructive, and creative approach to solving that challenge.
Is it time for The Phoenix Project 2? (The Goal 3?)
In the enterprise data world, we face a similar challenge. Line-of-business data consumers seek detailed analysis, data-access immediacy, flexible data structures, data accuracy, “trustable” data, easy data “digestibility”, extensibility, and the power to change their minds at any time. I could go on. (Toph-the-marketing-boy says “I want it all, I want it now, and I might change my mind later. Oh, and don’t make me compromise. Or have to think too hard.”)
To serve needy data consumers like my former self, data-tech vendors have produced powerful self-service solutions—data prep, data integration, BI, even machine learning.
But delivering on the ideals of data self-service can come with costs borne by upstream data managers. It can introduce complexity for those tasked with ensuring data-governance compliance, putting them in the service role of “data barista.” (Hey IT! Toph-the-marketing-boy wants his venti-half-caff-two-pumps-pumpkin-spice-soy-data latte, or else he’s going to Café Prétentieux for his caffeinated data needs.)
Balancing restrictive data governance mandates with the ever-evolving demands of LOB data consumers is analogous to the DevOps challenges IT management faced five years ago. Tempted by the convenience of virtual-machine availability, developers often “went rogue” and commissioned AWS VMs outside the purview of IT leadership.
Today, “rogue” data consumers like that former self of mine grow impatient with slow-turn query requests to centralized IT resources: Submit the request, wait three weeks because “they’re fixing the upgrade from last Friday that didn’t work”, realize the request has gone out of date in the interim, get the results, ignore them. (Then mutter some choice words, submit a new request, and lather, rinse, repeat.) Instead, maybe I’ll just massage my own data, the data I got from that one spreadsheet, you know the one, from that email that went around when was it, last month?
It’s time for a new approach. An approach that borrows—again, unabashedly—from the fundamental principles of manufacturing best practices and the DevOps framework. An approach that addresses the basic enterprise challenge of balancing self-service data empowerment freedom with the control required to enforce enterprise data-governance mandates. This is DataOps.
I’m not exactly coining a term:
- Another research company sought to define “data ops” as a centralized hub for IT leaders to parcel out data, with the overarching mission to “control” consumer access to systems of record.
- In 2014, consultant and InformationWeek contributor Lenny Liebman more constructively defined DataOps as “the set of best practices that improve coordination between data science and operations.”
- More recently, Tamr CEO Andy Palmer called for a DataOps approach to foster collaboration between Data Engineering, Data Integration, Data Quality, and Data Security/Privacy functional delivery. (Read his 2015 introduction to DataOps here. And read my August 2016 interview with Palmer here.)
Liebman and Palmer laid the DataOps groundwork, and continue to lobby for its implementation. My work on the subject builds upon theirs.
The New DataOps Framework: Ideals, Metrics, Enablers, Challenges
Borrowing (unabashedly!) from Liebman, Palmer, and Messrs. Kim, Spafford, and Behr, here’s my modest proposal:
DataOps is an enterprise collaboration framework that aligns data-management objectives with data-consumption ideals to maximize data-derived value.
Enterprise DataOps serves five fundamental enterprise objectives (all of which should be measurable), is enabled by two technology evolutions, and seeks to overcome six key hurdles.
DataOps aims to deliver upon five fundamental enterprise ideals:
1. Maximize data-derived value.
The most important objective: Without tangible, measurable outcome targets, a DataOps initiative will not succeed. (Here’s my recent take on how to measure data initiative value, along with some examples of enterprises doing data monetization right.)
2. Empower data consumers with democratized data.
Data consumers demand flexibility, extensibility, and accessibility. Meeting those expectations of freedom is easy to say and hard to do, but nonetheless essential to maximizing enterprise value.
3. Ensure data security, data-governance compliance, data accuracy, data integrity, data lineage.
Consumers must be able to trust their data; managers must be able to deliver data within corporate guidelines.
4. Anticipate the unanticipatable.
It used to be that data was only as valuable as what an end user could ascertain from it. Not any more. Successful DataOps environments exploit machine learning, AI, predictive and prescriptive analytics, and extensible connectivity technologies to make data delivery smart enough to deal with the unexpected.
5. Achieve data immediacy.
Whatever insight it may provide, for whatever use it may be intended, from whatever source it may originate, and however it may reach its destination, data must be available as fast as its end users can act upon it, or ideally, faster.
Setting goals is great, but pointless if progress toward them cannot be easily measured. DataOps leaders must track and maximize key metrics (“profit!”) while reducing pain (“cost!”):
- Value delivered
- Data immediacy
- Data-flow bottlenecks
- Data cycle time from acquisition to consumption
- Non-critical-path work (e.g., unplanned work, maintenance)
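As a minimal sketch of how such metrics might be tracked (the event log, dataset names, and SLA threshold below are all hypothetical, purely for illustration), cycle time and bottleneck flags can be derived from simple acquisition/consumption timestamps:

```python
from datetime import datetime

# Hypothetical event log: each record tracks one dataset's journey
# from acquisition to consumption (all names are illustrative).
events = [
    {"dataset": "sales_q3", "acquired": datetime(2017, 1, 2),
     "consumed": datetime(2017, 1, 9)},
    {"dataset": "churn_model_input", "acquired": datetime(2017, 1, 3),
     "consumed": datetime(2017, 1, 24)},
]

# Data cycle time: elapsed days from acquisition to consumption.
cycle_times = {e["dataset"]: (e["consumed"] - e["acquired"]).days
               for e in events}

# Flag likely bottlenecks: datasets whose cycle time misses a target SLA.
SLA_DAYS = 10
bottlenecks = [name for name, days in cycle_times.items() if days > SLA_DAYS]

print(cycle_times)   # per-dataset cycle time in days
print(bottlenecks)   # datasets exceeding the SLA
```

The point isn’t the code; it’s that every item on the list above reduces to something an enterprise can instrument and measure.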
In his Tamr blog post, Palmer cites two drivers of DataOps:
1. Self-service technologies (like Alteryx or Trifacta, which Tamr’s data-unification solution complements)
2. Specialized, data-specific database technologies (such as Vertica, VoltDB, etc.)
I agree with Palmer’s assessment (and unabashedly borrow it), though I consider these enabling technologies. Semantics aside, self-service data-prep, data-integration, and BI solutions abstract away admin work from data consumers, and enable enterprise data democratization in the true sense of the word. And the evolution of data storage technologies (including the post-cluster ideals of data lakes) expands data-delivery methods in the enterprise.
Finally, to succeed, DataOps practitioners will have to overcome operational hurdles. They must:
1. Establish open communication channels between functional stakeholders.
2. Align operational and business data goals (specifically, the DataOps objectives above) across the organization.
3. Maximize efficiency by prioritizing the “right” kinds of work: reduce time spent “munging” data, and focus on actual analysis. (Goldratt’s theory of constraints applies here, too.)
4. Balance data governance mandates with self-service delivery.
5. Manage DataOps scope.
6. Accommodate change.
DataOps: Beyond a Bulleted List of Conceptual Ideals
DataOps is a mindset. It requires a commitment to open communication, and to aligning operations and line-of-business priorities to deliver what’s right for the enterprise. It starts at the end (data-derived value) and bolsters an enterprise data value chain designed to maximize that data-derived value delivery.
Some might suggest that DataOps is about operationalizing data. I suggest it’s about building an operation around your data…an operation aimed at delivering the most value for your enterprise. (And then tweaking the hell out of it, continuous improvement and all.) Delivering on the promise of self-service data analytics without sacrificing data-governance compliance is something everyone in the data-driven enterprise can get behind. (Even Toph-the-marketing-boy.)
What do you think? Are you applying a DataOps approach to data management and consumption in your organization? DM or email me and tell me about it. And look for my upcoming Blue Hill Research report on the subject.