Topics of Interest Archives: DataOps

On DataOps, the DoD, and Operationalizing Data Science: Questioning Authority with Composable Analytics’ Andy Vidan

Andy Vidan is the CEO of Cambridge, Massachusetts-based DataOps startup Composable Analytics. He founded the company two years ago with MIT colleague Lars Fiedler. They now lead Composable (self-funded and self-sustaining, by the way) and are establishing a beachhead in the nascent DataOps space. I recently spoke with him about the genesis of his company, what it’s like to (maybe) work with the U.S. DoD, and the challenge of evangelizing DataOps to line-of-business stakeholders.

TOPH WHITMORE: Tell me about Composable Analytics.

ANDY VIDAN: Composable Analytics grew out of a project at MIT’s Lincoln Laboratory. Lincoln Lab is an MIT R&D center that provides advanced technology solutions to the U.S. Department of Defense and intelligence community. There, we saw the clear need for a unifying platform that can ingest all types of data and feed it to an intelligence analyst. An intelligence analyst within the Department of Defense is similar to a business analyst within the private sector. They’re sophisticated. They know their subject matter well, better than software developers may ever know their business. But they’re not always technical, and when they have to deal with different data sets from different systems, with different formats and different structures, they must rely on software engineers and use a variety of disjoint tools that further complicate their workflows.

Our approach was different: We wanted to develop a single ecosystem to bring in data from all sorts of sources, and present it to the user for self-service data discovery and analytics. For us, Big Data always meant all data. Aside from the massive amounts of data (which the community already knows how to handle) or even the high Big Data velocity and throughput, we focused on the variability that comes with all data: There’s always tabular data, and tabular data, and more tabular data, but we also have to think about image files, text documents, PDFs, sound files, and so on. We also wanted to make data accessible to an end user who knows the subject matter but is not a technical person.

TW: You and Lars Fiedler developed Composable while working at Lincoln Lab. How did Composable evolve from an MIT idea into a commercial solution?

AV: Lincoln Laboratory is a well-kept secret.

TW: With the defense department involved, it probably has to be!

AV: Yes. MIT Lincoln Laboratory is really one of the premier research labs in the US, very much like the old Bell Labs, or the Jet Propulsion Lab that NASA runs with Caltech. Composable Analytics was initially funded directly by the DoD. The nice thing about Lincoln Lab is that you have that user interaction. You aren’t just writing research papers; you are prototyping, building systems, and meeting with end users (in this case, intelligence analysts and operators) to really get down to requirements and get a system that they would eventually use.

TW: Does Composable Analytics still serve the Department of Defense?

AV: Yeah. So I can’t really answer the question.

TW: Good enough!

AV: Our main focus is private sector.

TW: Tell me more about the Composable Analytics technology. What value propositions do you offer to an enterprise IT leader?

AV: Three things: orchestration, automation, and analytics. To me, that really embodies what’s behind DataOps. Our platform, our ecosystem provides those three things for an enterprise and for users of data within that enterprise.

Let me walk you through a real use case: One of our financial sector customers wants to build effective customer profiles. One touch point is their call center. You might call in to request a change of address after a recent real-estate purchase. This is normally a short call: the call center agent would change the address and hang up the phone and everybody’s happy. But this is a situation where an organization can learn more about the customer. An enterprise can use that little tidbit of information that you just revealed about yourself in order to understand what other products and services you might be interested in. The fact that you purchased a home might mean you’re willing to purchase life insurance. You might mention you are having a baby. That might incite you to open an educational savings account with the company. What does this require? Being able to integrate with a Voice-over-IP system and orchestrate a data flow that takes the call-center recording, in real time, pushes it into a speech-to-text engine, takes the resulting unstructured text and uses various analytics and natural language processing techniques in order to determine intent, sentiment, and trigger words that can then be directly inserted back into a CRM. The call center agent can see that on your profile and talk to you about it during that call, or next time you call. That embodies orchestration, automation, plus analytics. Those are the types of complex all-data flow use cases we’re addressing.
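The call-center flow described above can be outlined in a few lines of Python. This is only an illustrative sketch under stated assumptions, not Composable's implementation: the `speech_to_text` stub, the `TRIGGERS` table, the sentiment word lists, and the `CustomerProfile` class are all hypothetical stand-ins for a real speech engine, NLP models, and CRM record.

```python
import re
from dataclasses import dataclass, field

# Hypothetical trigger phrases mapped to products an agent might bring up.
TRIGGERS = {
    "bought a house": "life insurance",
    "new home": "life insurance",
    "having a baby": "educational savings account",
}

# Tiny hypothetical word lists standing in for a real sentiment model.
POSITIVE = {"great", "happy", "thanks", "excited"}
NEGATIVE = {"angry", "upset", "cancel", "frustrated"}


@dataclass
class CustomerProfile:
    """Stand-in for a CRM record."""
    customer_id: str
    suggestions: list = field(default_factory=list)
    last_sentiment: str = "neutral"


def speech_to_text(recording: str) -> str:
    """Stub for a speech-to-text engine; the 'recording' here is already text."""
    return recording.lower()


def detect_triggers(transcript: str) -> list:
    """Return the products suggested by trigger phrases found in the transcript."""
    found = []
    for phrase, product in TRIGGERS.items():
        if phrase in transcript and product not in found:
            found.append(product)
    return found


def score_sentiment(transcript: str) -> str:
    """Classify sentiment by counting positive and negative words."""
    words = set(re.findall(r"[a-z']+", transcript))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def process_call(recording: str, profile: CustomerProfile) -> CustomerProfile:
    """Orchestrate the flow: recording -> transcript -> analytics -> CRM update."""
    transcript = speech_to_text(recording)
    for product in detect_triggers(transcript):
        if product not in profile.suggestions:
            profile.suggestions.append(product)
    profile.last_sentiment = score_sentiment(transcript)
    return profile
```

Calling `process_call("Hi, we just bought a house and we are having a baby. Thanks!", CustomerProfile("c-001"))` would attach both product suggestions to the profile, which is the point at which an agent could see them during the call.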

TW: It sounds like a platform play. Are you essentially offering and delivering and serving pretty much the whole data value chain from ingestion through consumption?

AV: Yes, we are, and that’s where DataOps comes into play. There’s always raw data out there. At the end of the day your business users are getting value from applications, Excel or Dynamics or Power BI or Salesforce or NetSuite, whatever it is. But there’s a whole process that happens in between the raw data getting to the high-level application, a process that encompasses orchestration, automation, and analytics. That’s our play. That’s where we live. That’s what we do well.

TW: I like to talk about the enterprise conflict between IT leadership and line-of-business stakeholders like my former marketer self. Toph-the-marketing-boy wants self-service everything—data immediacy without data-administration complexity. On the other side, IT leadership is tasked with ensuring auditability, lineage, governance, security. Which side of that customer equation do you target? IT side? Business influencer? Or both?

AV: Almost always the business side.

TW: Interesting. I confess that’s not what I expected!

AV: We typically find that the business side is willing to adopt new technologies so it can directly increase business value. Back to DataOps, we enable the business side to develop operational data science solutions, through reliable and robust continuous integration, while establishing, through the use of our tools, DataOps best practices. So, when the business side is ready to have IT leadership take ownership of its proven data implementations, we already have a layer of governance, security, and auditing around it, which makes the transition that much easier.

We talk about operationalizing data. In many cases, organizations have invested in PhD-level scientists to develop, implement, and validate data models. They do this by building what is normally a one-off analytic. It works beautifully, but at that point, the model has not provided any business value to the organization.

That one-off data model or data analytic must fit into a larger data workflow, one that the organization supports, and which works in conjunction with IT. It must integrate with production databases, query data, pull it into the analytic model, perform the computation, and push it back into other production databases, production CRMs, maybe into ERP systems. It’s that part—the data-workflow management—that is missing in today’s Big Data solutions. That’s where the Composable platform comes in. It allows you to connect the data sets, plug-and-play the analytics—that you either write or bring in from other open-source libraries—and be part of this broader operational process.
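To make the "query, compute, push back" loop concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything in it is an assumption for illustration: the `customers` and `risk_scores` tables, the `score_customer` formula, and the use of SQLite in place of real production databases or a CRM.

```python
import sqlite3


def score_customer(age: int, open_claims: int) -> float:
    """Hypothetical stand-in for a validated data-science model."""
    return round(min(1.0, 0.1 * open_claims + (0.2 if age < 30 else 0.0)), 2)


def run_workflow(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Pull rows from the source system, score them, and push results to the target."""
    rows = source.execute(
        "SELECT customer_id, age, open_claims FROM customers"
    ).fetchall()
    scored = [(cid, score_customer(age, claims)) for cid, age, claims in rows]
    target.execute(
        "CREATE TABLE IF NOT EXISTS risk_scores "
        "(customer_id TEXT PRIMARY KEY, score REAL)"
    )
    # INSERT OR REPLACE keeps the workflow safe to re-run on a schedule.
    target.executemany("INSERT OR REPLACE INTO risk_scores VALUES (?, ?)", scored)
    target.commit()
    return len(scored)


def make_demo_source() -> sqlite3.Connection:
    """Build an in-memory 'production' database with two sample customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id TEXT, age INTEGER, open_claims INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [("c-001", 25, 3), ("c-002", 40, 0)],
    )
    conn.commit()
    return conn
```

The model itself is one small function; most of the code is the surrounding workflow, which mirrors the point made above that the operational plumbing is where a one-off analytic starts to deliver value.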

TW: You’re preaching to the converted! Enterprises need to hear the DataOps gospel. But I think most face a challenge on both the data consumption and data management sides of the house: They must overcome conflicting objectives to collaborate. Do you find that it’s difficult to evangelize collaboration to these enterprise groups?

AV: No. It’s actually easy once we’re in. When enterprises use our platform as a framework for building these operational data flows, we typically have good engagement with IT leaders because they see things are developed correctly.

TW: What’s deployment like?

AV: The platform is a distributed web application developed as a native cloud application. It can be deployed on the cloud, and scales well both horizontally and vertically. You can spin up an instance of Composable on AWS or Microsoft Azure, but the public cloud is not required. We can deploy Composable for an enterprise on-premises. Back to our Department of Defense legacy, one of our requirements was to be able to run not just on-premises, but on air-gapped networks, and we can do that. With some of our customers—within insurance and finance—the data is sensitive, and we run on a cluster behind the corporate firewall completely disconnected from the web.

TW: What’s Composable’s funding situation?

AV: We were lucky enough to leave MIT with a product and customers ready and waiting. From day one—the end of 2014—we’ve been completely client-funded.

TW: Will you look to subsidize growth with outside investment?

AV: Yes. I think 2017 is the year for us. We’re reaching a point where capital will help us scale out dramatically.

We’re a growing but small company, with the entire team being technical and focused on product development. As we grow, our focus will be to bring on forward-deployed engineers and customer success managers to help with deployment. This will help us approach a broader set of customers and work with them to develop a DataOps Strategy, based on a small-scale, short-term pilot, that may last one or two months at most. After that, and after they see the value, they buy into Composable as a licensed delivery platform.

TW: Where is your customer base?

AV: All regions, but predominantly domestic. We have, for example, one large customer that is a global energy conglomerate with operations in South America and other parts of the world.

TW: I understand you’re producing an upcoming conference?

AV: Yes—the DataOps Summit conference series. The next event is in June here in our hometown in Boston. We’re focused on getting all the data professionals into the same room. That’s both the business side of the house and technical audiences, like software developers, data scientists, data engineers, IT operations, quality assurance engineers, and so on. More details online at

Many enterprises have invested in data science, and developed some cool data applications, and now must figure out how to put them in an operational workflow to actually generate value! That’s what we’re trying to illustrate with this DataOps Summit series. We’ll bring in executives from the business side—financial services, insurance, oil and gas, cybersecurity, other verticals as well—and talk about what DataOps tools, techniques, best practices they can put together around data operations. But we’ll listen, too: The technology vendors in the room—Composable and others—can work with them on a DataOps vision that we can all build towards.

TW: Where does Composable Analytics go from here?

AV: First, democratizing data science. Enterprise business users should be able to work more and more like data scientists. Our current end users are typically sophisticated business users, but not necessarily technical. Ultimately, they know the business better than anyone else. We’re creating a framework to help these users develop their own analytical workflows. Composable has a visual designer that lets you create complex dataflows regardless of your technical level. That means a complex data pipeline can be created visually, just as you would draw out a workflow on a whiteboard! We have a machine-learning computational framework behind this that will accelerate the process for an analyst to build these workflows. As that analyst selects different modules to build up the data flow, the machine will recommend the next such module to come in. So, machine learning is accelerating the development of new machine-learning data flows. That’s pretty cool.

Second, there’s a lot of noise out there, and we’ve seen many organizations delay data-management solution adoption. Composable started as a self-service analytics platform, but over time has become a DataOps platform with orchestration, automation, and analytics aimed at getting people out of the rat’s nest of spreadsheets, and to start thinking about modern data architectures. We see DataOps being this transformative notion of best practices that allow organizations to say “Okay, we can do this.” We know how to do software development. We know how to build production systems. Now, let’s bring that to the data world and start to think about production data platforms and operational data science.
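The "recommend the next module" idea mentioned in this answer can be illustrated with a very simple frequency model. The interview does not say how Composable's recommender works, so this sketch swaps in a plain first-order transition count over past dataflows; the module names and the `train` and `recommend` helpers are hypothetical.

```python
from collections import Counter, defaultdict


def train(flows):
    """Count how often each module is followed by each other module."""
    transitions = defaultdict(Counter)
    for flow in flows:
        for current, following in zip(flow, flow[1:]):
            transitions[current][following] += 1
    return transitions


def recommend(transitions, current, k=2):
    """Return up to k modules that most often followed `current` in past flows."""
    counts = transitions.get(current, Counter())
    return [module for module, _ in counts.most_common(k)]


# Hypothetical history of dataflows built by analysts.
PAST_FLOWS = [
    ["csv_reader", "clean", "join", "chart"],
    ["csv_reader", "clean", "regression"],
    ["sql_query", "clean", "join", "export"],
]
```

With this history, `recommend(train(PAST_FLOWS), "clean", k=1)` suggests `join`, because `join` followed `clean` in two of the three past flows. A production recommender would likely use richer context than the single previous module.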

Posted in Blog, Governance, Risk Management, and Compliance, Operations

This Week in DataOps: Manifestos, Shocking Steps, and the Rise of Data Governance

Welcome to the first edition of This Week in DataOps! (And before you ask, no, it probably won’t come out every week.) For a reference point, think of “This Week in Baseball,” only the highlights are about data-derived value maximization. (Yes, that’s the hashtag: #dataderivedvaluemaximation. Lot of competition for that trademark, I bet.)

In this roundup: Two DataOps companies step into the light, two upcoming DataOps events take the stage, and a big DataOps buy signals a big DataOps player’s commitment to data governance transparency.

In news from BHR HQ city Beantown, two new startups have taken up the mantle of DataOps. Composable Analytics, based across the Charles in Cambridge, grew out of a project at MIT’s Lincoln Laboratory. Cofounders Andy Vidan and Lars Fiedler started Composable back in 2014 with the aim of delivering orchestration, automation, and analytics, all within a DataOps context. Check out Andy’s lucid manifesto “Moving Forward with DataOps.” (I’m a big fan of DataOps manifestos, by the way.) Key takeaway: Real-time data flows, analytics delivered as a service, and composability are essential to DataOps success.

Another Boston-area firm is making news in the DataOps space. (New Cambridge, Massachusetts tourism slogan: Come for the craft beer. Stay for the data workflow management.) DataKitchen is the self-described “DataOps Company,” and delivers an algorithmic platform based on data “kitchens,” where enterprise data consumers create data “recipes” spanning data access, transformation, modeling, and visualization. And cofounders Christopher Bergh and Gil Benghiat will be speaking on “Seven Steps to High-velocity Data Analytics with DataOps” at this month’s Strata + Hadoop World event in San Jose. (Apparently some of the steps are “shocking!” More details on that not-at-all-clickbaity preso here.)

Speaking of upcoming events, two feature a DataOps agenda. In June, head to…yep, Cambridge, Massachusetts for the DataOps Summit, a two-day show produced by the nice folks at Composable Analytics. Day one will focus on DataOps business use cases, and day two will examine DataOps technical innovations. Speakers include Tamr CEO Andy Palmer, MIT Lincoln Lab researcher Vijay Gadepally, Unravel Data CTO Bala Venkatrao, IBM UrbanCode Deploy product manager Laurel Dickson-Bull, and chief technologist for PwC’s Global Data & Analytics practice Ritesh Ramesh. (Maybe don’t bring up the Oscars with Ritesh.)

And in late May, head to Phoenix for Data Platforms 2017. This year’s theme is “Engineering the Future with DataOps.” The show is sponsored by O’Reilly, Qubole, Amazon Web Services, and Oracle. Featured speakers include former Obama administration “Geek in Chief” R. David Edelman, Qubole CEO Ashish Thusoo, and Facebook engineering director Ravi Murthy.

And in case you missed it:

  • Informatica acquired UK-based data governance software developer Diaku. The Diaku data governance app snaps nicely into the broader Informatica portfolio. Plus, Informatica gets more tech talent and at least some greater foothold in Europe. The purchase signals a commitment by Informatica (and, arguably, by the broader data-management software space) to DataOps-y principles of orchestration, transparency, and workflow-based collaboration.
  • Tamr just patented its data unification model! As Tamr notes, the concept of data unification may not necessarily be particularly new, but Tamr’s “comprehensive approach for integrating a large number of data sources” coupled with its machine-learning algorithms is uniquely innovative enough to merit patent protection, at least in the judgment of the nice folks at the U.S. Patent and Trademark Office.

That’s it for now. See you next week in DataOps!

Posted in Analytics, Blog, Governance, Risk Management, and Compliance

DataOps, “Agile Growability,” and a Humble Dose of Humanity: 11 Things I Want to Hear at Strata + Hadoop World San Jose

Strata + Hadoop World San Jose is coming, and (trade show junkie that I am) I’m once again filled with anticipation. I look forward to new and exciting technologies on display, plenty of marketing hype, and of course, brightly-colored logo pens (especially the ones that double as flashlights or USB sticks). In addition to the sweet swag, here’s what I hope to see and hear in California…

Acknowledgement from data-technology vendors of the growing influence of business end users in purchase decisions. It’s no longer just about the IT leader! Selling technology for technology’s sake is not enough any more, and vendors who ignore business leadership audiences in their messaging do so at their own peril. I want to hear how cool new technologies will help not just IT leadership, but business users as well.

Context! I’m a Strata-holic. I want to see all the new features of all the new functional solutions. But I want to see those solutions demo’ed in the context of broader business and DataOps workflows.

Business value! Imagine, if you will, a solution message that starts with business value and works its way backwards…like say, a technology positioned as the business case for a DataOps approach. The new data-technology sale is less about the how and more about the why: delivering tangible, measurable enterprise business value. Why aren’t we all getting that yet? (Hat-tip to the GoodData social-media folks for this much better way of putting it.)

Speaking of business value, I’m eager to hear a compelling “cloud + data = goodness” message from Microsoft. I like where Microsoft is going with its Cortana Intelligence Suite, Azure Data Factory, and Power BI. (Full disclosure: I used to work there.) But I want more. Excluding a certain online bookseller located on the opposite side of Lake Washington, Microsoft is the only major enterprise data management solution provider that owns the cloud, so to speak. In this instance, at least from Microsoft’s selling perspective, cloud is more than a commoditized, off-premises storage option: it’s a strategic advantage…I think. And I want to hear about how that’s a potential advantage for me, expressed (empathetically!) in data-analytics value terms.

And speaking of coherent cloud messages, I’m still waiting for a good solution to the data-consumption bottleneck. How can data consumers digest data (think streaming) as fast as the architecture can scale to store it? (The answer is not hiring more interns to monitor reporting dashboards.) Toph-the-marketing-boy should be able to avoid missing stuff, test new data applications easily, and work with exponentially greater datasets than he currently can. (Sisense paid darn good lip service to this challenge last fall, and I’m looking forward to an update.)

And speaking of that already-here-no-longer-looming data-consumption bottleneck as an example, I’m particularly interested in companies with data technologies that work “here” applied to what’s going on over “there.” For instance, Anodot takes its anomaly-detection technology beyond the ops world and uses it to attack the data-consumption-as-data-volume-grows-exponentially challenge. And Rocana performance-monitoring software doubles nicely as an accountability and visibility solution for senior (read: non-technical) management.

Orchestration across the silos! Point solutions are good. Functional solutions are good. But when they don’t support cross-function and cross-organizational-silo transparency, their success is limited. Platform-level data orchestration is the next big thing, and not everyone is addressing it yet. Teradata’s “Unified Data Architecture” messaging is a good start. (Teradata marketing folks, please save me a logo pen.) So is Domo’s anti-silo evangelism.

The next layer of trust in data: data solutions that are smart enough to provide on-the-fly extensibility. Call it “agile growability,” call it “smart integration,” but what it really is is a data-management model that grows dynamically as it learns from its own operation. (Continuous improvement? Oh yeah. V2.0.)  A good DataOps workflow provides the best data journey at that moment. A great DataOps workflow is smart enough to improve itself over time. A business user should be able to not just trust in the data now, but trust that the next dataset will be even better. Who’s headed this way?

Actian, I’ve had a change of heart. Please…bring back the dancers.

Democratization that’s meaningful. TIBCO, I’m looking your way—Tell me more about “self-service integration for all” (and why it’s better than the alternatives). And DataRobot—Your advanced analytics are stellar, but what’s the true business impact of my becoming a “citizen data scientist?”

Finally, a human request: Our industry has been built upon, and thrives because of, the contributions of immigrants. I speak as one (to Canada) when I ask: How can we support our tech workers impacted by possible U.S. immigration restrictions? Some initial options for Big Data companies: sign amicus briefs, petition for more H-1B visas, and hug your employees. And if it comes to it, consider opening satellite development offices in other countries. (Canadian technology firms may not wait for you.)


Posted in Blog, Research