DataOps wasn’t the most deafening sound at Strata + Hadoop World San Jose this year, but as data-workflow orchestration models go, the DataOps music gets louder with each event. I’ve written before about Boston-based DataOps startup Composable Analytics. But several Strata startups are starting to get attention too.
Still-in-stealth-mode-but-let’s-get-a-Strata-booth-anyway San Francisco-based startup Nexla is pitching a combined DataOps + machine-learning message. The Nexla platform enables customers to connect, move, transform, secure, and (most significantly) monitor their data streams. Nexla’s mission is to get end users deriving value from data rather than spending time working to access it. (Check out Nexla’s new DataOps industry survey.)
DataKitchen is another DataOps four-year-overnight success. The startup out of Cambridge, Massachusetts also exhibited at Strata. DataKitchen users can create, manage, replicate, and share defined data workflows under the guise of “self-service data orchestration.” The DataKitchen guys—“Head Chef” Christopher Bergh and co-founder Gil Benghiat—wore chef’s outfits and handed out logo’ed wooden mixing spoons. (Because your data workflow is a “recipe.” Get it?)
Another DataOps-y theme at Strata: “Continuous Analytics.” In most common parlance, the buzzphrase suggests “BI on BI,” enabling data-workflow monitoring/management to tweak and improve, with the implied notion of consumable, always-on, probably-streaming, real-time BI. Israeli startup Iguazio preaches the continuous analytics message (as well as plenty of performance benchmarking) as part of its “Unified Data Platform” offering.
I got the chance to talk DataOps with IBM honchos Madhu Kochar and Pandit Prasad of the IBM Almaden Research Center. Kochar and Prasad are tasked with the small challenge of reinventing how enterprises derive value from their data with analytics. IBM’s recently announced Watson AI partnership with Salesforce Einstein is only the latest salvo in IBM’s efforts to deliver, manage, and shape AI in the enterprise.
Meanwhile, over in the data-prep world, the data wranglers over at Trifacta are working to “fix the data supply chain” with self-service, democratized data access. CEO Adam Wilson preached a message of business value—Trifacta’s platform shift aims to resonate with line-of-business stakeholders, and is music to the ears of a DataOps wonk like me. (And it echoes CTO Joe Hellerstein’s LOB-focused technical story from last fall.)
Many vendors are supplementing evangelism efforts with training outreach programs. DataRobot, for example, has introduced its own DataRobot University. The education initiative is intended both for enterprise training, but also for grassroots marketing, with pilot academic programs already in place at a major American university you’ve heard of but shall remain nameless, as well as the National University of Singapore and several others.
Another common theme: The curse of well-intentioned technology. Informatica’s Murthy Mathiprakasam identifies two potential (and related) data transformation pitfalls: cheap solutions for data lakes that can turn them into high-maintenance, inaccessible data swamps, and self-service solutions that can reinforce data-access bad habits, foster data silos, and limit process repeatability. (In his words, “The fragmented approach is literally creating the data swamp problem.”) Informatica’s approach: unified metadata management and machine-learning capabilities powering an integrated data lake solution. (As with so many fundamentals of data governance, the first challenge is doing the metadata-unifying. The second will be evangelizing it.)
I got the opportunity to meet with Talend customer Beachbody. Beachbody may be best known for producing the “P90” and “Insanity” exercise programs, and continues to certify its broad network of exercise professionals. What’s cool from a DataOps perspective: Beachbody uses Talend to provide transparency, auditability, and control via a visible data workflow from partner to CEO. More importantly, data delivery—at every stage of the data supply chain—is now real time. To get to that, Beachbody moved its information stores to AWS and—working with Talend—built a data lake in the cloud offering self-service capabilities. After a speedy deployment, Beachbody now enjoys faster processing and better job execution using fewer resources.
More Strata quick hits:
- Qubole is publishing a DataOps e-book with O’Reilly. The case-study focused piece includes use-case examples from the likes of Walmart.
- Pentaho is committed to getting its machine-learning technology into common use in the data-driven enterprise. What’s cool (to me): the ML orchestration capabilities, Pentaho’s emphasis on a “test-and-tune” deployment model.
- Attunity offers three products using two verbs and a noun. Its Replicate solution enables real-time data integration/migration, Compose delivers a data-warehouse automation layer, but it is Attunity’s Visibility product that tells the most interesting DataOps story: It provides “BI-on-BI” operations monitoring (focused on data lakes).
- Check out Striim’s BI-on-BI approach to streaming analytics. It couples data integration with a DataOps-ish operations-monitoring perspective on data consumption. It’s a great way to scale consumption with data volume growth. (The two i’s stand for “Integration” and “Intelligence.” Ah.)
- Along those same lines, anomaly-detection technology innovator Anodot has grown substantially in the last six months, and promises a new way to monitor line-of-business data. Look for new product, package, and service announcements from Anodot in the next few months.
- Focus was noticeably humble (core product has improved dramatically from four years ago, when it wasn’t so great, admitted CEO Josh James in his first keynote) and business-value-focused. (James: “We don’t talk about optimizing queries. (Puke!) We talk about optimizing your business.”)
- There was a definite scent of DataOps in the air. CSO Niall Browne presented on Domo data governance. The Domo data governance story emphasizes transparency with control, a message that will be welcomed in IT leadership circles.
- Domo introduced a new OEMish model called “Domo Everywhere.” It allows partners to develop custom Domo solutions, with three tiers of licensing: white label, embed, and publish.
- Some cool core enhancements include new alert capabilities, DataOps-oriented data-lineage tracking in Domo Analyzer, and Domo “Mr. Roboto” (yes, that’s what they’re calling it) AI functionality.
- Domo also introduced its “Business-in-a-Box” package of pre-produced dashboard elements to accelerate enterprise deployment. (One cool dataviz UI element demoed at the show: Sample charts are pre-populated with applicable data, allowing end users to view data in the context of different chart designs.)
Finally, and not at all tradeshow-related, Australian BI leader Yellowfin has just announced its semi-annual upgrade to its namesake BI solution. Yellowfin version “7.3+” comes out in May. (The “+” might be Australian for “.1”.) The news is all about extensibility, with many, many new web connectors. But most interesting (to me at least) is its JSON connector capability that enables users to establish their own data workflows. (Next step, I hope: visual-mapping of that connectivity for top-down workflow orchestration.)