Topics of Interest Archives: Informatica

In Praise of (Data) Transparency - Part #2

In my previous blog on data transparency, I posited my admittedly idealistic vision that—within reason—the more an enterprise fosters the free flow of data within its walls, the better. In this follow-up, I’ll look at some of the organizational blockers to data workflows, and how to get around them.

I’ll start with the basic underlying ideal: More data is better. If I work in marketing, I need to be able to see marketing data. And sales data. And financial data. And product management data. And…I could go on, but you get the point.

The problem (the challenge, really) is that in far too many organizations, that glorious cross-functional data just doesn’t flow across the enterprise, or rather over and through its silos, be they functional, architectural, or process-based. Perhaps it’s naive of me to ask, but why on earth does this obstinate hindrance to progress persist?

Data blockages—institutional or human-created—lead to data-hoarding. (Know any data hoarders in your enterprise? Am I the only one who thinks “Data Hoarders” would make for a great reality show?)

Let’s look at some of the organizational contributors to data blockage. Any of these data-hoarding characteristics hit close to home?

  • Provincialism: “It’s my data. I own it. Only I get to derive value from it. Plus, I may be able to use it against those who anger me.”
  • Trust (or more specifically, the lack thereof): “This data is proprietary, and must remain confidential. I don’t know who you hire over there in [other department that's not mine], therefore, I cannot trust you with this information.”
  • Change is a threat: “We’ve always done it this way. We’ve never shared before, and we’re not about to change for your benefit.”
  • Incompatibility: “You’re the one who chose that marketing automation solution. It’s not my fault it doesn’t easily integrate with my CRM.”
  • Misplaced or missing incentives: “What benefit will I see if I share data with you? It will cost me time/money to share, and could even be a risk…one I’m not willing to take.”

The inefficient flow of information in an enterprise so often boils down to organizational dysfunction. How willing are you and your colleagues to work together to share data? Would you share your team’s data with someone in your enterprise you don’t like? Does sharing your team’s data with another group deliver tangible benefits to your team?

Defeating the data-hoarders requires a corporate commitment to the free flow of data over, under, and through the enterprise. That’s an organizational behavior and leadership challenge that should be addressed at the C-suite level.

Moving towards data transparency requires more than just progressive leadership. Effective data integration is a prerequisite, and technology helps, on both the data-management and data-consumption sides of the equation. For example, Informatica frames its data-management capabilities around its Enterprise Information Catalog (EIC), a data catalog that leverages machine learning to catalog, classify, and map relationships between enterprise data assets. The end user (typically a data scientist, or even a business user) can get at her or his data assets via a search interface. That process delivers real benefits: discovery is convenient, access is accelerated, and, perhaps most importantly, the data is trustworthy.
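
To make that search-then-access pattern concrete, here’s a minimal sketch of what programmatic access to a data catalog can look like. To be clear, this is not Informatica’s actual EIC API: the host, endpoint, and response fields below are hypothetical, purely to illustrate the workflow of searching a catalog and following the results back to governed data assets.

```python
# Hypothetical data-catalog search -- NOT the real Informatica EIC API.
# The host, endpoint, and response fields are illustrative assumptions.
import requests

CATALOG_URL = "https://catalog.example.com/api/v1/assets"  # hypothetical

def find_assets(keyword: str, token: str) -> list:
    """Search the catalog for data assets matching a keyword."""
    resp = requests.get(
        CATALOG_URL,
        params={"q": keyword, "limit": 10},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("assets", [])

if __name__ == "__main__":
    # A marketing analyst hunting for trustworthy revenue data:
    for asset in find_assets("quarterly revenue", token="YOUR_TOKEN"):
        print(asset.get("name"), "owned by", asset.get("owner"))
```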

The data-workflow approach championed by Informatica and other data-integration and data-cataloging vendors works, and delivers all the tangible benefits the vendors’ respective marketing materials trumpet. But no technology by itself can overcome myopic, office-politics-driven data-hoarding. To reap the benefits of true enterprise data transparency, you’re going to have to come to agreement with your peers—even the ones who drive you crazy—on five simple words: “We’re all in this together.”


This Week in DataOps: The Promotional Edition

Spring has sprung (finally, though only briefly here in Canada), which means it’s webinar and publishing season! And that makes for a busy month in the DataOps world.

Join me and Information Builders VP of Marketing Jake Freivald on Thursday, April 27, 2017, for our webinar, “DataOps in the Real World: How Innovators are Reinventing Their Business Models with Smarter Data Management.” I’ll be providing an overview of DataOps—what it is, how it works, and why it matters—and presenting an interesting healthcare case example. (So far, only two slides include pictures of my head.) I’m looking forward to an enlightening discussion! Registration details and more information are available here.


Silos kill! Well, they at least hinder progress. Keep an eye out for my upcoming DataOps report “No More Silos: How DataOps Technologies Overcome Enterprise Data Isolationism.” (Tentative publication date = Friday, April 28, 2017.) The research looks at how data innovators leverage technologies from vendors like Informatica, Domo, Switchboard Software, Microsoft, Yellowfin, and GoodData to break down organizational, architectural, and process-based enterprise silos.

Here’s what the first page might just look like:

[Preview: page 1 of the forthcoming “No More Silos” report]


This Week in DataOps: The Tradeshow Edition

DataOps wasn’t the most deafening sound at Strata + Hadoop World San Jose this year, but as data-workflow orchestration models go, the DataOps music gets louder with each event. I’ve written before about Boston-based DataOps startup Composable Analytics. But several Strata startups are starting to get attention too.

Still-in-stealth-mode-but-let’s-get-a-Strata-booth-anyway San Francisco-based startup Nexla is pitching a combined DataOps + machine-learning message. The Nexla platform enables customers to connect, move, transform, secure, and (most significantly) monitor their data streams. Nexla’s mission is to get end users deriving value from data rather than spending time working to access it. (Check out Nexla’s new DataOps industry survey.)

DataKitchen is another DataOps four-year-overnight success. The startup out of Cambridge, Massachusetts, also exhibited at Strata. DataKitchen users can create, manage, replicate, and share defined data workflows under the guise of “self-service data orchestration.” The DataKitchen guys—“Head Chef” Christopher Bergh and co-founder Gil Benghiat—wore chef’s outfits and handed out logoed wooden mixing spoons. (Because your data workflow is a “recipe.” Get it?)


DataOps in the wild — The Nexla and DataKitchen exhibition booths at Strata + Hadoop World San Jose.

Another DataOps-y theme at Strata: “continuous analytics.” In common parlance, the buzzphrase suggests “BI on BI”: monitoring and managing data workflows so they can be tweaked and improved, with the implied promise of consumable, always-on, probably-streaming, real-time BI. Israeli startup Iguazio preaches the continuous-analytics message (as well as plenty of performance benchmarking) as part of its “Unified Data Platform” offering.
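
What might “BI on BI” look like in practice? Here’s a minimal sketch under my own assumptions, not any vendor’s implementation: treat a pipeline’s operational metrics (say, rows loaded per run) as a data stream in their own right, and flag runs that deviate sharply from recent history. The window size, z-score threshold, and sample values are all illustrative.

```python
# Minimal "BI on BI" sketch: monitor a pipeline's own metrics as a stream
# and flag anomalous runs. Window, threshold, and sample values are
# illustrative; this is a generic pattern, not any vendor's product.
from collections import deque
from statistics import mean, stdev

class PipelineMonitor:
    """Track a rolling window of one pipeline metric and flag outliers."""

    def __init__(self, window: int = 20, z_threshold: float = 3.0):
        self.history = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, value: float):
        """Record one run's metric; return an alert string if anomalous."""
        alert = None
        if len(self.history) >= 5:  # need some history before judging
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                alert = f"anomaly: {value} vs. rolling mean {mu:.0f}"
        self.history.append(value)
        return alert

monitor = PipelineMonitor()
for rows_loaded in [10200, 9900, 10500, 10100, 9800, 10300, 250]:
    if (msg := monitor.observe(rows_loaded)):  # the final run collapses
        print(msg)
```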

I got the chance to talk DataOps with IBM honchos Madhu Kochar and Pandit Prasad of the IBM Almaden Research Center. Kochar and Prasad are tasked with the small challenge of reinventing how enterprises derive value from their data with analytics. IBM’s recently announced Watson AI partnership with Salesforce Einstein is only the latest salvo in IBM’s efforts to deliver, manage, and shape AI in the enterprise.

Meanwhile, over in the data-prep world, the data wranglers over at Trifacta are working to “fix the data supply chain” with self-service, democratized data access. CEO Adam Wilson preached a message of business value—Trifacta’s platform shift aims to resonate with line-of-business stakeholders, and is music to the ears of a DataOps wonk like me. (And it echoes CTO Joe Hellerstein’s LOB-focused technical story from last fall.)

Many vendors are supplementing evangelism efforts with training outreach programs. DataRobot, for example, has introduced its own DataRobot University. The education initiative is intended both for enterprise training and for grassroots marketing, with pilot academic programs already in place at a major American university you’ve heard of (but that shall remain nameless), as well as at the National University of Singapore and several others.

Another common theme: The curse of well-intentioned technology. Informatica’s Murthy Mathiprakasam identifies two potential (and related) data transformation pitfalls: cheap solutions for data lakes that can turn them into high-maintenance, inaccessible data swamps, and self-service solutions that can reinforce data-access bad habits, foster data silos, and limit process repeatability. (In his words, “The fragmented approach is literally creating the data swamp problem.”) Informatica’s approach: unified metadata management and machine-learning capabilities powering an integrated data lake solution. (As with so many fundamentals of data governance, the first challenge is doing the metadata-unifying. The second will be evangelizing it.)

I got the opportunity to meet with Talend customer Beachbody. Beachbody may be best known for producing the “P90X” and “Insanity” exercise programs, and it continues to certify its broad network of exercise professionals. What’s cool from a DataOps perspective: Beachbody uses Talend to provide transparency, auditability, and control via a visible data workflow from partner to CEO. More importantly, data delivery—at every stage of the data supply chain—is now real time. To get there, Beachbody moved its information stores to AWS and—working with Talend—built a data lake in the cloud offering self-service capabilities. After a speedy deployment, Beachbody now enjoys faster processing and better job execution using fewer resources.

More Strata quick hits:

  • Qubole is publishing a DataOps e-book with O’Reilly. The case-study focused piece includes use-case examples from the likes of Walmart.
  • Pentaho is committed to getting its machine-learning technology into common use in the data-driven enterprise. What’s cool (to me): the ML orchestration capabilities and Pentaho’s emphasis on a “test-and-tune” deployment model.
  • Attunity offers three products named with two verbs and a noun. Its Replicate solution enables real-time data integration and migration, and Compose delivers a data-warehouse automation layer, but it is Attunity’s Visibility product that tells the most interesting DataOps story: it provides “BI-on-BI” operations monitoring (focused on data lakes).
  • Check out Striim’s BI-on-BI approach to streaming analytics. It couples data integration with a DataOps-ish operations-monitoring perspective on data consumption. It’s a great way to scale consumption with data volume growth. (The two i’s stand for “Integration” and “Intelligence.” Ah.)
  • Along those same lines, anomaly-detection technology innovator Anodot has grown substantially in the last six months, and promises a new way to monitor line-of-business data. Look for new product, package, and service announcements from Anodot in the next few months.

Last week I attended Domo’s annual customer funfest Domopalooza in Salt Lake City. More on Domo’s announcements coming soon, but a quick summary:

  • The tone was noticeably humble (the core product has improved dramatically from four years ago, when, as CEO Josh James admitted in his first keynote, it wasn’t so great) and focused squarely on business value. (James: “We don’t talk about optimizing queries. (Puke!) We talk about optimizing your business.”)
  • There was a definite scent of DataOps in the air. CSO Niall Browne presented on Domo data governance. The Domo data governance story emphasizes transparency with control, a message that will be welcomed in IT leadership circles.
  • Domo introduced a new OEMish model called “Domo Everywhere.” It allows partners to develop custom Domo solutions, with three tiers of licensing: white label, embed, and publish.
  • Some cool core enhancements include new alert capabilities, DataOps-oriented data-lineage tracking in Domo Analyzer, and Domo “Mr. Roboto” (yes, that’s what they’re calling it) AI functionality.
  • Domo also introduced its “Business-in-a-Box” package of pre-produced dashboard elements to accelerate enterprise deployment. (One cool dataviz UI element demoed at the show: Sample charts are pre-populated with applicable data, allowing end users to view data in the context of different chart designs.)

Finally, and not at all tradeshow-related: Australian BI leader Yellowfin has just announced the semi-annual upgrade to its namesake BI solution. Yellowfin version “7.3+” comes out in May. (The “+” might be Australian for “.1”.) The news is all about extensibility, with many, many new web connectors. But most interesting (to me at least) is the JSON connector capability, which enables users to establish their own data workflows. (Next step, I hope: visual mapping of that connectivity for top-down workflow orchestration.)
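
For readers who haven’t met this kind of connector before, here’s a minimal sketch of the generic pattern it enables, written in plain Python rather than Yellowfin’s own framework (which I won’t attempt to reproduce from memory): pull a JSON feed and flatten it into tabular rows a BI layer can consume. The URL and field names are hypothetical.

```python
# Generic JSON-connector pattern: fetch a feed and flatten it into rows a
# BI tool can consume. The URL and field names are hypothetical; this
# illustrates the idea behind such connectors, not Yellowfin's actual API.
import csv
import requests

FEED_URL = "https://api.example.com/metrics.json"  # hypothetical feed

def fetch_rows(url: str) -> list:
    """Fetch a JSON feed (assumed to be a list of objects) and flatten it."""
    records = requests.get(url, timeout=30).json()
    return [
        {
            "date": r.get("date"),
            "region": r.get("region"),
            "value": r.get("metrics", {}).get("value"),  # nested field
        }
        for r in records
    ]

# Land the flattened rows somewhere a downstream BI workflow can pick up.
with open("feed.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["date", "region", "value"])
    writer.writeheader()
    writer.writerows(fetch_rows(FEED_URL))
```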
