This is the first in Blue Hill Research’s occasional blog series “Questioning Authority with Toph Whitmore.”
As co-founder (with friend Michael Stonebraker) of Vertica, Andy Palmer ambitiously sought nothing less than to reinvent the database. In 2013, he and Stonebraker moved up the data value chain and founded Tamr, the Cambridge, MA-based software company aiming to provide a unified view of data in the modern enterprise.
Palmer joined me for a discussion in which he talked Tamr, predicted the future of enterprise data management, and introduced a rather colorful (yet apt) analogy of which, he admits, his marketing team is less than fond.
TOPH WHITMORE: Tell me about the genesis of Tamr. You and Michael Stonebraker founded the company in 2013. What unfulfilled business or technology need did the two of you see?
ANDY PALMER: Mike has spent his career working at the intersection of academic research and database systems, and the commercial application of those ideas. After Vertica, we felt that the enterprise data and analytics bottleneck had moved. It was no longer about the core database systems and their performance. The bottleneck had moved up the stack to where people were trying to do semantic mapping between many different data sources. We saw a pattern. Those enterprise data sources should be managed and organized similarly to how search engines crawl and organize the modern World Wide Web.
Early on, Yahoo had a taxonomical approach to organizing the web. Yahoo had lists: “These are travel sites, these are financial services sites,” and they had thousands of library scientists employed to produce them. Google’s automated model quickly surpassed this manual, top-down approach.
We see this same pattern in big companies today. The fundamental data asset isn’t websites, it’s tabular data in transactional systems. And the Google-like approach to creating models where you crawl all these different tabular data sets, organize them, and integrate human feedback for tuning has the potential to transform the way the average corporate data consumer gets access to more and complete data.
[Mike and I recognized] that the siloed nature of enterprise data was extreme. That most people in a company didn’t realize how much data actually existed. If they did grok the thousands of data sources, they couldn’t possibly manage them with their existing toolsets. There was this discontinuous technology improvement that was required. That’s really what inspired us to start Tamr.
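[Editor’s illustration: The loop Palmer describes (crawl many tabular sources, organize them automatically, and let human feedback tune the result) can be sketched in a few lines of Python. This is a toy under stated assumptions, not Tamr’s method: the function names, the column names, and the 0.6 threshold are all hypothetical, and plain string similarity from the standard library stands in for whatever models a real product would use.]

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_mappings(source_columns, target_attributes, feedback=None, threshold=0.6):
    """Map each source column to a unified attribute, or None if nothing fits.

    feedback is a hypothetical dict of reviewer decisions
    (source column -> attribute, or None for "no match");
    it always overrides the automated guess.
    """
    feedback = feedback or {}
    mappings = {}
    for col in source_columns:
        if col in feedback:
            # A human has already ruled on this column.
            mappings[col] = feedback[col]
            continue
        # Automated guess: the most similar attribute name, if similar enough.
        best_attr = max(target_attributes, key=lambda attr: similarity(col, attr))
        mappings[col] = best_attr if similarity(col, best_attr) >= threshold else None
    return mappings


# Hypothetical unified schema and one reviewer correction.
TARGETS = ["supplier_name", "unit_price"]
REVIEWED = {"vend_nm": "supplier_name"}

result = suggest_mappings(["Supplier_Name", "vend_nm", "xyz"], TARGETS, feedback=REVIEWED)
```

[The design point is the override: the automated pass handles the bulk of the columns, and a reviewer’s decision is consulted first so that corrections stick.]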
TW: Who did you see as your customers?
AP: When you do something like this, you have to decide whether you’re going to sell something that’s horizontal (and boil the ocean), or whether you’re going to sell something tied to a specific value proposition. Tamr’s the latter, not the former.
For [Tamr customer] GE, it was a simple analytic question: Are we getting the best terms every time we buy something across GE worldwide? It’s a simple question, but a compelling one, and it was hard to answer without a unified view across GE’s many thousands of different procurement systems and hundreds of ERP systems. So, this little bit of new Tamr technology combined with all of this broad data resulted in GE saving hundreds of millions of dollars.
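[Editor’s illustration: To see why the question needs a unified view, consider a minimal Python sketch. Everything in it is hypothetical: the records, the system names, and the suffix list are invented for illustration, and the simple rule-based name cleanup is a stand-in, not a description of how Tamr or GE matched suppliers.]

```python
# Hypothetical purchase records from two separate procurement systems.
# The same supplier is spelled differently in each system.
PURCHASES = [
    {"system": "erp_a", "supplier": "Acme Corp.", "part": "bolt-m8", "unit_price": 0.12},
    {"system": "erp_b", "supplier": "ACME Corporation", "part": "bolt-m8", "unit_price": 0.09},
    {"system": "erp_b", "supplier": "Globex, Inc.", "part": "bolt-m8", "unit_price": 0.11},
]

# Assumed list of legal suffixes to ignore when comparing supplier names.
SUFFIXES = {"corp", "corporation", "inc", "llc", "ltd", "co"}


def normalize_supplier(name):
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def best_terms(purchases):
    """Per part, return (lowest unit price, normalized supplier, source system)."""
    best = {}
    for p in purchases:
        candidate = (p["unit_price"], normalize_supplier(p["supplier"]), p["system"])
        if p["part"] not in best or candidate < best[p["part"]]:
            best[p["part"]] = candidate
    return best


def overpaid(purchases):
    """Purchases whose unit price exceeds the best price seen for that part."""
    best = best_terms(purchases)
    return [p for p in purchases if p["unit_price"] > best[p["part"]][0]]
```

[Once the two spellings of the supplier collapse to one name, it becomes visible that one system is paying that supplier more than the other for the same part. Without the unified view, each system’s price looks fine in isolation.]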
TW: With GE, why did you start with procurement analytics?
AP: GE identified that challenge early on, and we saw that it was a great application [for Tamr]. In Tamr’s first two years, we identified more than 90 different use cases. We’ve landed on three primary use cases that are at the core of what Tamr does today. The first is procurement and spend optimization. The second is customer data integration. And the third is life sciences—data unification for large pharmaceutical companies.
The time has come for big companies to start harvesting the value—monetizing the value of all their data. They’ve invested billions of dollars automating business processes, and have created a lot of silos. But when you start to run analytics across all of the data that lives in those silos, there’s low-hanging fruit!
TW: You mentioned data monetization. How are Tamr customers using your technology to create money-making data products, services, or solutions?
AP: Thomson Reuters is one of our largest customers. Data monetization is what Thomson Reuters does. We see an amazing overlap between the traditional information services providers—whether it’s Thomson Reuters, or Acxiom, or Experian—and the modern enterprise. Inside a big company, once you do a good job of managing data as an asset, you essentially become a data broker. The forward-thinking enterprise CIOs and Chief Data Officers are positioning themselves as the data brokers across business units and between corporate silos. From brokering that data internally to business units that want to run analytics, it’s a logical extension to monetize data by selling it outside the company. Often, the best path for a company that’s not yet in the business of reselling information is to develop the skills and the muscles to reuse information internally in a compelling way, then look to monetize those things on the outside. Many companies have this potential—if they’re forward-thinking and proactive enough.
TW: Is there a risk of data-integration technologies reaching a saturation tipping point? Once an enterprise customer achieves data-integration efficiencies, where does that customer go from there?
AP: We’re in the early stages of a large change in the way big companies compete. Some call it digital transformation. It’s characterized by an aspiration to compete on analytics. Logically, these companies embrace being data-driven as a core part of that competitive dynamic. But for these companies, the initial bottleneck in being data-driven, in competing on analytics, is that their data is extremely siloed and often really dirty. Being a data consumer in a large business today is a lot like drinking water in Flint, Michigan! You never really know if your data is clean or safe, but you have to trust that someone is doing the right thing to make sure that you don’t get sick.
Once you realize what it’s like to have clean data—this is what happened at GE—you want more. After we cleaned up GE’s supplier and procurement data, GE asked: “Can you do this with our customer data?” Tamr’s like a water treatment plant for your data!
TW: I like that! I might not lead with it in your marketing…
AP: Yeah, the marketing guys hate it when I use that analogy!
The next generation of data management in the enterprise will be characterized by a collection of relatively open tools that are best of breed. It’ll be more like DevOps than like traditional data management. In DevOps, you’ve got GitHub, Subversion, Jira. These tools are relatively interoperable, relatively inexpensive to adopt, and have an ecosystem of vendors to support you when you use them at scale.
I like to call the new enterprise data ecosystem “DataOps.” In the data-prep space, we’re seeing the separation between self-service data-prep tools like Alteryx, Trifacta, and Paxata—they’re like the last mile, like the plumbing inside your house. Then you have automated data-prep tools like Tamr which are more like the water treatment plant. We’re taking all of the source data, organizing it, cleaning it, and giving you this lowest common denominator and saying “Hey, this is safe to drink.”
It’s natural for data that comes from this unified view that Tamr provides to feed into tools like Trifacta and Paxata and Alteryx. But we’re in early days of this new dynamic. It may take decades to sort all this out. There’re huge dollars at the end of that rainbow, but some of these big companies move slowly.
TW: If I’m an enterprise customer, what criteria should I take into account as I look to adopt data integration, machine-learning, data self-service technologies?
AP: There’s noise out there. And you can’t believe everything you hear. Start with understanding your most important analytical questions and use cases. What are the analytics that are truly going to drive significant new value in your company? I put those into two buckets. They’re analytics that are focused on helping you save money, or they’re analytics that are focused on helping you sell more, much faster.
Once you define those questions, and you understand the workloads associated with answering them, then you match those workloads with best-of-breed engines and technologies to support them. And then, define how those engines and technologies are going to work together. Make sure they’re actually designed to be interoperable! Back to the “DevOps” reference, if I decide to use Subversion for source-code management instead of GitHub, I can do that and still use Jira. With DataOps tools and technologies, it needs to be a similar thing.
You need to think about those workloads and the tool integration in context of the full lifecycle of data, from where the data is created in its sources, all the way through to where it is consumed in analytics. Implement a small number of these core technologies with an eye towards scaling out on commodity infrastructure—ideally, it’s hosted, multi-tenant, cloud infrastructure—then you’ve got something good you can work with! All those decisions—which vendors, what tools—have been made in context of these analytics use cases.
This approach is sort of antithetical to what a lot of companies have been doing. They say, “I’m going to go build a data lake!” And I ask “For what? For whom?” Now, there are some benefits to their approach: They buy a distributed cluster, they figure out how to use these newer tools, they modernize their people’s skillsets. But without analytical context, it’s just technology for technology’s sake, and it’s really not worth it.
TW: What’s next for Tamr?
AP: For Tamr, customer success is the best marketing we can get. We’re not doing as many tradeshows as we did two or three years ago. We’re focused on making our customers successful and happy. That’s our top priority.
TW: You’ve started five companies with Mike [Stonebraker]. What’s next for you?
AP: This is it for me! I’ve spent my career building companies, and Tamr’s my last project. We’re committed to building a strong and independent company here at Tamr. That may take ten, fifteen years! We’re all about the long term…our relationships with customers are oriented toward creating long-term value.
Last year, Tamr won “Best Place to Work in Boston,” and we’re really proud of our culture here that attracts and rewards the best people who want to work hard and have fun at work every day. This is it for me.