Note: This blog is the fourth in a monthly co-authored series written by Charlotte O’Donnelly, Research Associate at Blue Hill Research, and Matt Louden, Brand Journalist at MOBI. MOBI is a mobility management platform that enables enterprises to centralize, comprehend, and control their device ecosystems.
Transforming operations and ushering in a new age of security concerns and protocols that businesses will face going forward, IoT converts each business access point into a new potential data source, generating feedback that changes on a per-second basis. The volume and granularity of this data make it a highly valuable resource to enterprises and a clear target for nefarious activity. Unlike enterprise security of the past, IoT device and network security must keep pace with real-time data rates and the thousands or millions of new enterprise access points that can potentially be compromised.
All it takes is a look at recent headlines about breaches at companies like Yahoo! and Target to realize business and consumer data is no longer safe from prying eyes, especially now that it’s largely stored and transmitted through the cloud. Security breaches aren’t just becoming more prevalent; their impact is becoming more serious. A major security breach could put a company out of business or destroy its brand reputation if customers, vendors, and partners lose trust in the organization’s ability to securely operate.
The threat of sensitive business information on unsecured mobile devices or wireless networks first became a concern with the advent of enterprise mobility and machine-to-machine (M2M) technology. With M2M, one machine communicates with another across an internal network via embedded hardware modules, keeping data relatively localized. In the wider networks of IoT, this threat becomes even more pronounced, as IoT data is shared across internal networks, in the cloud, and on devices. The sheer reach and volume of data generated make IoT security an unprecedented challenge for businesses.
IoT devices also suffer from a lack of industry-wide security standards. In enterprise mobile technology, security largely takes place at the component level: manufacturer security at the device level, enterprise security at the software layer, and network security in the cloud. When IoT software is overlaid onto built-in device security, the same basic device now has two distinct security profiles. That makes it challenging for enterprises to manage all program access points: the device, the network, and the data.
IoT device manufacturers need to work together to develop secure, universal architecture and code management standards. Unfortunately, this level of “coopetition” is a long way away. For now, enterprises are left to develop their own security standards, causing the number of data breaches to grow as companies navigate this new world of device security.
Because IoT often involves investing to make currently owned devices and equipment smarter, eliciting the behavior change required to secure IoT devices adequately can be a challenge for organizations. For mobile devices, many companies address security and management challenges by working with a third-party Enterprise Mobility Management (EMM) vendor whose software provisions secure, standardized protocols across entire device inventories. By outsourcing these tasks, enterprises gain best-in-class solutions without incurring significant overhead or tying up scarce IT resources. Much as they did with mobile devices in the past, today’s EMM vendors are increasingly incorporating IoT devices into their platforms and building out industry best practices for this new technology.
In the future, organizations will incorporate all IT assets (mobility, M2M, cloud, IoT, and traditional legacy infrastructure) into a single management platform — in many cases through a third-party relationship. Like mobile device security, IoT security will largely be driven by outside partners with experience incorporating IoT devices into enterprise device-management portfolios and security protocols.
To successfully accomplish this, IT will need to involve virtually every organizational decision-maker within telecom, procurement, and purchasing departments. An enterprise’s IT asset buyers have not traditionally been the same people setting up carrier accounts or paying the bills. By bringing together different departments, businesses can get closer to creating IoT standards that minimize the risk of security breaches and allow businesses to better compete in this new era: the Internet of Everything.
As CEO of Trifacta, Adam Wilson is committed to developing the best in data-wrangling technology, and then of course, preaching its gospel. He and I spoke recently about Trifacta’s past, present, and future (“groups and loops”), partnerships with companies you might have heard of, and how the enterprise data landscape is evolving (for the better).
TOPH WHITMORE: Tell me about Trifacta’s backstory. Where did it all begin?
ADAM WILSON: Trifacta was born of a joint research project between the University of California, Berkeley and Stanford. There was a distributed-computing professor at Cal who had been doing work in this area [data wrangling] for almost a decade, looking at the intersection of people, data, and computation. He got together with a human-computer interaction professor from Stanford who was trying to solve the complex problem of transforming and preparing data for analysis.
And they were joined by a Stanford Ph.D. student who had worked as a data scientist at Citadel on trading-platform algorithms. He found he spent the majority of his time pushing data together, cleansing it, and refining it, as opposed to actually working on algorithms. He returned to Stanford to work with these professors to figure out how to eliminate the 80% of the pain that exists in these analytics problems by automating the coding or tooling and making it more self-service. The three of them worked together and created a prototype called the Stanford Data Wrangler. Within six months, 30,000 people were using it, and they realized they had more than an academic research project. So they created a commercial entity and started delivering to customers like Pepsi, Pfizer, GoPro, and RBS.
I joined two-and-a-half years ago to help with go-to-market. At the time, the question was how do we help people take data from raw to refined, get productive with that information quickly, and do so in a self-service manner? We focused on customer acquisition, and I’m pleased to say we now have more than 7000 companies using Trifacta technology. And customer use of Trifacta data-wrangling technology creates training data that improves our machine learning.
TW: How does machine learning show up in Trifacta? And what drove your investment in it?
AW: Historically, machine learning has been the exclusive purview of the highly technical. But machine learning and artificial intelligence have been part of Trifacta since the beginning. There are two fundamental observations. First, not every data set is a new data set. There are things we can infer from the data itself. Whether it’s inferring data types or inferring joins, we can provide automated structuring in a straightforward manner.
Second, we learn from user behavior. As users interact with data, we can make recommendations based on that behavior. Based on our own analysis, we can recognize they are dealing with a specific kind of data and interacting with it in a particular way, and we can make a suggestion. They can choose that suggestion and get immediate feedback as to what the data would look like if they apply those suggested rules. That cuts down on iteration. The end users can make a quick decision, see what it looks like, and if they don’t like it, make a different decision. Over time, they build up intelligence that encapsulates all the rules they are applying to the data. And that becomes something they can share, reuse, and recycle.
It’s not just about individual productivity in getting to refined data. It’s about how end users can collectively leverage that across teams or an enterprise to help curate data at scale.
TW: The business value of the machine learning you’re describing…does that take Trifacta into sales conversations with business stakeholders? Or do you evangelize primarily to an IT operations audience?
AW: The winners in this market are going to be those who recognize that collaboration between those two enterprise roles is absolutely essential. In the past, you’ve seen people building technical tools for IT organizations who have lost track of who the end consumer is and have not provided self-service. Or, on the flip side, you’ve seen BI technologies that embed lightweight data tools but in the end lose track of the fact that IT needs to be able to govern that information, curate it, secure it, and ensure it’s leveraged across the organization.
From the beginning, Trifacta has been a strong advocate of a vendor-neutral data-wrangling layer that allows you to wrangle data from everything, and in many regards, allows people to change their minds. You may be using any storage or data-visualization technology, but you don’t want to feel locked into any one decision that you’re making. You always want to be able to transform your data so that it’s useful, regardless of where you might be storing or processing it, or how you might be visualizing it now. Wrangle once, use everywhere.
We have a large financial services customer that uses 136 different BI-reporting solutions. The idea that they can wrangle that data in 136 different ways with 136 different tools was surprising for them. We provide a linear way to wrangle that information, refine it, then publish it out through a number of different channels, all with a high degree of confidence that it’s correct, and with appropriate lineage and metadata tracking how the source data has changed.
TW: Trifacta has pursued a proactive alliance strategy. Tell me about the partnership with Alation. How do the two technologies complement each other?
AW: I’m excited about the partnership with Alation! We have joint customers, including Munich RE, Marketshare, BNSF, and a number of companies looking to combine cataloging with wrangling. The idea is, when data gets integrated into large-scale data lakes, the first step is: let me inventory it, then let me create an enterprise data dictionary that makes discovery and finding assets easier. Then, let me refine that data, enrich it, and transform it into something that will drive my downstream analysis. It starts with getting that data-lake infrastructure in place, then bringing in the tooling to allow end users to make productive use of the data in the data lake.
Our customers use many different BI and visualization tools like Qlik, MicroStrategy, or Tableau, and sometimes modeling or predictive analytics environments like DataRobot. The front-end technologies serve different types of data consumption, but the cataloging combined with the wrangling is complementary, and ensures you can operationalize your data lake and expose it to a broad set of users.
TW: You’ve also recently partnered with a little startup called Google. Tell me about that partnership, what it means to Trifacta, what it means to your customers?
AW: Our vision for the space has always been self service. That approach helps alleviate infrastructure friction. Any time we can help people get wrangling faster and spend more time with the data as opposed to configuring infrastructure, that’s a win. About a year ago, Google took a look at this market and recognized that — as more data lands on the Google Cloud Platform, and in particular, cloud storage — Google needed a way to help those customers get that data into BigQuery, and to leverage it with technology like TensorFlow that would help those customers accelerate the process of seeing value from the data in those environments.
Google did an exhaustive search, and they selected Trifacta as the Google data-preparation solution. We worked with Google to ensure scalability, and that included integrating with Google Dataflow, and authentication, and security infrastructure. Google will take us to market as “Google Cloud Dataprep,” under the Google brand, and sell it alongside and in combination with new Google cloud services. To my knowledge, it’s the first time that Google has OEM’ed a third-party technology as part of the Google Cloud Platform.
TW: I have to ask—since I’m speaking with the CEO—will Google buy Trifacta?
AW: A lot of the value in a solution like Trifacta is being the decoder ring for data. Our independence is an important part of where the value is in the company. The fact that Trifacta can gracefully interoperate with on-prem systems and cloud environments was important to Google in making the decision to standardize on Trifacta. There’s value in our independence, so for us, the exciting thing is not only having the Google seal of approval, but delivering a multitude of hybrid use cases. HSBC is a joint customer, and uses Google for risk and compliance management and financial reporting. Trifacta data-wrangling has become a critical capability for HSBC to leverage, particularly with regard to data governance. Regulations change, keeping up with them is a huge burden, but Trifacta gives HSBC the flexibility to wrangle its data—on-prem or in the cloud—and create value in that evolving regulatory environment.
I sometimes get asked about what the Google partnership means for exclusivity—Will Trifacta still work with AWS, and Microsoft Azure, and others? The answer is absolutely yes. We’ve had a leading cloud vendor really shape our cloud capabilities, and accelerate our cloud roadmap. But we’ve made sure that everything we’ve done can be leveraged elsewhere, in other cloud environments. It’s not just a hybrid world between cloud and on-prem, it’s a multi-cloud world. That was important to Google. Google has multi-cloud customers, and they need to be able to wrangle data in those environments as well.
TW: Very diplomatic answer! Where to next for Trifacta?
AW: Three things. The first two are “groups and loops.” We put effort into self service, governance, machine learning. Now we want to apply this to provide fundamentally better solutions for teams to work together, to collaborate more efficiently. We’ve only just scratched the surface, and in the next twelve months you’ll see innovation from Trifacta in what it means to collaboratively curate information, and then learn from collective intelligence. How do we crowd-source that curation? How do we share collective intelligence most efficiently? And how do you get organizational leverage across it?
As for “loops,” we’re looking at how we ensure that this collective intelligence can be reused and operationalized to scale with ever-increasing efficiency. We see a tool chain of data tools to be crafted that will essentially become the workbench for how modern knowledge workers get productive and collaborate.
Third, Trifacta is looking at how we can embrace real-time data streaming, as more and more of the data is streamed into these environments.
Join me and Information Builders VP of Marketing Jake Freivald Thursday, April 27 2017 for our webinar on “DataOps in the Real World: How Innovators are Reinventing Their Business Models with Smarter Data Management.” I’ll be providing an overview of DataOps—what it is, how it works, and why it matters—and presenting an interesting healthcare case example. (So far, only two slides include pictures of my head.) I’m looking forward to an enlightening discussion! Registration details and more information available here.
Silos kill! Well, they at least hinder progress. Keep an eye out for my upcoming DataOps report “No More Silos: How DataOps Technologies Overcome Enterprise Data Isolationism.” (Tentative publication date = Friday, April 28, 2017.) The research looks at how data innovators leverage technologies from vendors like Informatica, Domo, Switchboard Software, Microsoft, Yellowfin, and GoodData to break down organizational, architectural, and process-based enterprise silos.
Here’s what the first page might just look like:
Legacy is a perception of investment, and of value. Unfortunately, legacy in the digital-transformation era is seen as re-investment in what has been, not in what will necessarily be useful going forward. To me, this is a false premise. For example, when the Year 2000 issue hit, some firms used the opportunity to build more functionality into their systems, while others just fixed the bugs necessary for the changeover. So, one person’s legacy situation is perhaps another person’s opportunity.
But as the volume of legacy in an enterprise grows, how have we grown in our ability to leverage the investment in this legacy — or, for that matter, is it still worth the effort? Do legacy applications house a hoard of useful information and behavior — or are they a ball and chain, something you should shed if you want to be innovative and actively work on transformation?
Legacy constraints often seem immense and burdensome — but, do they always need to be? Is object-oriented legacy software spaghetti code — or is it more like ravioli? Do agile methods embrace or reject the use of the legacy? I am writing a series of blog posts on legacy and innovation, disproving the myth that old equals out of date and useless.
In this blog post, I will look at legacy with regard to security and the streamlining of security operations. The shift to cloud and mobile has not always been graceful for organizations and has been disruptive to the way we deploy security controls. Making significant changes to the authentication flow, the one security control that gates all vital access and privilege, is an enormously arduous and fragile task. The modern ‘mobile-first’ access pattern has thrown a wrench into what was otherwise straightforward account-security management.
Not only are modern security controls challenging to adapt and apply to legacy infrastructure and interfaces, but legacy security controls tend to fall flat when it comes to modern infrastructure. How do you deploy your legacy security controls in the world of cloud and mobile when you don’t control the endpoint, network, application or infrastructure?
Authentication is often the only effective security control you have left in a modern, cloud- and mobile-enabled IT environment. So you had better be damn sure that authentication control is more than a simple password. Yet many organizations still rely on passwords alone. Why is this?
I have done several authentication projects recently, and one of the main challenges I have seen is a lack of understanding of what must be protected and by whom. Too often, the focus is on cost and procedure, not on understanding the dataflow and the number of endpoints involved in protecting the data. So why does the path to modern authentication seem difficult and expensive, and why do we worry so much about the impact on user experience when we never did with legacy? (wry smile). Let’s look at why 2FA, SSO, and biometrics never caught on with many legacy houses, and why some still stick with passwords 10 years after many predicted their demise.
Two-factor authentication is becoming the norm for password security, a reasonable concession from users to IT staff who have long pleaded with them to follow basic password-security protocols. Since almost no one follows those protocols, two-factor authentication has become the stop-gap. Although passwords are bad, biometrics and other mechanisms were never considered a good replacement, because they all suffered their own flaws and could not counteract the biggest advantage passwords have going for them: they are cheap and convenient. Today we are seeing a growing movement away from explicit, one-point-in-time authentication to a recognition model that mixes implicit factors — such as geolocation, device recognition, and behavioral analytics — with explicit challenges such as passwords, biometrics, OTPs [one-time passwords], and dynamic KBA [knowledge-based authentication] based on identity-verification services. I recently borrowed a colleague’s login to use an online application, was denied based on geolocation, and was asked for a verification code from his email. Given that he is (hopefully) asleep in Canada and I am in Belgium, this stopped my progress in using the app.
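The recognition model just described can be sketched as a simple risk-scoring decision. Every signal name, weight, and threshold below is hypothetical, intended only to show how implicit factors might gate an explicit challenge:

```python
# Illustrative sketch of a recognition model: implicit signals are scored
# first, and an explicit challenge is demanded only when risk is high.
# All names, weights, and thresholds are hypothetical, not from any product.

def risk_score(signals: dict) -> int:
    """Accumulate risk points from implicit factors."""
    score = 0
    if not signals.get("known_device", False):
        score += 40                       # unrecognized device
    if signals.get("geo_country") != signals.get("usual_country"):
        score += 40                       # login from an unusual country
    if signals.get("odd_hours", False):
        score += 20                       # activity outside normal behavior
    return score

def decide(signals: dict) -> str:
    """Map the implicit risk score to an authentication outcome."""
    score = risk_score(signals)
    if score >= 60:
        return "deny_and_verify"      # e.g. email or SMS verification code
    if score >= 30:
        return "explicit_challenge"   # password plus OTP or biometric
    return "allow"                    # implicit recognition is enough

# The Belgium-vs-Canada example from the text: unknown device, wrong country.
print(decide({"known_device": False,
              "geo_country": "BE", "usual_country": "CA"}))  # deny_and_verify
```

A real deployment would feed the score from device fingerprinting and behavioral analytics rather than hand-set flags, but the shape of the decision is the same.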
Given that we are throwing mobile into the mix, many firms are starting to use mobile push, assuming we are glued to our mobile devices (at least the folks under 30) and can use them as authenticators. Mobile OTP and mobile device authenticators add some value in a 2FA approach, assuming you have not lost the device and its battery is not dead. But for security, do remember that a smartphone can still receive and display social media or text-message alerts even when the device’s screen is locked and the application pushing the notification is closed.
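For reference, mobile OTP generators of the kind mentioned here typically implement the TOTP standard (RFC 6238), which fits in a few lines of standard-library Python. This is a generic illustration of the algorithm, not any vendor’s implementation:

```python
import hmac
import hashlib
import struct

def totp(secret: bytes, at_seconds: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password per RFC 6238 (HMAC-SHA1 variant)."""
    counter = at_seconds // step                      # 30-second time step
    msg = struct.pack(">Q", counter)                  # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 Appendix B test vector: ASCII secret, T = 59 s -> "94287082"
print(totp(b"12345678901234567890", 59, digits=8))    # 94287082
```

Because the code depends only on a shared secret and the clock, the phone can generate it offline, which is exactly why a dead battery or lost device, rather than network access, is the failure mode that matters.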
Basically, the security measures we use today reflect our risk tolerance and desire for simplicity. We assumed the hardware and systems were defended, and that the endpoints were irrelevant because of strong system security. Appropriate security depends on how valuable the data in the transaction is and on what other protection is available for the data (encryption, public key infrastructure, etc.). Legacy complexity can be a good thing if the data is valuable. But we now work the data at the endpoints, and therefore we need to find a way to block endpoint activities if necessary, using legacy technology.
We start with a basic fog question: When and where should we use fog computing in our network?
The basic premise of fog computing is decentralization of data processing as some processing and storage functions are better performed locally instead of sending data all the way from the sensor to the cloud and back again to a control mechanism. This reduces latency and can improve response times for critical system functions, saving money and time. Fog computing also strives to enable local resource pooling to make the most of what’s available at a given location.
I believe the opportunity for this kind of distributed intelligence, and for the associated intelligent gateways needed for fog computing, is strongest when these two conditions are met:
1. The focus of data analytics is at the aggregation level, so the closer to the source, the better; and
2. There is a high degree of protocol complexity, such that handling it locally actually makes more sense.
Markets that have these needs include manufacturing, extraction industries (energy, for example), and healthcare. Applications such as smart metering can benefit from real-time analytics of aggregated data to optimize the usage of resources such as electricity, gas, and water. Local-level analytics suits applications that require data to be stored and analyzed locally, either for regulatory reasons or because the cost of transporting the data upstream and the associated wait time for analysis is prohibitive, as with airline maintenance data.
One major network-bandwidth issue for IoT in the coming years is subsidiarity: making sure that data analysis is done at the level appropriate to the speed and efficiency the application demands. In most cases, there will be a blend of approaches, and the ability to manage local as well as central applications will be increasingly critical to data-analysis speed and functionality.
Use cases for fog computing and IoT
Good use cases for fog computing are ones that require intelligence near the edge, where ultra-low latency is critical. Some good examples of fog computing in energy can be found in both home energy management (HEM) and microgrid-level energy management. HEM can use IoT to transform an existing home into a smart home by integrating functionality such as temperature control, efficient lighting, and management of smart devices. A microgrid is a smart distribution device that can connect to and disconnect from the grid, enabling it to operate in either grid-connected or standalone mode.
My own personal interest is in connected buildings and smarter rooms in office buildings. Here there is a demonstrated need for edge intelligence and localized processing. A commercial building may contain thousands of sensors measuring various building operating parameters: temperature, keycard access, and parking-space occupancy, for example. Data from these sensors must be analyzed to see whether action is needed, such as triggering a fire alarm if smoke is sensed. Fog computing allows for autonomous local operation with optimized control functions. This is useful for building automation, smarter cities, smarter hotels, and more automated offices.
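The fog pattern described here (act locally on latency-critical rules, forward only aggregates upstream) can be sketched in a few lines. The sensor schema, threshold, and field names are illustrative, not taken from any particular building system:

```python
# Minimal sketch of an edge gateway in a smart building: latency-critical
# rules (the fire alarm) fire locally, while only an aggregate summary is
# forwarded to the cloud. All names and thresholds are hypothetical.

SMOKE_THRESHOLD = 0.5   # hypothetical sensor units

def process_locally(readings: list) -> dict:
    """Evaluate critical rules at the edge; return only an aggregate upstream."""
    alarms = []
    temps = []
    for r in readings:
        if r["type"] == "smoke" and r["value"] >= SMOKE_THRESHOLD:
            alarms.append(r["sensor_id"])   # act now, don't wait for the cloud
        elif r["type"] == "temperature":
            temps.append(r["value"])
    return {
        "alarm_sensors": alarms,
        "avg_temp": sum(temps) / len(temps) if temps else None,
        "raw_count": len(readings),         # raw readings stay local
    }

readings = [
    {"sensor_id": "s1", "type": "temperature", "value": 21.5},
    {"sensor_id": "s2", "type": "smoke", "value": 0.7},
    {"sensor_id": "s3", "type": "temperature", "value": 22.5},
]
print(process_locally(readings))
```

The design choice is the point: the alarm decision never waits on a round trip to the cloud, and the upstream link carries one small summary instead of every raw reading.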
One good example of an architecture that takes this into account can be found here, with Flextronics’ Smart Automation project. Another good example can be seen here, in the Raiffeisenbank Romania headquarters, with redundant control systems for maximum reliability.
To conclude, there is a whole sub-layer of functionality where fog computing can quickly and autonomously assert control and develop edge intelligence within the enterprise. Our Industry 4.0 research continues to examine edge-intelligence activities in which central computing resources still retain a viable role in the enterprise.
First came ThingWorx, then came Kinex, and the parallels to IBM’s MobileFirst for iOS application-development platform are striking (though Kinex serves a different enterprise function). PTC is the next Industrial IoT behemoth to recognize that successful IIoT deployments require supporting applications that bring together a range of product and operations data in a single source.
On April 6, PTC announced the launch of Kinex, a suite of role-based, Industrial IoT (IIoT) applications built on the ThingWorx platform. By offering both branded and business-specific applications, the platform is similar to IBM’s MobileFirst for iOS (the success of which bodes well for PTC). PTC now provides both the IoT connectivity layer, with ThingWorx, and the application layer, with Kinex, and its expertise in Industrial IoT can aid companies in more quickly bringing application-supported IIoT innovations to market.
Kinex applications are designed to bring together data from enterprise systems and physical sensors, drawing insight from that data and changing how IIoT products are designed, manufactured, serviced, and used. PTC’s first branded Kinex application is Kinex Navigate, which has seen relatively quick adoption, with over 125,000 seats sold. Kinex Navigate allows anyone within an organization to access up-to-date product-lifecycle data pulled from multiple systems of record. Together with PTC’s Windchill system for product lifecycle management (PLM), the application enables universal data access and timely product data to drive better product decisions. PTC plans to release additional Kinex apps that will allow enterprises to build on ThingWorx development capabilities to add business-specific, custom functionality.
By introducing the Kinex suite to develop Industrial IoT applications built on the ThingWorx platform, PTC aids enterprise customers in going to market quickly with new IIoT solutions and services. Customers can choose either branded apps such as Kinex Navigate, or create custom business functionality by building on top of the ThingWorx platform and Kinex applications.
How Does Kinex Hark Back to IBM MobileFirst for iOS?
This move is similar to IBM’s success in partnering with Apple to create a family of applications through the MobileFirst for iOS app development and mobile management platform. MobileFirst for iOS offers a suite of industry use case applications, including pre-built apps based on industry templates or fully customized apps. The goal of IBM and Apple’s MobileFirst for iOS partnership was to change the way employees work by integrating mobile-based process changes on the front end with IBM’s cognitive analytics on the back end.
MobileFirst for iOS is a highly successful partnership that leveraged IBM’s core capabilities in cognitive analytics. This success should bode well for PTC given its expertise in IIoT. While IBM chose to partner for the application development portion, PTC achieves greater control over Kinex applications by building them on top of the ThingWorx platform. MobileFirst for iOS expanded IBM’s presence in the enterprise by supporting custom enterprise apps with business-specific functionality. PTC aims to do the same in Industrial IoT by offering both branded and custom applications that leverage PTC’s strengths: IoT equipment and connectivity, as well as product lifecycle management (PLM) and data management systems.
Kinex is a smart move for PTC. By controlling both the IoT platform and the application, PTC gains a broader footprint in IIoT, something a number of large, global IoT players are racing to accomplish. Nearly a year out from writing about MobileFirst for iOS, I can see that the step was strategic for IBM in combining cloud, mobile, and analytics into enterprise-grade iOS apps, and thus expanding IBM’s (and its cognitive solutions’) reach in the enterprise. I expect I’ll see a similar result from PTC’s investments in IIoT with the Kinex application suite.
On April 4, 4TelecomHelp announced an all-in-one SaaS platform for TEM and WEM called 4-Titan, developed through a partnership with Juvo Technologies. The platform is built for end-to-end telecom and mobility management with a ‘Four Cornerstone’ approach that ties in inventory, contracts, operations, and expenses. 4TelecomHelp has developed and supported a number of standalone platforms over the past decade but will now be able to offer a single, integrated platform to its users. Key takeaway? This comprehensive, centralized approach is well suited to new management categories such as cloud and software licenses, and IoT devices, machinery, and sensors that TEM companies are increasingly being asked to manage. Additionally, the single platform SaaS offering will enable 4TelecomHelp to sell into larger enterprise accounts than the company has typically targeted, as mid-to-large sized companies most often favor an all-in-one managed services approach for telecom and mobility.
In our December 2016 Mid-Market TEM Landscape, Blue Hill noted that 4TelecomHelp primarily targets companies with around $500,000 per month in telecom spend, but it has some accounts with as little as $100,000 in monthly spend, as well as a few larger and Fortune 500 clients. Mobile makes up 25-30% of 4TelecomHelp’s business. 4TelecomHelp does not often go head to head against other TEM companies; it most often competes directly against telecom consulting companies due to its focus on custom engagements and project-based work. With a more comprehensive, single-platform offering through its partnership with Juvo, 4TelecomHelp will be poised to sell to more mid-to-large enterprises, as well as to increase its share in mobility.
The 4-Titan platform is aimed at addressing not only current telecom and mobility needs such as Bring Your Own Device (BYOD) but also future-facing IT management such as for Internet of Things (IoT) connected devices, machinery, and sensors. For TEM vendors to successfully manage new IT categories such as IoT, and cloud and software licenses, they will need to support a platform that brings contracts, invoices, inventory, and usage data together, as 4-Titan is positioned to do. Looking forward, managing and optimizing not only telecom and mobility but also sensors, connected devices and equipment, and cloud and software licenses is where the TEM industry is headed – or at least, in my opinion, where it needs to head.
Also interesting to note is that Juvo and 4TelecomHelp met through TEMIA, the Technology Expense Management Industry Association. A few weeks back, I was in New Orleans at the semi-annual TEMIA meeting along with nearly 40 companies in the TEM and Managed Mobility Services spaces. Part of the conversation at the meeting centered on how the term Telecom Expense Management is becoming outdated and no longer represents where the industry is headed – or, for some players, where it currently stands. Reflecting this, TEMIA changed its name from Telecom to Technology Expense Management as TEM vendors began supporting a broader range of IT technologies, and companies focused exclusively on mobility began entering the TEM space.
I’m impressed to see a mid-market TEM vendor begin making investments to future-proof its platform for emerging technology categories such as IoT. While large, global TEM vendors are more frequently highlighting their ability to support new IT categories such as cloud and IoT, the trend toward managing additional IT assets and spend is clearly present in the mid-market as well. Based on the conversations I’ve had with TEM vendors and clients through my work with Blue Hill, I’d advise mid-market TEM vendors to begin investing to support new enterprise technologies and IT assets within their platforms in order to remain competitive not only with global TEM vendors but also with smaller, mid-sized, and regional players.
DataOps wasn’t the most deafening sound at Strata + Hadoop World San Jose this year, but as data-workflow orchestration models go, the DataOps music gets louder with each event. I’ve written before about Boston-based DataOps startup Composable Analytics. But several Strata startups are starting to get attention too.
Still-in-stealth-mode-but-let’s-get-a-Strata-booth-anyway San Francisco-based startup Nexla is pitching a combined DataOps + machine-learning message. The Nexla platform enables customers to connect, move, transform, secure, and (most significantly) monitor their data streams. Nexla’s mission is to get end users deriving value from data rather than spending time working to access it. (Check out Nexla’s new DataOps industry survey.)
DataKitchen is another DataOps four-year-overnight success. The startup out of Cambridge, Massachusetts also exhibited at Strata. DataKitchen users can create, manage, replicate, and share defined data workflows under the guise of “self-service data orchestration.” The DataKitchen guys—“Head Chef” Christopher Bergh and co-founder Gil Benghiat—wore chef’s outfits and handed out logo’ed wooden mixing spoons. (Because your data workflow is a “recipe.” Get it?)
Another DataOps-y theme at Strata: “Continuous Analytics.” In most common parlance, the buzzphrase suggests “BI on BI,” enabling data-workflow monitoring/management to tweak and improve, with the implied notion of consumable, always-on, probably-streaming, real-time BI. Israeli startup Iguazio preaches the continuous analytics message (as well as plenty of performance benchmarking) as part of its “Unified Data Platform” offering.
I got the chance to talk DataOps with IBM honchos Madhu Kochar and Pandit Prasad of the IBM Almaden Research Center. Kochar and Prasad are tasked with the small challenge of reinventing how enterprises derive value from their data with analytics. IBM’s recently announced Watson AI partnership with Salesforce Einstein is only the latest salvo in IBM’s efforts to deliver, manage, and shape AI in the enterprise.
Meanwhile, over in the data-prep world, the data wranglers over at Trifacta are working to “fix the data supply chain” with self-service, democratized data access. CEO Adam Wilson preached a message of business value—Trifacta’s platform shift aims to resonate with line-of-business stakeholders, and is music to the ears of a DataOps wonk like me. (And it echoes CTO Joe Hellerstein’s LOB-focused technical story from last fall.)
Many vendors are supplementing evangelism efforts with training outreach programs. DataRobot, for example, has introduced its own DataRobot University. The education initiative is intended both for enterprise training and for grassroots marketing, with pilot academic programs already in place at a major American university you’ve heard of but shall remain nameless, as well as the National University of Singapore and several others.
Another common theme: The curse of well-intentioned technology. Informatica’s Murthy Mathiprakasam identifies two potential (and related) data transformation pitfalls: cheap solutions for data lakes that can turn them into high-maintenance, inaccessible data swamps, and self-service solutions that can reinforce data-access bad habits, foster data silos, and limit process repeatability. (In his words, “The fragmented approach is literally creating the data swamp problem.”) Informatica’s approach: unified metadata management and machine-learning capabilities powering an integrated data lake solution. (As with so many fundamentals of data governance, the first challenge is doing the metadata-unifying. The second will be evangelizing it.)
I got the opportunity to meet with Talend customer Beachbody. Beachbody may be best known for producing the “P90” and “Insanity” exercise programs, and continues to certify its broad network of exercise professionals. What’s cool from a DataOps perspective: Beachbody uses Talend to provide transparency, auditability, and control via a visible data workflow from partner to CEO. More importantly, data delivery—at every stage of the data supply chain—is now real time. To get to that, Beachbody moved its information stores to AWS and—working with Talend—built a data lake in the cloud offering self-service capabilities. After a speedy deployment, Beachbody now enjoys faster processing and better job execution using fewer resources.
Finally, and not at all tradeshow-related, Australian BI leader Yellowfin has just announced its semi-annual upgrade to its namesake BI solution. Yellowfin version “7.3+” comes out in May. (The “+” might be Australian for “.1”.) The news is all about extensibility, with many, many new web connectors. But most interesting (to me at least) is its JSON connector capability that enables users to establish their own data workflows. (Next step, I hope: visual-mapping of that connectivity for top-down workflow orchestration.)
Note: This blog is the third in a monthly co-authored series written by Charlotte O’Donnelly, Research Associate at Blue Hill Research, and Matt Louden, Brand Journalist at MOBI. MOBI is a mobility management platform that enables enterprises to centralize, comprehend, and control their device ecosystems.
Capable software is a powerful competitive business advantage. Without an easy-to-use interface, however, it often fails to make the lasting impact your Information Technology (IT) department expects. Whether your enterprise is designing a User Interface (UI) for the first time or making changes to a preexisting one, be sure to keep these four tips in mind:
Do Your Research
More than anything else, organizations make the mistake of implementing changes and new UI features based solely on what users want. While the intent is admirable, it’s important to remember that a product’s audience brings suggestions to the table, not solutions. User requests can be unreasonable or downright impossible to implement if they fail to understand the scope of work or technology required.
However, that doesn’t mean user feedback should be completely ignored. When properly vetted, it can be a valuable research tool. ESPN.com, for example, increased its overall revenue by 35% after selectively incorporating visitor suggestions into its website redesign.
The first step for any UI project should be conducting thorough, fact-based product management and user experience research. This uncovers the most critical user needs and gives an enterprise definitive rationale for any changes and/or feature additions to be made. Thanks to careful research at this stage, Bing.com generated an additional $80 million in annual revenue by selecting a specific shade of blue for its UI.
After initial research is conducted, protocol-based interviews, paper prototyping, and UI testing can help resolve issues before a new product release even takes place. Development team involvement in these tasks provides additional benefits, as any relevant findings and ideas are properly translated and incorporated into UI design as early as possible. In late-stage user testing, noting any common areas of confusion also ensures the effectiveness of future training efforts.
Focus on Form and Function
UI design involves two separate aspects: interface and workflows. It’s important for an enterprise to anticipate and understand how users will react to changes in both components. In today’s constantly connected digital landscape, full functionality needs to be optimized across all platforms, not just traditional desktop environments. In fact, 83% of users say a seamless experience across platforms is either somewhat or very important to UI design.
While interface changes are immediately visible and create instant, emotional reactions, workflow differences take longer for users to notice and evaluate. In both cases, be sure to sift through initial concerns for any lasting impact that could remain after adjustments are made.
Leaving project calendars clear for at least a few weeks after significant design changes are made prioritizes a product’s user experience and ensures issues can be fixed when they inevitably arise. After all, 52% of users are less likely to engage with a company after a poor user experience.
Fortune favors the bold when it comes to software product design, but unfortunately some companies hesitate to make changes when they’ve already experienced some level of success. Companies can be lulled into complacency, causing them to fall behind the rest of their respective markets.
Undertaking a significant UI update comes with legitimate concerns, but as technology rapidly evolves and changes, the likelihood of product stagnation increases, and its impact becomes potentially more damaging. You may need to inconvenience your user base in the short-term to bring a big payoff down the road.
Even an enterprise giant like Apple takes risks and changes its product in anticipation of future opportunity. After surveying app developers, the company realized that alienating this group would drive revenue to competing platforms and potentially harm the App Store’s future. Despite 40% revenue growth in 2016, it decided to build new analytics tools and update the store’s interface to allow developers to respond directly and publicly to customer reviews.
Remember: No Solution is Perfect
Even the most cutting-edge, revolutionary software developments are met with complaints, so expect them any time a UI is updated or changed. Users are rarely satisfied with changes right away, so remain level-headed when responding, and keep in mind that concerns don’t always indicate a widespread problem.
Few innovations are ideal for an entire user base, so decisions should be made based on evidence and research that identifies critical tasks and the most important design elements. Randomly surveying a target audience not only helps determine the validity of complaints, but also provides insight into whether that group truly represents a product’s primary user base.
Before releasing any new UI feature, roll out the improved product to a small user group without notifying them of the change in order to capture honest impressions and reactions. After further time has passed, contact those users again to gather additional feedback and accurately gauge the success or failure of any updates.
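One practical way to run that kind of quiet pilot is deterministic, hash-based cohort assignment, so the same user always lands in the same group across sessions and features can be rolled out to an arbitrary percentage of the user base. The sketch below is illustrative only; the `in_rollout` helper, the `"new-ui"` feature name, and the 5% threshold are hypothetical, not taken from any product mentioned in this post.

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically assign a user to a feature-rollout cohort.

    Hashing feature + user_id yields a stable bucket in [0.0, 100.0),
    so the same user always sees the same variant, and each feature
    gets an independent cohort.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = (int(digest[:8], 16) % 10000) / 100.0  # 0.00 .. 99.99
    return bucket < percent

# Expose the redesigned UI to roughly 5% of a (hypothetical) user base.
pilot = [uid for uid in (f"user{i}" for i in range(1000))
         if in_rollout(uid, "new-ui", 5.0)]
```

Because assignment is a pure function of the user ID, the follow-up contact described above is easy: re-run the same check later to recover exactly which users saw the change, without storing a separate cohort list.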
Ultimately, no two UI design projects are created or implemented equally. Careful product and user base research are key to successfully updating or changing software. Though it can be an arduous process, the potential payoff for an organization is huge. Even industry-leading platforms can use the occasional new look.
This is the sixth in Blue Hill Research’s blog series “Questioning Authority with Toph Whitmore.”
At a time when many of us seek ways to apply our skills in the service of the greater social good, some people are actually doing it. In late 2016, Jonathon Morgan, CEO of Austin, Texas-based startup New Knowledge, created Data for Democracy, a loosely-formed coalition of data experts, analysts, engineers, project managers, scientists, and more. The organization tasks itself with the rather noble aim of solving real-world problems. Data for Democracy (or “D4D,” as the kids call it) crowd-sources its attacks on challenges like big-city traffic optimization, international refugee migration forecasting, and more. A recent contest collaboration with KDNuggets tasked entrants to devise algorithms to detect “fake news.” I spoke with Jonathon about D4D’s mission, community, and opportunity.
TOPH WHITMORE: Tell me about Data for Democracy. Who’s involved, and what kind of work are you doing?
JONATHON MORGAN: Data for Democracy is a volunteer collective. We’re about a thousand people right now, including data scientists, software engineers, and other technologists. There’s about a dozen active projects—some are community-led. For instance, there’s a group of folks who’ve been collecting election results data dating back to 1988 at the county level. That involves calling secretaries of state in different states around the country, collecting that data however they produce it, and going through the often-manual process of cleaning up that data and packaging it in a way that other people can use.
There are also projects where we partner with an existing organization. We’re working on a data model with the city of Boston so that we can ultimately produce an application that Boston citizens can engage with to experiment with how traffic fatalities can be reduced across the city. We’re also working with the Internal Displacement Monitoring Center (IDMC) on a project to understand the flow of refugees internally within a country based on conflict or a natural disaster. It’s a wide range of projects, which is important with a group this size. But almost everything is community-driven, community-led. Everybody’s a volunteer. We’ve been active for about three months.
TW: So Data for Democracy is composed of volunteers—What’s your mission or charter? What brings these volunteers together?
JM: The mission is broad. We are a community using data and technology to work on meaningful social impact projects, full stop. As for the genesis of it: there seemed to be a sense in the technology community – and in particular the data science community – that had been growing for some time, that there was a need for that community to understand and discover its civic responsibilities.
Perhaps because this latest election was fairly polarizing, I think people on both sides of the aisle want to be more engaged: they want to be participating and organizing, building community, participating in the democratic process, making sure that their voice is heard in the discussion. That typically hasn’t been a role the technology community has played. It’s a moment in which people have a lot of passion and excitement and enthusiasm for this type of engagement, so we wanted to make a space where people could gather, organize, and meet others who were feeling the same sense of responsibility, and find worthwhile projects to dedicate their time and energy to.
TW: Do you serve a political aim? Or is Data for Democracy non-political?
JM: We don’t serve a political aim. There’s people in the D4D community from both sides of the political spectrum. We have volunteers who consider themselves Tea-Party Republicans collaborating with people who worked with Hillary for America. The thing that holds everybody together is a belief in the power of technology and data to have a positive impact on the way that our cities and ultimately our states and country are run. That’s a pretty powerful thing.
TW: Are your volunteers primarily Americans working on American projects? Or is it more international than that?
JM: We’re fairly international, though everybody’s operating in English. Our volunteers skew toward the U.S., Canada, the U.K., and Australia, but there are also Europeans interested in working on projects that have more of an international focus. I mentioned the large group that’s working on understanding the flow of refugees inside of countries: It’s a fairly specific humanitarian objective, and the volunteers are partnering with the IDMC, which focuses on this kind of internal migration. It’s probably 80/20, with 80% of the community in the U.S., but even on U.S.-specific projects like the one with the city of Boston, the intention is to take the model and process and adapt them to the transportation and mobility data available around the country, and ultimately around the world.
TW: What skills do the volunteers bring to these projects?
JM: A wide variety, under the larger umbrella of data science and technology. There’s folks with backgrounds in data engineering, machine learning, statistics, software engineering, and infrastructure operations. There are people who make the plumbing of all of our software and data applications work, communications folks who focus on the story-telling, people who focus on data visualizations, even a few folks who are more product and project managers; they tend to be good organizers for the projects and for the general community.
With a community this large, we have to think deliberately about the mechanics of the community: how you join, how to hook you up with the right projects, how to make sure you don’t feel lost. With projects like this, it’s a little bit like wandering into a big city and trying to figure out where to stay for the night. It can be a little bit daunting if nobody is there to grab your hand. It’s somewhere in-between an open-source software project and an academic research project, like those two worlds coming together.
TW: You beat me to the open-source community analogy. Walk me through the project management model. How do the projects get determined? Who leads them?
JM: The projects come from two places. First, someone in the community will have an idea—something that would be interesting to work on. We have a space in the community for those sorts of conversations, and if a handful of people are also excited about that idea, then they run off and do it.
Second, somebody from outside the community might have an idea, hear that we exist, and then approach us about collaborating on and executing that idea. Our work with the Internal Displacement Monitoring Center is a good example: The IDMC obviously has deep expertise in understanding immigration law, but its members are not technologists and data scientists; nevertheless, they have important data needs that we can help with.
So far, every project has started with a small core of people—one, two, or three—who have expressed passion for delivering it, and have time and energy to devote to it. I tap them on the shoulder and say “Hey it looks like you’re excited about this, how about you assume the responsibility for leading it, organizing it, setting deliverables, making sure that people understand what this project is about and how to get involved with it.”
So far, that model is working. There aren’t a lot of good working models for collaborative research the way there are for collaborative software development, for example. The people who end up being project leads are essential to this process: They document objectives, needed skills, the sorts of people who can add value, and the specific, bite-size tasks volunteers can engage in to give something back to the project.
TW: How many projects are you working on? What does delivery look like?
JM: Right now, there’s about a dozen active projects with multiple delivery points. In a sense, there’s no such thing as done. In the election transparency project, the first deliverable was to document county-level elections results back to 1988 for all of the counties in the United States. That was a big marker. Once the volunteers produced that data set, they published it via a partner platform, Data.World. They made that data available to the public. That’s a big deliverable, but it’s just step one.
Next there’s the modeling process to understand what economic or socioeconomic factors might have caused certain counties to flip in any given election year, and what the underlying mechanics of that might be. That requires a lot of statistics. The team is close to having models that explain at least some of that phenomenon. The deliverable after that will be reports or, in our case, blog posts, where we communicate findings and implications of those findings. Along the way, we’re generating artifacts that can be used by other data scientists and software engineers. Everything we work on we publish as open-source projects.
TW: How is Data for Democracy funded?
JM: We don’t have corporate sponsors. A handful of technology providers have offered their products to the community to use for free. Data.World is a data publishing and collaboration platform—many of their staff are community members and have supported projects in addition to offering use of their platform. Eventador is a streaming data platform that’s been helpful in data acquisition and processing. Mode Analytics is an analytics and dashboard platform that we’ve been using for data exploration and visualization. And Domino Data Lab is a collaborative research platform which we’ve been utilizing as well.
TW: How can someone reading this get involved with Data for Democracy?
JM: For an individual, just let us know. We have a couple steps to get you into the community, understand where you might want to contribute, where your skills might sync up with active projects.
For an organization, it’s the same process, but we’ll talk about how Data for Democracy can be useful to the organization. The city of Boston had a very clear idea—we’re working with their data and analytics team, so they had a specific project idea that was appropriate for data science and technology. Then we can frame a project and offer it to the community to see who’s interested in working on it.
TW: Any other projects to highlight?
JM: We’re sponsoring a data visualization project with KDNuggets. The goal is to debunk a false statement using data visualizations as a story-telling tool. It’s a nice way to counter the rhetoric we heard over the course of the last election. People say we’re in this post-factual environment; as data scientists, we have a real responsibility to right that ship. It’s an interesting idea for a contest: trying to get people to think about how they can clearly communicate a fact so that it’s interpretable, makes sense, and sticks.
TW: Data for Democracy just hit a thousand volunteers. How important is that milestone?
JM: It signals that this is an important movement for the technology community. This isn’t just a response to the election, this is something that the community needs. This sense of civic engagement and responsibility is a real thing. This is a foundational shift in the way technologists see themselves.
TW: Where do you go from here? What comes next?
JM: There’s always more work to be done. It means making sure that we’re collaborating with partners that can use this kind of help in furthering their mission. When we have the data sets that we’re creating and the models that we’re producing, we’re making sure that we communicate that to the outside world in the broader community…that we’re participating in the national discussion about the kind of discourse that we want our country to have. It means continuing to improve our community so it’s easy for people to get involved, there’s always something for them to do, and that we’re making it a place that’s welcoming and positive and accepting and full of energy, which is what it is right now.