Why Your Data Preparation and Blending Efforts Need a Helping Hand

In past blog posts, we talked about how data management is fundamentally changing. It’s no secret that a convergence of factors – an explosion in data sources, innovation in analytics techniques, and the decentralization of analytics away from IT – creates obstacles as businesses try to determine the best way to get value from their data.

Individual business analysts face a growing challenge: the difficulty of preparing data for analysis is expanding almost as quickly as the data itself. Data exchange formats such as JSON and XML are becoming more popular, yet they are difficult to parse and make useful. Combined with the vast amounts of unstructured data held in Big Data environments such as Hadoop, and the growing number of ‘non-traditional’ data sources like social streams or machine sensors, getting data into a clean, analysis-ready format can be a monumental task.
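As a minimal sketch of what that parsing work looks like in practice – the nested JSON feed, field names, and use of the pandas library here are all illustrative assumptions – flattening records into a table might look like this:

```python
import json

import pandas as pd

# Hypothetical nested JSON, e.g. a response from a social media API
raw = """
[
  {"user": {"id": 1, "region": "NE"}, "mentions": 3},
  {"user": {"id": 2, "region": "SW"}, "mentions": 7}
]
"""

records = json.loads(raw)

# json_normalize flattens nested objects into dotted columns like "user.id"
df = pd.json_normalize(records)
print(sorted(df.columns))  # ['mentions', 'user.id', 'user.region']
```

Even in this toy case, the nesting has to be flattened before the data is usable in a spreadsheet-style tool – and real feeds are far messier.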

Analyzing social media data and its impact on sales sounds great in theory, but logistically, it’s complicated. Combining data feeds from disparate sources is easier now than ever, but that alone doesn’t make the data ready for analysis. For instance, if time periods are measured differently in two sources, one set must be transformed before an apples-to-apples comparison can be made. Other predicaments arise when a data set is incomplete. For example, 20% of sales records might be missing the zip code associated with the sale. This, too, takes time to clean and prepare.
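To make the time-period problem concrete, here is a minimal sketch (the feeds, values, and pandas-based approach are illustrative assumptions, not any vendor's method): a daily social feed is rolled up to a monthly grain so it can be joined against monthly sales figures.

```python
import pandas as pd

# Hypothetical daily social feed and monthly sales figures
social = pd.DataFrame(
    {"mentions": [5, 8, 2, 11]},
    index=pd.to_datetime(["2015-01-10", "2015-01-20", "2015-02-03", "2015-02-17"]),
)
sales = pd.DataFrame(
    {"revenue": [1200, 950]},
    index=pd.to_datetime(["2015-01-31", "2015-02-28"]),
)

# Roll the daily feed up to monthly periods so both tables share a time grain
monthly_social = social.groupby(social.index.to_period("M")).sum()
sales.index = sales.index.to_period("M")

# Now an apples-to-apples join is possible
combined = monthly_social.join(sales)
print(combined)
```

The transformation itself is only a few lines – the hard part at enterprise scale is knowing which of hundreds of columns need this treatment in the first place.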

This is a constant challenge, and one that is exacerbated at scale. Cleaning inconsistencies in a 500-row spreadsheet is one thing, but doing so across millions of rows of transaction logs is quite another.

A certain level of automation is required to augment the capabilities of the analyst when we are dealing with data at this scale. There is a need for software that can identify the breakpoints, easily parse complex inputs, and pick out missing or partial data (such as zip codes) and automatically fill it in with the right information. Ultimately, the market is screaming for solutions that let analysts spend less time preparing data and more time actually analyzing it.
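A simple version of that automated fill-in can be sketched as follows (the store names and zip values are hypothetical, and this pandas approach is just one way to do it): where a store's zip code is known in some rows, it is propagated to that store's rows where it is missing.

```python
import pandas as pd

# Hypothetical sales records where some rows are missing a zip code
sales = pd.DataFrame({
    "store": ["Boston-1", "Boston-1", "Cambridge-2", "Cambridge-2", "Boston-1"],
    "zip": ["02110", None, "02139", "02139", None],
    "amount": [40, 25, 60, 15, 30],
})

# Propagate each store's known zip code into that store's missing rows
sales["zip"] = sales.groupby("store")["zip"].transform(lambda s: s.ffill().bfill())
print(sales["zip"].tolist())  # ['02110', '02110', '02139', '02139', '02110']
```

Commercial data preparation tools generalize this idea with machine learning and pattern recognition rather than a hand-written rule, but the goal is the same: repair the gaps without an analyst touching each row.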

For all of these reasons, it is no surprise that a number of vendors have come to market offering a better way to prepare data for analysis. Established players like MicroStrategy and Qlik are introducing data preparation capabilities into their products to ease the pain and allow users to stay in one interface rather than toggle between tools. Others, like IBM Watson Analytics and Microsoft Power BI, are following a similar path.

In addition, a number of standalone products are ramping up their market presence. Each offers a deeply specialized solution, and should provide a much-needed helping hand to augment data analysts’ efforts. At Blue Hill, we have identified Alteryx, Informatica Rev, Paxata, Tamr, and Trifacta as our five key standalone solutions to evaluate. (For a deeper analysis of each solution and a further look at market forces in general, be on the lookout for our upcoming research report on the subject.) These products represent a new breed of solutions that emphasize code-free environments for visually building data blending workflows. Further, the majority of these solutions leverage machine learning, textual analysis, and pattern recognition to automatically handle the brunt of the dirty work.

As a forward-looking indicator of the promise of the space, venture capital firms have notably placed their bets. Just this week, Tamr announced $25.2 million in funding, and Alteryx landed $60 million late last year. This is a validation of what data analysts already know: the need for scalable, automated data blending and preparation capabilities is gigantic.

About James Haight

James Haight is a principal analyst at Blue Hill Research focusing on analytics and emerging enterprise technologies. His primary research includes exploring the business case development and solution assessment for data warehousing, data integration, advanced analytics and business intelligence applications. He also hosts Blue Hill's Emerging Tech Roundup Podcast, which features interviews with industry leaders and CEOs on the forefront of a variety of emerging technologies. Prior to Blue Hill Research, James worked in Radford Consulting's Executive and Board of Director Compensation practice, specializing in the high tech and life sciences industries. Currently he serves on the strategic advisory board of the Bentley Microfinance Group, a 501(c)(3) non-profit organization dedicated to community development through funding and consulting entrepreneurs in the Greater Boston area.
Posted on June 22, 2015 by James Haight
