Why Your Data Preparation and Blending Efforts Need a Helping Hand

In past blog posts, we talked about how data management is fundamentally changing. It’s no secret that a convergence of factors – an explosion in data sources, innovation in analytics techniques, and the decentralization of analytics away from IT – creates obstacles as businesses try to invest in the best way to get value from their data.

Individual business analysts face a growing challenge: the difficulty of preparing data for analysis is expanding almost as fast as the data itself. Data exchange formats such as JSON and XML are becoming more popular, yet they are tedious to parse and turn into something useful. Combine that with the vast amounts of unstructured data held in Big Data environments such as Hadoop and the growing number of ‘non-traditional’ data sources like social streams or machine sensors, and getting data into a clean, analysis-ready format becomes a monumental task.
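To make the parsing problem concrete, the sketch below shows the kind of flattening an analyst otherwise does by hand: turning a nested JSON feed into a flat, analysis-ready table with the pandas library. The feed structure and field names here are hypothetical, purely for illustration.

import pandas as pd

# A hypothetical nested JSON payload, as it might arrive from a web service.
raw = [
    {
        "order_id": 1001,
        "customer": {"name": "Acme Corp", "region": "Northeast"},
        "items": [
            {"sku": "A-100", "qty": 2, "price": 19.99},
            {"sku": "B-200", "qty": 1, "price": 5.49},
        ],
    },
]

# json_normalize flattens the nested customer object and explodes the item list
# into one row per line item, carrying the order-level fields along.
flat = pd.json_normalize(
    raw,
    record_path="items",
    meta=["order_id", ["customer", "name"], ["customer", "region"]],
)
print(flat)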

Analyzing social media data and its impact on sales sounds great in theory, but logistically, it’s complicated. Combining data feeds from disparate sources is easier now than ever, but it doesn’t ensure that the data is ready for analysis. For instance, if time periods are measured differently in the two data sources, one set of data must be transformed so that an apples-to-apples comparison can be made. Other predicaments arise if the data set is incomplete. For example, sales data might be missing the zip code associated with a sale in 20% of the data set. This, too, takes time to clean and prepare.
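As a hedged illustration of that apples-to-apples problem, the sketch below rolls an hourly social-media feed up to the daily grain of a sales table before joining the two with pandas; the column names and figures are hypothetical.

import pandas as pd

# Hypothetical feeds that measure time differently: hourly social mentions, daily sales.
social = pd.DataFrame({
    "timestamp": pd.to_datetime(["2015-06-01 09:00", "2015-06-01 14:00", "2015-06-02 11:00"]),
    "mentions": [120, 85, 240],
})
sales = pd.DataFrame({
    "date": pd.to_datetime(["2015-06-01", "2015-06-02"]),
    "revenue": [10500.0, 12300.0],
})

# Resample the hourly feed to the daily grain used by the sales data.
daily_social = (
    social.set_index("timestamp")
          .resample("D")["mentions"].sum()
          .rename_axis("date")
          .reset_index()
)

# With both tables on the same grain, an apples-to-apples join is possible.
combined = sales.merge(daily_social, on="date", how="left")
print(combined)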

This is a constant challenge, and one that is exacerbated at scale. Cleaning inconsistencies in a 500-row spreadsheet is one thing, but doing so across millions of rows of transaction logs is quite another.

A certain level of automation is required to augment the analyst’s capabilities when dealing with data at this scale. There is a need for software that can identify the breakpoints, easily parse complex inputs, and pick out missing or partial data (such as zip codes) so that it can be filled in automatically with the right information. Ultimately, the market is screaming for solutions that let analysts spend less time preparing data and more time actually analyzing it.
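A minimal sketch of that kind of automated fill-in, assuming a reference table keyed on city and state is available, might look like the following; a dedicated data preparation tool would build or infer such a mapping itself, and the records here are hypothetical.

import pandas as pd

# Hypothetical sales records in which some rows are missing a zip code.
sales = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "city":  ["Boston", "Boston", "Cambridge", "Cambridge", "Boston"],
    "state": ["MA", "MA", "MA", "MA", "MA"],
    "zip":   ["02110", "02110", "02139", None, "02110"],
})

# Reference table mapping (city, state) to a zip code.
zip_lookup = pd.DataFrame({
    "city":  ["Boston", "Cambridge"],
    "state": ["MA", "MA"],
    "zip_ref": ["02110", "02139"],
})

# Join on city/state and use the reference value wherever zip is missing.
filled = sales.merge(zip_lookup, on=["city", "state"], how="left")
filled["zip"] = filled["zip"].fillna(filled["zip_ref"])
filled = filled.drop(columns="zip_ref")
print(filled)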

For all of these reasons, it is no surprise that a number of vendors have come to market offering a better way to prepare data for analysis. Established players like MicroStrategy and Qlik are introducing data preparation capabilities into their products to ease the pain and allow users to stay in one interface rather than toggle between tools. Others, like IBM Watson Analytics and Microsoft Power BI, are following a similar path.

In addition, a number of standalone products are ramping up their market presence. Each offers a deeply specialized solution, and each should provide a much-needed helping hand to augment data analysts’ efforts. At Blue Hill, we have identified Alteryx, Informatica Rev, Paxata, Tamr, and Trifacta as our five key standalone solutions to evaluate. (For a deeper analysis of each solution and a further look at market forces in general, be on the lookout for our upcoming research report on the subject.) These products represent a new breed of solutions that emphasize code-free environments for visually building data blending workflows. Further, the majority of these solutions leverage machine learning, textual analysis, and pattern recognition to handle the brunt of the dirty work automatically.

As a forward-looking indicator of the promise of the space, venture capital firms have notably placed their bets. Just this week, Tamr announced $25.2 million in funding, and Alteryx landed $60 million late last year. This validates what data analysts already know: the need for scalable and automated data blending and preparation capabilities is gigantic.

About James Haight

James Haight is a principal analyst at Blue Hill Research focusing on analytics and emerging enterprise technologies. His primary research includes exploring the business case development and solution assessment for data warehousing, data integration, advanced analytics and business intelligence applications. He also hosts Blue Hill's Emerging Tech Roundup Podcast, which features interviews with industry leaders and CEOs on the forefront of a variety of emerging technologies. Prior to Blue Hill Research, James worked in Radford Consulting's Executive and Board of Director Compensation practice, specializing in the high tech and life sciences industries. Currently he serves on the strategic advisory board of the Bentley Microfinance Group, a 501(c)(3) non-profit organization dedicated to community development through funding and consulting entrepreneurs in the Greater Boston area.
Posted on June 22, 2015 by James Haight
