On Hadooponomics this week, we have Jerry Overton, Data Scientist and Distinguished Engineer at Computer Sciences Corporation (CSC), and a teacher, author, and thought leader in using Big Data to influence change in an organization. This episode is action-oriented, looking at the concrete steps Big Data practitioners can take to be more effective at their jobs. We dive into high-impact data science and the difference between Big Data in theory and Big Data in practice: both how to be personally effective and how to make an impact at enterprise scale. The episode is particularly relevant to data practitioners, but its insights resonate across the spectrum of Big Data involvement.
We begin our conversation by discussing the problem of “stack thinking.” For data practitioners, stack thinking is an isolated view focused on a specific technology layer or component, rather than on how the solution as a whole can create stakeholder value. Jerry has some advice on how to combat this (hint: focus on outcomes, then choose the tools).
Next, we dive into the art of the hack: how Big Data practitioners can use resources already at their disposal, such as existing code, to evolve solutions rather than build them from scratch. Jerry advises that many Big Data problems can be solved by starting from a zero draft made of previously written code and molding that code as needed.
Finally, we confront the echo chamber in Big Data: what creates real value, and what’s just hype. First off: the scientific method. How do we ensure the questions we are asking are valuable, even before the hypothesis stage? Jerry explores why the scientific method might not be the best approach, and how we can use a new way of thinking to better solve Big Data problems.
Listen to the Show:
The Art of the Hack by Logan Wilt
Mastering Data Science at Enterprise Scale via O’Reilly Media
Doing Data Science via the CSC Blog
Find Jerry on Twitter @JerryAOverton
About Arcadia Data:
Arcadia Data unifies visual analysis, business intelligence, and data discovery; it runs natively on your Hadoop clusters without data extracts. Its easy-to-use, browser-based visualizations deliver secure access for hundreds of concurrent users across hundreds of billions of rows in near real time.