Designer Data Science

I am pleased to report that Big Data is here to stay and that we are now moving into the application age, with many organizations shifting from a descriptive (BI) focus to a prescriptive or machine learning focus. After attending STRATA NYC last month and Databeat this past week, I am seeing firsthand how rapidly this trend is evolving. First, let’s look at another major technological shift that happened a little over 10 years ago, when the internet and web applications came of age.

At first there were only a handful of ways to access the web, via “internet portals.” Soon, many people could access the open web and truly leverage its amazing potential to communicate, find information quickly, and create content. Next, the boom created a huge demand for web developers, with little emphasis on design. I fondly remember many of my engineering colleagues jumping into the fray, learning PHP, HTML, TCP/IP, and other web technologies to take advantage of that demand. It wasn’t until the bubble burst and the Web 2.0 era arrived that frameworks became standard and the focus shifted to design. Do people call themselves Web Developers these days? Not really. I’d say you see more Web Designers, who use established web frameworks to design the best customer experience, commanding the high salaries.


This often reminds me of the situation of the Data Scientist today, where many believe the best are great programmers who can leverage R, Python, and MapReduce to create one-off analyses. Scott Yara from Pivotal went so far as to say last week, “It only takes minutes for a Programmer to become a Data Scientist.” Do we truly believe that? When we heard from Allen Day, Data Scientist at MapR, he did not talk in terms of data frames or Hadoop jobs. Instead, he focused on the design component of engineering a big data application. There is no question he can program well and work with big data technology, but what truly sets him apart is his ability to design solutions. You can hear more snippets from his talk, “What Shape is Your Data,” by liking us on Facebook.

Today the majority of Data Science applications rely heavily on coding and scripting frameworks (Python, R, Scala, Java, and MapReduce). At Alpine, however, we are thinking differently about how to design and replicate an analysis without having to start from scratch each time. We go further and abstract the code into representations of operations, making the work less programming intensive. I agree with Trifacta’s CEO Joe Hellerstein when he says, “Let’s take the programming requirement out of Data Science.”
