In this episode, Gauthier Vasseur and I got a bit more technical and talked about data science. When I asked Gauthier to define it, he chuckled at first, and we agreed that there was no 'quick definition'. However, he went on to say that when data gets too complex, too bulky, and too diverse, and scientific methods are needed to determine what the data is conveying, then we are entering the realm of data science.
We went through some of the key methodologies that encompass data science, how it is employed, and in what circumstances it is most effective. We noted that data science was enabled by a perfect storm: processing power, storage speed, bandwidth, and data generation all converging at once.
Data quality is a key component of ensuring that any analysis that gets done is actually relevant. This quality component is often overlooked, to the peril of the results and output. Further, data preparation and cleaning accounts for about 80% of the work, so figuring out how to scale the data preparation is quite important.
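To make the data-preparation point concrete, here is a minimal sketch of the kind of cleaning pass that eats up that 80% of the work. The records, field names, and rules (trimming whitespace, coercing types, flagging missing values, dropping duplicates) are all hypothetical illustrations, not anything from the episode; real pipelines would do this at scale with dedicated tooling.

```python
# Hypothetical raw records, illustrating common quality problems:
# stray whitespace, a missing value, and an exact duplicate.
raw = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": ""},        # missing age
    {"name": " Alice ", "age": "34"},  # duplicate of the first row
]

def clean(records):
    """A toy cleaning pass: normalize, coerce types, dedupe."""
    seen, out = set(), []
    for r in records:
        name = r["name"].strip()                    # normalize whitespace
        age = int(r["age"]) if r["age"] else None   # coerce type, keep None for missing
        key = (name, age)
        if key in seen:                             # drop exact duplicates
            continue
        seen.add(key)
        out.append({"name": name, "age": age})
    return out

print(clean(raw))  # two rows survive: Alice (34) and Bob (age missing)
```

Even this toy version shows why the step resists automation: every rule (what counts as missing, what counts as a duplicate) is a judgment call tied to the business question the data is meant to answer.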
As the Executive Director of the Fisher Center for Business Analytics at the Haas School of Business at Berkeley, Gauthier knows that there is no data science without a clear business question. One cannot simply go out, analyze massive data sets, and just try to find something. It simply does not work that way: if you do not know what you are trying to find, you cannot even know what kind of data you need, and the entire process breaks down.
Listen to the entire episode as we give interesting examples and discuss the topic in depth.