March 5, 2014
by Anand Rao
What is “data science”? Is it really a new, emerging discipline, as some claim, or is it the emperor in new clothes – data mining, statistics, business intelligence or analytics re-branded? Moreover, can one person really fulfil the role of a data scientist? Rather than answering these questions directly, let’s review some of the skills required for someone to be a “data scientist.”
First and foremost, a “data scientist” is a business or domain expert: someone who can articulate how information, insights, and analytics help business leadership answer key questions – and even determine which questions need answering – and make appropriate decisions. To do this well, the data scientist will need a thorough understanding of the business across the value chain (marketing, sales, distribution, operations, pricing, products, finance, risk, and so on).
Second, a “data scientist” is a statistics expert: someone who can determine the most appropriate statistical techniques for different classes of problems, apply those techniques, and translate the results into insights whose value the business can understand. This is predicated on a thorough understanding of statistical techniques (e.g., regression analysis, cluster analysis, and optimization techniques) and of the tools and languages used to run the analysis (e.g., SAS or R).
Third, a “data scientist” is a programming expert: Someone who has the ability to determine the appropriate software packages or modules to run, the ability to modify them, and the ability to design and develop new computational techniques to solve business problems (e.g., machine learning, natural language processing, graph/social network analysis, neural nets, and simulation modelling). Invariably, the data scientist would have a computer science background and be comfortable designing and programming in a variety of languages including Java, Python, C++ or C#.
Fourth, a “data scientist” is a database technology expert: someone who has a thorough understanding of external and internal data sources and of how their data is gathered, stored, and retrieved. This will enable the data scientist – and by extension, the business as a whole – to 1) extract, transform, and load data into data stores; 2) retrieve data from external sources (through screen scraping and data transfer protocols); 3) use and manipulate large “big data” stores (like Hadoop, Hive, Mahout and an entire range of emerging Big Data technologies); and 4) combine the disparate data sources to analyze the data and generate insights.
Finally, a “data scientist” is a visualization and communications expert: someone who has a thorough understanding of visual art and design. This is important because it enables those who aren’t professional data analysts to interpret data. Accordingly, the data scientist should be able to 1) take statistical and computational analysis and turn it into graphs, charts, and animations; 2) create visualizations (e.g., motion charts, word maps) that clearly show insights from data and corresponding analytics; and 3) generate static and dynamic visualizations in a variety of visual media (e.g., reports; screens, from mobile to laptop/desktop to large HD visualization walls; interactive programs; and, perhaps soon, augmented reality glasses). Last, but not least, a “data scientist” should be able to engage with senior management, talk their language, and translate data-driven insights into decisions and actions.
Do any of the alternative terms, such as “data mining,” “business intelligence,” “analytics,” or “statistician,” capture all five areas of expertise? Do you have any “data scientists” in your organization who fit the description above? If not, where and how can you find them?