Is Tomorrow’s Data Scientist a Generalist or a Specialist?

Data is the new oil, but, as in the oil industry, processes and technology are needed to monetize it. Digital tech leaders such as Amazon, Facebook, and Google have attained the holy grail by generating billions of dollars in revenue from this new “oil”. We are all exposed to targeted ads on our devices that use our browsing history in an attempt to offer the right consumer the right product at the right time. This sounds familiar because it has been the philosophy of CRM marketers for the last 25 years. The digital ecosystem has simply accelerated this notion, and “customer experience” is now the phrase that encapsulates this thinking.

But underneath these organizations’ growing capabilities lies the data scientist, who is responsible for turning raw data into business solutions. These leading-edge tech organizations have hired hundreds of data scientists with the objective of building the next great algorithm or service, with data as its underpinning. For these organizations, the problems to solve are highly technical and require the level of technical acumen attained at institutions such as MIT, Stanford, and the University of Waterloo in Canada.

Yet this is not necessarily the case for most other organizations, which are looking for pragmatic business solutions that leverage the data within their environment. In fact, this was the scenario in the early days of data science, when pioneering organizations such as banks and telcos developed in-house data science capabilities to solve business problems pragmatically. In those early days, data scientists needed a high degree of technical acumen, at least from a programming or coding standpoint. This was not about writing code to build mathematical algorithms but about writing code to generate meaningful data from raw data. In other words, early data science practitioners had to be able to program in order to create a meaningful analytical environment from raw data. No coding was required to create the mathematical algorithms themselves, since these had already been developed as modules and procedures by data scientists armed with PhD-level knowledge of mathematics. With these modules and procedures available, the non-PhD data scientist could simply integrate the right one when building the necessary solution. However, this still required the data scientist to have a deep technical understanding of the module’s or algorithm’s output.

Besides having the requisite technical skills in both programming and mathematics, the successful data science practitioner needed to understand the business itself and its challenges. Success in those early days required practitioners to probe deeply with the relevant business stakeholders to better understand their expectations and challenges. He or she also needed to determine how to deliver relevant output that could be communicated effectively to the business. But for many early data scientists, this was outside their comfort zone, as the “technical” component was their real motivator. Typically, these “techie” data scientists were entirely reactive, simply taking instructions from the business stakeholder on what to do with the data. Although this arrangement might have been satisfactory for many business problems, it was sub-optimal because the business stakeholder lacked the data science knowledge that could be used creatively to produce a better solution. It was rare indeed to find the data science hybrid, or generalist, who combined technical and business acumen. Yet business and technical skills together were the real keys to success in those early days, and it is exactly this combination of skills that will become even more valuable in today’s more automated environment of Big Data and AI.

Today’s data science and analytics environment, with its dizzying array of tools, empowers more people to carry out the actual steps and processes required in a data science exercise. Some of these tools reduce the amount of programming needed to “work” the data into a meaningful analytical file. The practitioner, though, still needs to understand the “data science” process, that is, the steps required to create this analytical file. Most software today, including open-source software, also offers easy access to advanced analytics routines, including deep learning and other artificial intelligence techniques. The practitioner does not have to code the arcane math behind these routines but does need to understand their output and, equally important, the adjustable parameters that will change this output. Putting on their business hat, practitioners then need to determine how this output will affect the business, given the business problem they are trying to solve.

With these types of tools, less time is spent on coding and programming to produce output, and the time saved can be used to solve more business problems. In fact, the plethora of new tools in the marketplace creates the capacity to identify new business challenges and problems. As businesses attempt to address more of these problems, the growing demand is for the generalist, not unlike the successful “early” data scientist of the 1990s. Gartner’s latest reports all reinforce the high demand for these “generalist” data scientists, often referred to as citizen data scientists. I still refer to them as data scientists because, as in other professions, they will need to be trained in the various subjects necessary to be considered competent within this discipline.

For the data scientist of the future, it is about range of thinking: the ability to apply specialized data and mathematical skills to solve business problems creatively. David Epstein’s excellent book Range: Why Generalists Triumph in a Specialized World speaks to this lateral thinking, or range, and argues that it will be the real need in an increasingly automated world. The book presents this type of thinking as applicable to all disciplines, including data science. In a world becoming ever more data-driven, the range of thinking exhibited by data science generalists will be what virtually all organizations are calling for.

Richard Boire, President at Boire Analytics
