Why We Need to Redefine Data Science

This influential field has evolved exponentially — so have the roles and teams that power it. It’s time for companies to catch up.

By the Data Science Standards Board

A decade ago the term “data scientist” didn’t exist. Back then, the equivalent roles were statisticians or software engineers: people who analyzed data or built software to solve problems.

But, with the rise of the data economy, the field emerged. With organizations keen to capitalize on the newly available data for the first time, demand for data scientists quickly outstripped supply. According to LinkedIn’s 2017 Emerging Jobs Report, data scientist roles have grown over 650% since 2012, the year the Harvard Business Review deemed it the “sexiest job of the 21st century.”

And yet today, despite data scientist being ranked by LinkedIn as the second fastest-growing job in the U.S. (directly behind machine learning engineer, another data role…), it is still unclear to many people what data science is, or what a data scientist really does. The term is used to describe a wide range of roles, and could mean anything from a machine learning expert to a spreadsheet pro. A standard definition does not exist.

Why does this matter?

For data leaders, a lack of clarity around the term “data science” is the root cause of significant problems, including:

  • No clear sense of the skills required to satisfy an organization’s data science needs. This results in vague job descriptions that fail to attract the right candidates, and excessive time spent evaluating applicants. It can also lead to new hires having the wrong expectations for their roles, and company-wide inefficiencies due to the wrong talent being in the wrong place.
  • A lack of clear paths for team members to develop or advance their careers, leading to dissatisfaction and high churn. A survey by the data competition platform Kaggle revealed that most people working in data science spend one to two hours a week looking for a new job. This further aggravates the shortage of skilled data professionals, with IBM predicting demand for data scientists and advanced analysts to grow by a further 28% by 2020.
  • Business leaders not knowing which data team members do what, or worse still, not knowing what their data science teams do. This results in internal inefficiencies as well as company-wide underperformance; if leaders don’t know what’s possible then the opportunity for innovation is limited.

Each of these problems relates to a lack of understanding around what data science is in the context of an organization, and the different roles required to enable it. The problems are expensive given the high cost of the average data professional hire (Glassdoor reports the average annual salary for a data scientist as $120,000), not to mention the opportunity cost of a data team that is poorly deployed.

As data leaders at major companies across the entertainment, finance, health, technology, and information industries, we care about these issues and are affected by them every day. As such, we formed the Data Science Standards Board with General Assembly to clarify the field of data science and agree on a common standard of excellence for the industry. After working together over the past six months, we are now sharing our work publicly for the first time.

Let’s Start With a Definition of Data Science

To begin, we need a clear definition of what data science actually is.

Data science is the practice of:

  • acquiring, organizing, and delivering complex data;
  • discovering relationships and anomalies among variables;
  • building and deploying machine learning models;
  • and synthesizing data to influence decision making.

Data science is made up of four distinct disciplines that involve the competencies above: data engineering, quantitative research, machine learning, and advanced analytics.

Could one person bear all of these responsibilities? Sure, unicorns do exist. But unicorns responsible for all of “data science” as a whole will be on the fast track to burnout, exacerbating the high churn rates referred to in our list of problems.

Hiring for a “data scientist” when you need a quantitative researcher or machine learning specialist is akin to hiring for a “doctor” when you could mean a neurologist, pediatrician, or anaesthetist. As the data science field has grown in breadth and depth, roles have become more specialized and sophisticated.

The Data Science Career Framework

Instead of using the catchall title of “data scientist,” data professionals — and the people who hire and work with them — would benefit from using more specific job titles based on the four emerging specializations for mid-career data professionals.

We’ve combined our thinking around mid-career data roles with two other simple observations:

  1. Teams have entry-level roles that require a common baseline of foundational skills.
  2. Teams have leadership roles that require general management skills.

Guided by these findings, we’ve created a comprehensive career framework that delineates the possible growth paths of an individual working in data science. It defines the path an individual takes from an entry-level stage, through a mid-career specialization, into a leadership role.

Data Science Career Framework

Let’s break down each section of the framework.

Level 1: Foundation (Entry-Level)

To begin a career in data science, individuals need the bundle of skills in Level 1: wrangling, exploring, modeling, and communicating. With these abilities, professionals can execute on well-scoped tasks while relying on the guidance of experienced teammates. Level 1 skills serve as common building blocks and foundation for each of the specializations in Level 2, and they’re essential irrespective of company domain and size.

Level 2: Application (Mid-Career)

Experienced data professionals typically specialize in one of the four Level 2 domains, each of which requires a different focus, whether it be organizing complex data or predicting relationships among variables. Thus, instead of calling everyone within Level 2 a data scientist, we call them data engineers, quantitative researchers, machine learning engineers, and advanced data analysts, to reflect the unique nuances of each concentration.

Level 3: Leadership (Senior Role/Management)

For team members who seek leadership roles, Level 3 contains the bundle of additional skills — in business, governance, and people — needed to be a successful data team leader. Because these roles require generalization and problem-solving across the stack, successful Level 3s have often covered more than one Level 2 specialization during their careers.

Clarity Comes From Action: Next Steps

As a board, we are committed to tangible action. As such, we are going to use this framework within our organizations in four key ways, and we encourage other data leaders to do the same.

  1. Define career paths. The framework can be used as a common starting point to guide individuals’ career progression and communicate job paths between managers and employees. It will help team members understand how to develop and advance their careers, which will improve job satisfaction and reduce churn.
  2. Establish team structure. Explaining the ideal structure of a data team will help business stakeholders understand the team and the roles that each individual plays. This ensures stakeholders are aware of the possibilities for innovation available to them.
  3. Hire with specificity. We’ll hire using specific job descriptions that avoid ambiguous language and clearly explain what we are looking for in our open positions. To get started and to ensure we hire the right people for each role, we’ve written job description templates that we will use as the starting point for defining what each role looks like in our organization. These templates — which you can download here — include details on skills to hire for.
  4. Create assessments. We are using the career framework to create assessments for these skills in collaboration with the team at General Assembly. Employers struggle to find data professionals in a timely manner, and job-seekers are often interviewed by hiring managers with no real idea of what they want the data role to include. These assessments will enable job-seekers to show off their skills in a meaningful way across employers, and reduce the friction felt by both sides in the hiring process.

Our industry needs as many of us as possible to use a common language around data science. We’ve created a package with all the collateral you’ll need — from a copy of the framework to job description templates — that you can download here.

In parallel, we are seeking feedback from our colleagues to refine this framework. We’re starting with our partners who work in human resources at our companies, and engaging with industry associations and peers leading data science teams around the world. We’re also asking you. If you’ve got reactions, feedback, or advice on how this framework could be useful for you, please reach out to us at credentials@ga.co.

By redefining data science, we can begin to solve some of the biggest challenges facing the profession. We look forward to working together to make this a reality.
