The Skills and Tools Every Data Scientist Must Master

Photo by WOC in Tech.

“Data scientist” is one of today’s hottest jobs.

In fact, Glassdoor calls it the best job of 2017, with a median base salary of $110,000. That should come as no surprise. In 2011, McKinsey predicted there would be a shortage of 1.5 million managers and analysts “with the know-how to use the analysis of big data to make effective decisions.” Today, there are more than 38,000 data scientist positions listed on Glassdoor.com.

It makes perfect sense that this job is both new and popular, since nearly every move you make online creates data somewhere. Someone has to make sense of that data and find the trends in it that make it useful. That is the job of the data scientist. But how does a data scientist go about the job? Here are the three skills and three tools that every data scientist should master.

The Skills of the Data Scientist

Programming

Data scientists deal with datasets that are far too large and complicated to open in Excel. Rather than limit themselves to tabs and sheets, data scientists use programming to work with whole databases that they manipulate to glean usable information.
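
As a rough illustration of working with data in code rather than in a spreadsheet, the Python sketch below reads a CSV one row at a time, so the whole dataset never has to fit in memory. The column names (`region`, `amount`), the sample rows, and the `total_by_key` helper are hypothetical; a small in-memory string stands in for a large file.

```python
import csv
import io
from collections import defaultdict


def total_by_key(rows, key, value):
    """Sum the `value` column for each distinct `key`, reading one row at a time."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# Hypothetical sample data; in practice this would be a file opened with
# open("sales.csv", newline="") instead of an in-memory string.
SAMPLE_CSV = """region,amount
east,100.50
west,20.00
east,4.50
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
totals = total_by_key(reader, key="region", value="amount")
print(totals)
```

Because `csv.DictReader` yields rows lazily, the same function should work on a file with millions of rows without loading them all at once.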

Data Analysis

There are two general ways to approach data analysis. You can start with a problem and analyze data to find a solution to it, or you can start with a massive amount of data and analyze it in search of trends that point to opportunities in the marketplace the dataset came from. With either approach, the data has to be cleaned up, formatted, and presented in a way that people who are not data scientists can understand and use.
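
The cleanup and presentation step might look something like the following Python sketch. The records, the field names (`name`, `visits`), and the `clean_record` helper are invented for illustration; real data would call for its own rules.

```python
def clean_record(raw):
    """Return a tidy (name, visits) tuple, or None if the record is unusable."""
    name = raw.get("name", "").strip().title()
    visits_text = raw.get("visits", "").strip().replace(",", "")
    if not name or not visits_text.isdigit():
        return None
    return name, int(visits_text)


# Hypothetical raw records with inconsistent spacing, casing, and a bad value.
raw_records = [
    {"name": "  home page ", "visits": "1,200"},
    {"name": "PRICING", "visits": "300"},
    {"name": "blog", "visits": "n/a"},
]

# Drop records that could not be cleaned.
cleaned = [rec for rec in map(clean_record, raw_records) if rec is not None]

# Format a plain-text summary, largest first, for a non-specialist audience.
report_lines = [
    f"{name:<12}{visits:>8,}"
    for name, visits in sorted(cleaned, key=lambda r: r[1], reverse=True)
]
print("\n".join(report_lines))
```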

Predictive Modeling

Predictive modeling is what separates the data scientist from the data analyst. Data scientists are tasked with predicting the future using data from the past. For example, BuzzFeed wants to predict whether an article will go viral, so it gets a data scientist to look at the available data: past articles that have gone viral, most-searched words, etc. To mine that massive amount of data, the data scientist will use machine learning methods such as regressions, support vector machines, or decision trees to determine what kind of articles BuzzFeed should be writing and what keywords it should include to increase the probability of an article going viral.
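
To make "predicting the future using data from the past" concrete, here is a minimal sketch of the simplest of those methods, a one-variable linear regression fitted by ordinary least squares in plain Python. The numbers are made up, and the idea that early shares predict total views is an assumption for illustration only, not a description of how BuzzFeed works.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past data: shares in the first hour vs. total views (thousands).
first_hour_shares = [10, 20, 30, 40]
total_views = [25, 45, 65, 85]

slope, intercept = fit_line(first_hour_shares, total_views)

# Use the fitted line to predict views for a new article with 50 early shares.
predicted = slope * 50 + intercept
print(f"Predicted views for 50 early shares: {predicted:.1f}k")
```

In practice a data scientist would usually reach for a library implementation and check the model against held-out data, but the underlying idea is the same: fit on the past, then apply to new cases.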

The Tools of the Data Scientist

Python & R

Python is a practical, general-purpose language for data science and a good first language for beginners. Python scripts often run faster than their R equivalents, and they let data scientists connect data pipelines to the web apps and frameworks used in modern production systems. R is more traditional and offers many niche statistical models, but Python has broader support and scales more easily.

Together, Python and R allow data scientists to build and automate much of their analysis. Both languages have functions and libraries that run mathematical calculations on data to build descriptive and predictive models. Data scientists use Python and R to run, share, and distribute their work among colleagues and companies. For example, if a company is trying to predict the sales cycle of a product, it can use Python and data science methods to sort and filter incoming data, build an algorithmic model, and generate actionable insights.
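
The sales-cycle example could be sketched in Python roughly as follows. The deal records and field names are hypothetical, and the "model" is deliberately the simplest possible baseline (the average cycle length per product), not a production forecasting method.

```python
from statistics import mean

# Hypothetical incoming deals: product, days from first contact to sale, closed flag.
deals = [
    {"product": "widget", "days_to_close": 30, "closed": True},
    {"product": "widget", "days_to_close": 50, "closed": True},
    {"product": "gadget", "days_to_close": 12, "closed": True},
    {"product": "gadget", "days_to_close": 90, "closed": False},
]

# Filter: keep only the deals that actually closed.
closed = [d for d in deals if d["closed"]]

# Sort: shortest sales cycle first.
closed.sort(key=lambda d: d["days_to_close"])

# Model: a simple baseline, the mean cycle length for each product.
cycle_by_product = {}
for product in sorted({d["product"] for d in closed}):
    days = [d["days_to_close"] for d in closed if d["product"] == product]
    cycle_by_product[product] = mean(days)

print(cycle_by_product)
```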

SQL

SQL stands for “Structured Query Language,” and it’s the tool of choice for data analysis. Data scientists use SQL to organize their databases and pull specific subsets of data for analysis and modeling. While there are many types of databases — including some that don’t use SQL — SQL databases are by far the most common. SQL syntax is also the foundation for many of the tools used to work with “big data” systems like Hadoop.
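
As a small illustration of pulling a specific subset of data with SQL, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `articles` table, its columns, and its rows are invented for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for a real production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, category TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("Quiz A", "quiz", 90000),
        ("News B", "news", 1200),
        ("Quiz C", "quiz", 45000),
    ],
)

# Pull a subset: quiz articles with more than 10,000 views, most viewed first.
rows = conn.execute(
    "SELECT title, views FROM articles "
    "WHERE category = ? AND views > ? "
    "ORDER BY views DESC",
    ("quiz", 10000),
).fetchall()
conn.close()
print(rows)
```

The `?` placeholders pass the filter values separately from the query text, which is the usual way to avoid SQL injection when values come from outside the program.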

Together, SQL, R, and Python give data scientists the power to acquire, sort, and mine data in order to build powerful predictive models.

McKinsey says that data science will become “a key basis of competition.” The data scientist needs the skills and the tools of the trade in order to glean valuable insights from huge amounts of data. Those insights lead to new products, medical solutions, product improvements, and sometimes even new market categories.

Take your data skills to the next level with our full-time Data Science Immersive and part-time Data Science and Data Analytics courses on our global campuses, or learn Data Analysis part-time online.

Disclaimer: General Assembly previously referred to its Bootcamps and Short Courses as “Immersive” and “Part-time” courses, respectively, and you may see those terms in posts published before 2023.