Noelle Brown, Author at General Assembly Blog

Computer Science vs. Data Science: What is the Difference?

Maybe you want to learn more about data science since you’ve heard it’s “the sexiest job of the 21st century.” Or maybe your software engineer friend is trying to talk you into learning computer science. Either way, both data science and computer science skills are in demand. In this article, we will cover the major differences between data science and computer science to clarify the distinction between these two fields.

Before we dive into the differences, let’s define these two sciences:

Data Science vs. Computer Science

Data science is an interdisciplinary field that uses data to extract insights and inform decisions. It’s often referred to as a combination of statistics, business acumen, and computer science. Data scientists clean, explore, analyze, and model with data using programming languages such as Python and R along with techniques such as statistical models, machine learning, and deep learning.

While it’s one part of data science, computer science is its own broader field of study involving a range of both theoretical and practical topics like data structures and algorithms, hardware and software, and information processing. It has many applications in fields like machine learning, software engineering, and mathematics.

History

While many of the topics used in data science have been around for a while, data science as a field is in its infancy. In 1974, Peter Naur defined the term “data science” in his work, Concise Survey of Computer Methods. However, even Naur couldn’t have predicted the vast amount of data that our modern world would generate on a daily basis only a few decades later. It wasn’t until the early 2000s that data science was recognized as its own field. It gained popularity in the early 2010s, leading to the field as we know it today: a blend of statistics and computer science used to drive insights and make data-driven business decisions.

“Data science,” “big data,” “artificial intelligence,” “machine learning,” and “deep learning” have all become buzzwords in today’s world. These are all components of data science and, while trendy, they can provide practical benefits to companies. Historically, we did not have the storage capacity to hold the amount of data that we are able to collect and store today. This is one reason that data science has become a popular field only recently. The emergence of big data and advancements in technology have paved the way for individuals and businesses to harness the power of data. Many of the techniques that data scientists use have been around for years, but until recently we lacked the hardware and software needed to apply them at scale.

Computer science, on the other hand, has much older roots. This is one of the main differences between it and data science. Ada Lovelace is known for pioneering the field of computer science as the person who wrote the first computer algorithm in the 1840s, and computing devices such as the abacus date back thousands of years. Computer science has been formally researched for much longer than data science, and companies have been using computer science tools for decades. It’s an umbrella field that has numerous subdomains and applications.

Applications

The applications of these fields in industry differ as well. Computer science skills are used in many different jobs, including that of a data scientist. However, common roles involving computer science skills include software engineers, computer engineers, software developers, and web developers. Two roles that use computer science, front end engineer and Java developer, ranked first and second respectively on Glassdoor’s 50 Best Jobs in America for 2020 list. While these roles do not formally require degrees, many people in these jobs hold a degree or come from a background in computer science.

Common computer science job tasks include writing, testing, and debugging code, developing software, and designing applications. Individuals who use computer science in their roles often create new software and web applications. They need to have excellent problem-solving skills and be able to write code in programming languages such as Python, Ruby, JavaScript, Java, or C#. They also need to have a fundamental understanding of how these languages work, and be well versed in object-oriented programming.

Data science is applied in job titles such as data scientist, data analyst, machine learning engineer, and data engineer. Data scientist and data engineer ranked third and sixth respectively on Glassdoor’s 50 Best Jobs in America for 2020. Individuals in these roles come from a variety of backgrounds including computer science, statistics, and mathematics. 

Common data science job tasks include cleaning and exploring data, extracting insights from data, and building and optimizing models. Data scientists analyze and reach conclusions based on data. They need to be well versed in statistics and mathematics topics including linear algebra and calculus as well as programming languages such as Python, R, and SQL. They also need to have excellent communication skills as they are often presenting insights, data visualizations, and recommendations to stakeholders.

Since computer science is one component of data science, there is often crossover in these roles and responsibilities. For example, computer science tasks like programming and debugging are used in both computer science jobs and data science jobs. Both of these fields are highly technical and require knowledge of data structures and algorithms. However, the depth of knowledge required differs between computer science and data science. It’s often said that a data scientist knows more about statistics than a computer scientist and more about computer science than a statistician. This reinforces the interdisciplinary nature of data science.

The Use of Data

Data, or information such as numbers, text, and images, has applications in both computer science and data science. The study and use of data structures is a sub-domain of computer science. Data structures are ways to organize, manage, and store data so that it can be accessed and used efficiently, for example in a computer’s memory. Data science relies on data structures to access data, but its main goal is to analyze data and make decisions based on it, often using statistics and machine learning.
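
To make the idea of a data structure concrete, here is a minimal sketch in Python. It stores the same made-up temperature readings in two built-in structures, a list and a dictionary, and looks up one value in each. The variable names and the helper function lookup_in_list are hypothetical and exist only for illustration.

```python
# The same (day, temperature) readings stored in two different data structures.
readings = [("mon", 18.5), ("tue", 21.0), ("wed", 19.2)]


def lookup_in_list(pairs, day):
    """Scan the list from the start until the requested day is found."""
    for key, value in pairs:
        if key == day:
            return value
    return None  # the day is not in the list


# A dictionary indexes values by key, so no manual scan is needed.
readings_by_day = dict(readings)

print(lookup_in_list(readings, "wed"))  # 19.2
print(readings_by_day["wed"])           # 19.2
```

Both lookups return the same value. The difference is in how the data is organized: the list has to be scanned item by item, while the dictionary finds the entry by its key, which tends to matter more as the data grows.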

The Future of Computer Science and Data Science

Today, all companies and industries can benefit from both of these fields. Computer scientists drive business value by developing software and tools while data scientists drive business value by answering questions and making decisions based on data. As software continues to integrate with our lives and daily routines, computer science skills will continue to be critical and in demand. As we continue to create and store vast amounts of data on a daily basis, data science skills will also continue to be critical and in demand. Both fields are constantly evolving as technology advances and both computer scientists and data scientists need to stay current with the latest tools, methods, and technologies.

The field of data science would not exist without computer science. Today, the two fields complement each other to further applications of artificial intelligence, machine learning, and personalized recommendations. Many of the luxuries that we have today — a favorite streaming service that recommends new movies, the ability to unlock our phones with facial recognition technology, or virtual home assistants that let us play our favorite music just by speaking — are made possible by computer science and made better by data science. As long as bright, motivated individuals continue to learn data science and computer science, these two fields will continue to advance technology and improve the quality of our lives.



How is Python Used in Data Science?

Python is a popular programming language used by both developers and data scientists. But what makes it so popular and why are so many data scientists choosing Python over other programming languages? In this article, we’ll explore the advantages of Python programming and why it’s useful for data science.

What is Python?

No, we’re not talking about the giant, tropical snake. Python is a general-purpose, high-level programming language. It supports object-oriented, structured, and functional programming paradigms.
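
As a rough illustration of those three paradigms, the sketch below adds up the same list of prices in a structured, an object-oriented, and a functional style. The names prices, Basket, total_structured, and total_functional are made up for this example.

```python
from dataclasses import dataclass
from functools import reduce

prices = [4.0, 2.5, 3.5]


# Structured style: an explicit loop that updates a running total.
def total_structured(values):
    total = 0.0
    for value in values:
        total += value
    return total


# Object-oriented style: the data and the behavior live together in a class.
@dataclass
class Basket:
    prices: list

    def total(self):
        return sum(self.prices)


# Functional style: fold the list into a single value without mutating state.
def total_functional(values):
    return reduce(lambda acc, value: acc + value, values, 0.0)


print(total_structured(prices))   # 10.0
print(Basket(prices).total())     # 10.0
print(total_functional(prices))   # 10.0
```

All three produce the same result. Which style reads best usually depends on the task and on what a team is used to.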

Python was created in the late 1980s by the Dutch programmer Guido van Rossum, who wanted a project to fill his time over the holiday break. His goal was to create a descendant of the ABC programming language that would appeal to Unix/C hackers. Van Rossum writes that he chose the name Python for this language, “being in a slightly irreverent mood (and a big fan of Monty Python’s Flying Circus).”

After many updates and iterations, Python 3.0 was released in 2008. It was designed to fix design flaws in the language, with an emphasis on removing redundant features. The transition had some growing pains because Python 3 was not backward compatible with Python 2, but it made way for Python as we know it today. Python continues to be well maintained and supported as a popular, open source programming language.

In “The Zen of Python,” developer Tim Peters summarizes van Rossum’s guiding principles for writing code in Python:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!
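
These principles ship with Python itself: running import this in an interpreter prints them. The short sketch below also reads the text programmatically. It assumes the CPython implementation detail that the this module keeps the text ROT13-encoded in an attribute named s, and the variable names zen and aphorisms are made up for illustration.

```python
import codecs
import this  # importing the module prints the Zen to the console once

# Assumption: CPython stores the ROT13-encoded text in the attribute `s`.
zen = codecs.decode(this.s, "rot13")

# Skip the title line and the blank line to keep only the aphorisms.
aphorisms = [line for line in zen.splitlines()[1:] if line.strip()]
print(aphorisms[0])
```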

These principles touch on some of the advantages of Python in data science. Python is designed to be readable, simple, explicit, and explainable. Even the first principle states that Python code should be beautiful. In general, Python is a great programming language for many tasks and is becoming increasingly popular among developers. But now you may be wondering, why learn Python for data science?

Why Python for Data Science?

The first of many benefits of Python in data science is its simplicity. While some data scientists come from a computer science background or know other programming languages, many come from backgrounds in statistics, mathematics, or other technical fields and may not have as much coding experience when they enter the field of data science. Python syntax is easy to follow and write, which makes it a simple programming language to get started with and learn quickly. 
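
As a small, hedged illustration of that readability, the sketch below filters and averages a handful of made-up height measurements using only the standard library. The variable names are hypothetical.

```python
from statistics import mean

# Made-up sample data: heights in centimeters.
heights_cm = [162, 175, 181, 158, 170]

# Keep only the heights above 165 cm.
tall = [height for height in heights_cm if height > 165]

# Average of all heights.
average = mean(heights_cm)

print(tall)                                 # [175, 181, 170]
print(f"Average height: {average:.1f} cm")  # Average height: 169.2 cm
```

Each line reads close to a plain-English description of the step, which is a large part of why newcomers from statistics or mathematics tend to pick the language up quickly.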

In addition, there are plenty of free resources available online to learn Python and get help if you get stuck. Python is an open source language, meaning the language is open to the public and freely available. This is beneficial for data scientists looking to learn a new language because there is no up-front cost to start learning Python. This also means that there are a lot of data scientists already using Python, so there is a strong community of both developers and data scientists who use and love Python.

The Python community is large, thriving, and welcoming. Python is the fourth most popular language among all developers based on a 2020 Stack Overflow survey of nearly 65,000 developers. Python is especially popular among data scientists. According to SlashData, there are 8.2 million active Python users with “a whopping 69% of machine learning developers and data scientists now us[ing] Python (compared to 24% of them using R).” A large community brings a wealth of available resources to Python users. Not only are there numerous books and tutorials available, but there are also conferences such as PyCon where Python users across the world can come together to share knowledge and connect. The result is a supportive and welcoming community of data scientists willing to share new ideas and help one another.

If the sheer number of people using Python doesn’t convince you of the importance of Python for data science, maybe the libraries available to make your data science coding easier will. A library in Python is a collection of modules with pre-built code to help with common tasks. They essentially allow us to benefit from and build on top of the work of others. In other languages, some data science tasks would be cumbersome and time consuming to code from scratch. There are countless libraries like NumPy, Pandas, and Matplotlib available in Python to make data cleaning, data analysis, data visualization, and machine learning tasks easier. Some of the most popular libraries include:

  • NumPy: NumPy is a Python library that provides support for many mathematical tasks on large, multidimensional arrays and matrices.
  • Pandas: The Pandas library is one of the most popular and easy-to-use libraries available. It allows for easy manipulation of tabular data for data cleaning and data analysis.
  • Matplotlib: This library provides simple ways to create static or interactive boxplots, scatterplots, line graphs, and bar charts. It’s useful for simplifying your data visualization tasks.
  • Seaborn: Seaborn is another data visualization library built on top of Matplotlib that allows for visually appealing statistical graphs. It allows you to easily visualize beautiful confidence intervals, distributions, and other graphs.
  • Statsmodels: This statistical modeling library provides many common statistical models and statistical tests, including linear regression, generalized linear models, and time series analysis models.
  • SciPy: SciPy is a library used for scientific computing that helps with linear algebra, optimization, and statistical tasks.
  • Requests: This is a useful library for retrieving data from websites and web APIs, which is often the first step in web scraping. It provides a user-friendly way to send HTTP requests and work with the responses.
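
To give a feel for how two of the libraries in the list above are used, here is a minimal sketch with NumPy and Pandas. It assumes both packages are installed, and the data, column names, and variable names are invented for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic applies to the whole array at once, with no explicit loop.
temps_c = np.array([0.0, 100.0])
temps_f = temps_c * 9 / 5 + 32

# Pandas: tabular data with one missing measurement.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Lima", "Lima", "Lima"],
        "temp_c": [10.0, 12.0, 20.0, np.nan, 22.0],
    }
)

# Data cleaning: drop the row with the missing value.
clean = df.dropna(subset=["temp_c"])

# Data analysis: average temperature per city.
mean_by_city = clean.groupby("city")["temp_c"].mean()

print(temps_f)
print(mean_by_city)
```

The cleaning and aggregation steps each take a single line here. Writing the same logic from scratch would need explicit loops and bookkeeping for the missing value.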

In addition to all of the general data manipulation libraries available in Python, a major advantage of Python in data science is the availability of powerful machine learning libraries. These robust, open source libraries make data scientists’ lives easier by providing ready-made implementations of most widely used machine learning algorithms. They offer simplicity without sacrificing much performance, and they let you build and train a neural network in relatively few lines of code. Some of the most popular machine learning and deep learning libraries in Python include:

  • Scikit-learn: This popular machine learning library is a one-stop-shop for all of your machine learning needs with support for both supervised and unsupervised tasks. Some of the machine learning algorithms available are logistic regression, k-nearest neighbors, support vector machine, random forest, gradient boosting, k-means, DBSCAN, and principal component analysis.
  • TensorFlow: TensorFlow is a library for building neural networks. Since its core is mostly written in C++, this library provides us with the simplicity of Python without sacrificing power and performance. However, working with raw TensorFlow is not well suited for beginners.
  • Keras: Keras is a popular high-level API that acts as an interface for the TensorFlow library. It’s a tool for building neural networks using a TensorFlow backend that’s extremely user friendly and easy to get started with.
  • PyTorch: PyTorch is another framework for deep learning, created by Facebook’s AI research group. It offers more flexibility than Keras, but since it has a lower-level API, it is more complex and may be a little less beginner friendly than Keras.
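
As a hedged sketch of what working with one of these libraries looks like, the example below trains a k-nearest neighbors classifier with Scikit-learn on a tiny, made-up dataset. It assumes Scikit-learn is installed. The data points and variable names are invented, and a real project would also split the data and evaluate the model.

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up training data: two well-separated groups of 2D points.
X_train = [
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # class 0
    [5.0, 5.2], [5.1, 4.9], [4.8, 5.0],   # class 1
]
y_train = [0, 0, 0, 1, 1, 1]

# Each prediction is a majority vote among the 3 closest training points.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Classify two new points, one near each group.
predictions = model.predict([[1.0, 1.0], [5.0, 5.0]])
print(predictions)  # [0 1]
```

Other Scikit-learn estimators, such as logistic regression or random forest, follow the same fit and predict pattern, so swapping in a different algorithm usually changes only a line or two.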

What Other Programming Languages are Used for Data Science?

Python is the most popular programming language for data science. If you’re looking for a new job as a data scientist, you’ll find that Python is also required in most job postings for data science roles. Jeff Hale, a General Assembly data science instructor, scraped job postings from popular job posting sites to see what was required for jobs with the title of “Data Scientist.” Hale found that Python appears in nearly 75% of all job postings. Python libraries including TensorFlow, Scikit-learn, Pandas, Keras, PyTorch, and NumPy also appear in many data science job postings.

Image source: The Most In-Demand Tech Skills for Data Scientists by Jeff Hale

R, another popular programming language for data science, appeared in roughly 55% of the job postings. While R is a useful tool for data science, with strengths in data cleaning, data visualization, and statistical analysis, Python continues to become more popular and preferred among data scientists for a majority of tasks. In fact, the average percentage of job postings requiring R dropped by about 7% between 2018 and 2019, while the percentage requiring Python increased. This isn’t to say that learning R is a waste of time; data scientists who know both languages can draw on the strengths of each for different purposes. However, since Python is becoming increasingly popular, there’s a high chance that your team uses Python, and it’s important to use the language that your team is comfortable with and prefers.

What is the Future of Python for Data Science?

As Python continues to grow in popularity and as the number of data scientists continues to increase, the use of Python for data science will likely continue to grow. As we advance machine learning, deep learning, and other data science tasks, we’ll likely see these advancements become available as libraries in Python. Python has been well maintained and has grown in popularity for years, and many of the top companies use Python today. With its continued popularity and growing support, Python will be used in industry for years to come.

Whether you’ve been a data scientist for years or you are just beginning your data science journey, you can benefit from learning Python for data science. The simplicity, readability, support, community, and popularity of the language, as well as the libraries available for data cleaning, visualization, and machine learning, all set Python apart from other programming languages. If you aren’t already using Python for your work, give it a try and see how it can simplify your data science workflow.
