data analysis Tag Archives - General Assembly Blog

5 Key Excel Skills You Can Learn in Minutes

Since it was created in 1985, Excel has practically become synonymous with data itself, and still is many years later. Spend a few minutes with our expert instructor in the videos below to learn the kinds of Excel tools that can help you be your own analyst—and make smarter decisions with data. 

How to Create an Excel Bar Chart

Bar charts are an important visual tool that can help express your data over time and tell a story in a visually appealing and digestible way. Learn more in our 2-minute lesson below:

How To Create an Excel Pivot Table

Pivot tables allow you to effectively summarize and highlight the importance of your data sets. They are an important presentation tool and can help you simplify your data. Learn more in our 3-minute lesson below:
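For readers who want to see the idea outside Excel, here is a minimal Python sketch of what a pivot table does: group rows by category and total a value for each group. The sales records and the `pivot_sum` helper are hypothetical and only for illustration; this is not how Excel implements pivot tables.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, amount)
sales = [
    ("East", "Chairs", 120.0),
    ("East", "Desks", 300.0),
    ("West", "Chairs", 80.0),
    ("East", "Chairs", 60.0),
    ("West", "Desks", 150.0),
]


def pivot_sum(rows):
    """Total the amount for every (region, product) pair."""
    totals = defaultdict(float)
    for region, product, amount in rows:
        totals[(region, product)] += amount
    return dict(totals)


pivot = pivot_sum(sales)

# Print one summary line per group, sorted by region and then product.
for (region, product), total in sorted(pivot.items()):
    print(f"{region:<6}{product:<8}{total:>8.2f}")
```

The five raw rows collapse into four summary rows, which is the same simplification a pivot table gives you.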

How To Create a Histogram in Excel

Histograms provide a visual representation of variations within your data and can help display degrees of difference in an impactful way. Learn more in our 2.5-minute lesson below:
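The counting behind a histogram can be sketched in a few lines of Python. The scores and the bin width of 10 below are made-up values chosen for illustration, and the `histogram` helper is hypothetical.

```python
def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = {}
    for value in values:
        # The start of the bin that contains this value, e.g. 78 -> 70.
        start = (value // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


scores = [52, 67, 71, 75, 78, 81, 84, 88, 93, 95]  # hypothetical test scores
bin_width = 10
bins = histogram(scores, bin_width)

# Draw a simple text histogram, one row per bin.
for start, count in bins.items():
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")
```

Each row of `#` characters plays the role of one bar in the chart.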

How To Create a Pie Chart in Excel

Pie charts express percentages of a whole for a set period of time and can be helpful for showing differences among a handful of categories. Unlike bar charts, they do not express changes over time. Learn more in our 2.5-minute lesson below:
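The arithmetic behind each slice is simply the category's value divided by the total. A small Python sketch, using a hypothetical monthly budget as the data and a hypothetical `shares` helper:

```python
def shares(totals):
    """Convert raw category totals into percentages of the whole."""
    whole = sum(totals.values())
    return {name: 100 * value / whole for name, value in totals.items()}


# Hypothetical monthly budget, in dollars
budget = {"Rent": 1200, "Food": 500, "Transport": 200, "Other": 100}
percentages = shares(budget)

for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
```

The percentages always add up to 100, which is why a pie chart suits a single snapshot rather than a trend.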

How To Create a VLookup in Excel

A VLookup (vertical lookup) can help you look up data that is organized vertically. It is useful for spotting trends and finding important pieces of data that can be difficult to locate in large data sets. Learn more in our 2.5-minute lesson below:
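As a rough analogy, an exact-match vertical lookup can be sketched in Python as a scan down the first column of a table. The `vlookup` helper and the product table below are hypothetical. The sketch returns `None` when nothing matches, whereas Excel shows an error value in that case.

```python
def vlookup(key, table, column_index):
    """Return the value in column_index (1-based) of the first row
    whose first column equals key, or None if no row matches."""
    for row in table:
        if row[0] == key:
            return row[column_index - 1]
    return None


# Hypothetical product table: (product ID, name, unit price)
products = [
    ("A100", "Keyboard", 25.0),
    ("A200", "Monitor", 180.0),
    ("A300", "Mouse", 12.5),
]

price = vlookup("A200", products, 3)    # look up the unit price
missing = vlookup("Z999", products, 2)  # no such product ID
print(price, missing)
```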

A Beginner’s Guide To Tableau

Featuring Insights From Iun Chen & Vish Srivastava

Read: 2 Minutes

Tableau is a powerful data analysis and data visualization tool that anyone can use. It can be used by beginners to create simple charts and by advanced practitioners to solve complex business problems. It is user-friendly, easy to learn quickly, and includes a portfolio of business intelligence tools with the potential to give a wide range of roles the advantage of professionally analyzing data.

Simply put, if you can present data in a clear, compelling format, you gain a competitive advantage in today’s data-driven marketplace.

“Tableau enables you to quickly connect disparate data sources and utilize a drag-and-drop interface to analyze data and create dashboards,” says Vish Srivastava, who leads our Data Visualization & Intro to Tableau workshop. As a product leader at Evidation Health, he relies on Tableau to turn around fast data analysis. “For example, product teams use it to analyze user growth and analytics, BizOps teams use it to analyze operational data, and sales teams use it to analyze customer and revenue data.”

Businesses survive and thrive on data. The amount of data available to businesses today is impressive. To keep organizations on a successful path, analysts need to provide the key insights needed to make important decisions.

Here’s where Tableau comes in.

Tableau takes business intelligence to the next level, making it fast and efficient to analyze large amounts of data and create beautiful, presentation-ready visualizations that generate insights.

Data is the lifeblood of modern teams. Being able to quickly answer ad hoc questions and integrate data analysis into your day-to-day decision-making will make you an MVP. Though not all data analysts use Tableau, they do need some way to quickly create data visualizations.

Tableau is the data viz tool of choice.

Tableau is so popular in part because it is easy and fast to learn. In Iun Chen’s Intro to Data Analytics course, students learn the life-changing basics of Tableau in an afternoon. Aspiring analysts come to understand the power of data and the impact their numbers can have. As more data becomes available, there are more opportunities for data to be misused, a risk that every data scientist soon realizes. To quote the Nobel laureate and economist Ronald Coase, “If you torture the data long enough, it will confess.”

The ethics of data form the foundation of Chen’s syllabus so pitfalls are avoided from the start. “Overanalyzing and manipulating data too deeply can always give you the information you want,” says Chen. “Unfortunately, this is all too common in professional settings, though it’s usually unintentional.”

Tableau is a powerful tool.

Business insights are only as good as the data behind them, and the best data analysts understand that the human choices they make matter.

“Data is the perfect example of garbage in, garbage out,” says Srivastava, who defines good data as data that is ethically collected, complete, objective, and thoroughly analyzed. “The double-edged sword of using powerful data analysis and visualization tools is that beautiful charts can create a false precision and obfuscate data integrity issues.”

To delve deeper into this topic, Chen recommends How Charts Lie, by Alberto Cairo, an exploration of how data can be altered:

“This book details how the use of data and data visualizations in journalism can be distorted and misleading, without the audience even realizing it, due to the urgency to present findings in a timely manner to the public.”

Want to learn more about Iun?
https://www.linkedin.com/in/iunchen 

Want to learn more about Vish? https://www.linkedin.com/in/vishrutps

7 Tips to Learn Tableau Fast

Featuring Insights From Iun Chen & Vish Srivastava

Read: 2 Minutes

Let’s get it straight: How difficult is it to learn Tableau for a complete beginner? Are there shortcuts to learning Tableau? Any tips, tricks, or time-saving work-arounds? Thankfully, the answer is yes. Try these top tips, approved by our expert instructors, and start data viz now.

“It’s a little overwhelming at first but as soon as you understand the basics, like what are dimensions and measures, everything falls into place pretty quickly,” says Vish Srivastava, product leader at Evidation Health and GA instructor.

“In essence, you need to understand two things: The basics on how data works — for example, what are common formats of data and what is a primary key? And a basic understanding of data visualization in a business setting. Can you answer the question: When is a time series vs. a pie chart valuable for decision making?”
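To make the primary key idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` table, its columns, and the `insert_duplicate` helper are hypothetical. A primary key uniquely identifies each row, so the database refuses a second row with the same key.

```python
import sqlite3

# An in-memory database that disappears when the program ends.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers (customer_id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO customers (customer_id, name) VALUES (2, 'Grace')")


def insert_duplicate(connection):
    """Try to reuse an existing key; return True if the database rejects it."""
    try:
        connection.execute(
            "INSERT INTO customers (customer_id, name) VALUES (1, 'Alan')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


rejected = insert_duplicate(conn)
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(rejected, row_count)
```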

But can you really learn the basics of Tableau in an afternoon?

“The best way to learn is to download a sample dataset and dive right in and start creating data visualizations. To keep going from there, check out various portfolios online to get inspiration, and try to build those.”

According to Iun Chen, who conducts internal Tableau training at LinkedIn, Tableau is easy to learn, but hard to master.

“The basic concepts of charting and color theory are easy to pick up and can take just a few weeks. However, if you are looking to be a subject matter expert, this can take years to perfect,” she says. 

Chen preps students in her Intro to Data Analytics course to achieve close-to-mastery in these key areas.

  1. Quickly prep and analyze large volumes of data.
  2. Identify key information and determine the best visual method to present it.
  3. Take business questions and determine which visualizations to use.
  4. Translate raw datasets to storylines with a beginning, middle, and end.
  5. Format charts, graphs, titles, text, and images for a polished deliverable.
  6. Articulate best practices on design and visualization techniques.
  7. Provide feedback on ineffective visualizations and how to improve them.

This checklist is the closest thing to a Tableau cheat sheet you’ll find. Prioritize these skills, and you’ll waste no time learning Tableau. Now that you know what you need to succeed, you can choose whether to take our Data Analytics course fast or slow. Learn Tableau, along with data analytics tools SQL and Excel, in a 1-week accelerated format, or over 10 weeks in the evening.

Chen sums it up perfectly: “As long as you are actively learning, applying your learnings, and ensuring innovation of your work, you will be a data visualization expert in no time.”

Want to learn more about Iun?
https://www.linkedin.com/in/iunchen 

Want to learn more about Vish? https://www.linkedin.com/in/vishrutps

Top 3 Reasons To Learn Tableau

Featuring Insights From GA instructor Candace Pereira-Roberts

Read: 2 Minutes

Do you communicate data? Do you want to create more effective data visualizations? Tableau is the data analytics tool you’re looking for. Here are the top three reasons why you should learn how to use Tableau, the popular data viz software focused on business intelligence. Read on for the advantages of being a Tableau professional.

#1 Tableau Is Easy

Data can be complicated. Tableau makes it easy. Tableau is a data visualization tool that takes data and presents it in a user-friendly format of charts and graphs. And here’s the best part: No code writing is required. You’ll easily master the end-to-end cycle of data analytics.


Need to showcase trends or surface findings? Tableau will make you an expert. Proficiency in business intelligence is a transferable skill that is quickly becoming the lifeblood of organizations. 

“I see students who are new to analytics learn Tableau desktop and be able to develop Tableau worksheets, interactive dashboards, and story points in a couple of weeks — essentially a complete data analysis project,” says Candace Pereira-Roberts, FinServ data engineer and one of our Data Analytics course instructors. She adds, “I like to share knowledge and watch people grow. I learn from my students as well.” 

#2 Tableau Is Tremendously Useful

Would you rather tell visual stories with data? Or present the same old boring reports and tables? Is that even a question?

“Anyone who works in data should learn tools that help tell data stories with quality visual analytics.” Full stop.

The smart data analyst, data scientist, and data engineer were quick to adopt Tableau, and it has given those roles a key competitive advantage in the recent data-related hiring frenzy. But their secret is out. And the advantages go beyond the usual tech roles. Having a working knowledge of data, and specifically knowing how to use Tableau, can help many more tech professionals become more attractive to recruiters and hiring managers.

Plus, it has a built-in career boost. Tableau’s visualizations are so elegant, you’ll be confident presenting the business intelligence and actionable insights to key stakeholders. Improving your presentation skills is par for the course.

#3 Tableau Data Analysts Are in Demand

As more and more businesses discover the value of data, the demand for analysts is growing. One advantage of Tableau is that it is so visually pleasing and easy for busy executives — and even the tech-averse — to use and understand. Tableau presents complicated and sophisticated data in a simple visualization format. In other words, CEOs love it.

Think of Tableau as your secret weapon. Once you learn it, you can easily surface critical information to stakeholders in a visually compelling format. That will make you a rockstar in any organization. 

“Tableau helps organizations leverage business intelligence to become more data-driven in their decision-making process,” Pereira-Roberts says. She recommends participating in Makeover Monday to take your skills to an even higher level.

Want to learn more about Candace? Check out her thoughts on how to become a business intelligence analyst, or connect with her on LinkedIn.

What Is Data Visualization?

An Interview With Iun Chen

Read: 4 Minutes

Data is big, and it’s getting bigger. How do you parse and understand data when the sheer amount of information can be overwhelming? The answer is data visualization. Using concepts of design theory like elements of color and layout, the discipline of data visualization, or data viz, is essentially the graphic representation of data. We called on one of our data viz experts, Iun Chen, to break it down further. 

Let’s start with an introduction and how you came to the world of data viz.

IC: I’m Iun (pronounced ‘yoon’), and I work in the data analytics space focusing on business intelligence tools and building scalable resources for LinkedIn. I also teach the 10-week Intro to Data Analytics course for GA, which includes the professional skills of SQL, Tableau, and Excel.

In college, I was a business major with a specialization in marketing and advertising. I became more interested in how the ad business model worked behind the scenes and in how software and systems worked. As a result, I worked at many major media companies in a quantitative capacity — revenue planning, ad pricing, finance, ad sales strategy. That led me into a formalized analytics route.

How do you define data visualization?

IC: Data visualization is the idea of communicating information graphically. It’s the science of information design, in which you take massive amounts of data in whatever format it comes in and use it to surface high-level insights and findings in a visually compelling way so audiences can easily understand the main points.

How does data visualization differ from data analytics?

IC: Data analytics is the process of cleaning, prepping, analyzing, and presenting data. Data visualization is part of the presenting data step and is defined as the act of visually organizing data through the use of charts, graphs, and dashboards. Concepts of data visualization are closely aligned with concepts of design theory: color, font, scale, layout, organization.

Why is data viz important?

IC: Data visualization is easy to learn but hard to master. In my classes, I heavily emphasize the design element of data visualization. It’s easy to whip together a quick bar or pie chart, but is it the best way to communicate the point you are trying to make? The goal of collecting mass amounts of data is to be able to quickly translate it into insights that can help make smart business decisions. The final form of this translation is often a chart or graph, which is why the ability to design and visualize these mass amounts of data grows in importance as we collect more of it.

What is a data narrative?

IC: People think in stories and narratives, not in black and white figures. Just like you would share a story with a friend using a beginning, middle, and endpoint, you would do the same when sharing details about data analysis. Here’s a simple example.

  1. Beginning: Sales are down year-over-year; identify the symptoms.
  2. Middle: Furniture sales — our largest segment — are doing poorly in the last six months; conduct the analysis to investigate reasons and uncover root causes.
  3. End: Review retail store reports and conduct manufacturer visits; recommend next steps.

The key point to any data narrative is that it should present a compelling business case and surface unrealized insights to the audience. The business challenges, rationale, and next steps should be clearly presented, and people in the room should be able to walk away and know what to act on.

Which tech roles use data visualization?

Data visualization — like data analytics — is a skill set that can be applied to any job. But if you are looking for a job that has data visualization skills as part of the function and responsibilities, look for roles like business analyst, data analyst, business intelligence analyst, data scientist, and data engineer. Keep in mind that the formal skill of data visualization is still relatively new, so depending on the maturity of the company, those functions may not be fully established yet. However, as the amount of data in the world increases, there’s a growing need for experts who understand data visualization techniques.

Check out this Medium post which details how Spotify’s business has evolved with the creation of their data visualization roles.

What’s the future of data visualization?

As we continue to collect more and more data, the need for people with the skills to analyze and present data becomes ever-growing and critical in the workplace environment. More companies will need to generate insights quickly to keep up with advances and competition in their respective industries. The skill of data visualization will become more and more attractive as teams and organizations seek to translate their data into insights more efficiently and effectively. The ability to work with data is increasingly critical to the success of any company in any job function. 

Iun Chen’s Recommended Data Viz Reading List

FlowingData

StorytellingWithData

InformationIsBeautiful

Tableau Public Gallery

New York Times Data Journalism

The WSJ Guide to Information Graphics

Storytelling with Data: A Data Visualization Guide for Business Professionals 

Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations

Edward Tufte’s The Visual Display of Quantitative Information

Want to learn more about Iun?
https://www.linkedin.com/in/iunchen

How to Become a Data Analyst

Featuring Insights From Matt Brems & Vish Srivastava

Read: 4 Minutes

So, you want to be a data analyst? GA instructors Matt Brems and Vish Srivastava are data experts with deep experience across a wide range of industries. If anyone can start you on the path to your dream job of data analyst, it’s these two. Read on for advice and guidance on how to get there, what it takes and where to look for the jobs of the future.

Tell us about your experience in data — what brought you to the field?

Matt Brems: I was attracted to statistics and data science because I felt that too many decisions were made based on a gut feeling instead of data. In my experience, that usually meant that the loudest person in the room would make the decisions. Using statistics and data science to analyze evidence helps us to make better, more informed decisions that are less susceptible to bias.

Vish Srivastava: As a product manager, I’m faced with a deluge of data and need to make sense of what matters and how to use data to inform decision-making around the product. Quickly visualizing data and creating dashboards that get widely distributed are both critical skills to be an effective product manager.

What qualifications do you need to be considered for a data analyst job?

MB: Jobs that involve data analysis have many different titles. Some companies call these roles “data analysts”; other titles include “business intelligence analyst,” “marketing analyst,” “data scientist,” and others. Since different companies will have different names for the job, it’s not surprising that the qualifications for a data analyst (or similar) role will vary wildly from company to company.

The most common qualification is to know SQL, or Structured Query Language.

Most data analyst roles will expect some experience with data visualization. Having some background in statistics — even one or two courses at the college level or having online certifications — is often expected.

What traits make a data analyst successful?

VS: Simplicity, integrity, empathy, and patience. Simplicity, because it can be very easy to go crazy and complicate an issue when you are in the weeds. Data analysts must always resist this urge and instead create clarity and simplicity for stakeholders.

Integrity, because data analysts make a lot of crucial decisions like which outlier to remove and which insights to show vs. which ones to leave behind. You can easily tell a story that isn’t really true if you wanted to, and data analysts must hold themselves accountable to the truth. Mark Twain wasn’t kidding when he said that “there are lies, damned lies, and statistics.”

Empathy, because data analysts must always look at deliverables from the perspective of an audience. Will this be helpful to them, is it immediately legible, what questions will they have that you can preemptively address?

And lastly, patience. Data analysis involves a thankless job that people don’t really talk about — data cleaning!

How does someone start a career in data analytics?

MB: I would start by searching data analyst roles at companies in which you’re interested. If you have most of the qualifications listed, go ahead and start applying! If you feel like you lack most of the qualifications, start exploring resources and courses that can help you close that gap. The three most important skills to know are likely to be SQL, statistics, and experience with a data visualization tool. There are lots of tutorials and courses available to teach you each of these.

How long does it take to become a data analyst?

MB: Data analyst positions are, in many cases, entry-level roles. New graduates from bootcamps or from college or university can often be accepted to data analyst roles. Some roles may require up to a few years of experience.

What’s the next big disruption? Where should candidates look for the jobs of the future?

VS: Every sector has been transformed by data, and this will continue to happen. But I think a very interesting one to watch is healthcare. Data in healthcare is trapped in various silos, like hospital systems (e.g., EHRs), insurance companies (e.g., claims data), clinical data (e.g., lab tests), and even personal health and fitness data (e.g., Apple Watch). This fragmentation of data, along with various pieces of regulation that govern how and when data is shared and used, means there is so much value that is currently untapped. As the industry moves to more interoperability and hopefully does so in a way that respects patient privacy and patient safety, we will see new opportunities quickly emerge.

Matt Brems teaches our Data Science Immersive, a bootcamp where students become fully-fledged data scientists in 12 weeks. He runs the consultancy BetaVector, where he solves data problems with Fortune 500 companies and startups alike. 

Vish Srivastava teaches our Data Analytics course. He has led multidisciplinary teams across many different tech sectors. He is currently a product leader at Evidation Health and is obsessed with building products to make the world a better place. 

Tableau vs. Power BI

Featuring Insights From Matt Brems

Read: 2 Minutes

Tableau and Power BI are powerful tools for business intelligence, with capabilities to take loads of big data and create elegant visualizations that convey key insights to stakeholders in easily digestible presentations. Both help organizations leverage business intelligence to become more data-driven in their decision-making process. So which tool is better? We asked a few industry experts their thoughts on the data analysis tools Tableau and Power BI. Here’s what they had to say.

Candace Pereira-Roberts, Data Engineer & GA Data Analytics Instructor

“Anyone who works in data should learn tools that help tell data stories with quality visualizations. Tableau is a wonderful tool for the technical and nontechnical to build these visualizations. I love how we teach the Tableau unit in the Data Analytics bootcamp. I see students who are new to analytics learn Tableau desktop and be able to develop Tableau worksheets, dashboards, and story points in a couple of weeks to do a complete analysis project.”

Iun Chen, GA Instructor & Data Analyst at LinkedIn 

“In my professional capacity, I lead data visualization workshops to share best practices on charting and design theory, with a focus on Tableau. But with the growth of big data analytics, there are more players in the data viz space. Looker, Qlik, Domo, and MicroStrategy are a few with out-of-the-box solutions. Check out other marketplace BI and analytics leaders and their reviews at Gartner.

Alternatively, if you are up for the challenge you can start from scratch and build out completely customized solutions through coding packages, such as with Python plotting libraries Matplotlib, Pandas, and Seaborn.”

Matt Brems, GA Instructor & Data Consultant at BetaVector 

“Most data analyst roles will expect some experience with data visualization. They may prefer your visualization experience be tied to a certain tool like Tableau or Power BI or simply want you to have experience designing graphics or dashboards. As with any platform, the human element is key. A good data analyst is curious and detail-oriented. Diving into the data and spotting anomalies or identifying patterns requires curiosity. Looking at large datasets for long periods of time can invite mistakes, so being detail-oriented ensures you’re interpreting the data correctly.” 

Vish Srivastava, GA Instructor & Product Leader at Evidation Health

“Most teams I’ve seen are not comparing Tableau and Power BI. Instead, it’s more about whether to adopt a business intelligence tool at all, or whether to use Tableau or Power BI in place of Excel. Tableau is a great option when you need to quickly create data visualizations. Tableau is incredibly powerful because it’s designed for nontechnical users, meaning business users can set up and tweak dashboards and charts without the support of engineering or data science teams.”

When it comes to research, the most common data analytics tool is SQL — no surprise there. But once you get into more niche industries, that can vary, says Brems.

“In academia, R is probably the most prevalent data analysis tool, though Python is quickly gaining popularity. SAS and Stata are often used in specific industries, though their popularity is diminishing. (R and Python are open source tools, which means, among other things, that they are free.)”

Want to learn more about Candace?
https://www.coursereport.com/blog/how-to-become-a-business-intelligence-analyst
https://generalassemb.ly/instructors/candace-roberts/13840
www.linkedin.com/in/candaceproberts

Want to learn more about Iun?
https://www.linkedin.com/in/iunchen 

Want to learn more about Matt?
https://betavector.com/
https://www.linkedin.com/in/matthewbrems

Want to learn more about Vish?
https://www.linkedin.com/in/vishrutps

Today’s Best Data Analytics Tools

Featuring Insights From Matt Brems

Read: 3 Minutes

Our Data-Driven World

We live in a world of data — swimming in statistics, numbers, information — and the amount of data seems to be growing faster than we can keep up. More people are using data points to make decisions large and small. From which restaurant has the highest Yelp rating to which city has the lowest rates of COVID-19, using data to navigate everyday life is now the norm. Indeed, the pandemic has only increased our reliance on data. We have come to expect this tsunami of data to explain, and in some cases solve, many of the most vexing problems faced by society today. But finding key insights takes careful analysis of a staggering amount of data. No small feat.

It’s true that more data is released than ever before. In the U.S., there are currently over 290,000 datasets on data.gov alone. Clearly, there’s a growing need for data analysts and the data analytics tools that help us understand these numbers. From small businesses to the highest levels of governments, decisions turn on interpretations of data. Big data can have big consequences.
 

So how do data analysts find the insights lurking in a database? And what are the best tools to analyze all those numbers? Read on to discover the best data analytics tools in the market.

Data scientist and GA instructor since 2016, Matt Brems currently runs a data science consultancy called BetaVector. We asked him to share his go-to data analysis tools. “People who want to analyze data use many different tools; I like to break these down into three different types,” he says.

Let’s get to it.

Type #1: Tabular Data Tools

Data analysts need to get data out of databases and analyze that information. And to do that, they use tabular data tools. According to Brems, the most important ones to know are Microsoft Excel, Google Sheets, and SQL, or Structured Query Language. Generally considered the best data analysis tool for research, SQL is the most common qualification found in job descriptions for a data analyst.

“Most data that data analysts analyze comes in the form of a table, called tabular data. This just means that data is organized into rows and columns, like a spreadsheet. Most data analysts will use a spreadsheet tool like Microsoft Excel or Google Sheets. When working with significant amounts of data (large tables, many tables, or both), organizations will often use a database. In order to interact with most databases, SQL is by far the language of choice.”
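To give a feel for what that looks like in practice, here is a small sketch that runs a SQL query from Python using the built-in `sqlite3` module. The `orders` table and its values are hypothetical. The query itself uses standard SQL: group the rows by region, then count and total them.

```python
import sqlite3

# An in-memory database with one small, made-up table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 300.0), ("West", 150.0)],
)

# Summarize orders per region, largest total first.
query = """
SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
FROM orders
GROUP BY region
ORDER BY total DESC
"""
summary = conn.execute(query).fetchall()

for region, n_orders, total in summary:
    print(region, n_orders, total)
```

The same `SELECT ... GROUP BY` pattern carries over to most relational databases, though each database adds its own dialect details.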

Type #2: Programming Language Tools

Proficiency in a few programming tools, while not a prerequisite for basic data analysis, can give analysts the ability to perform a wide variety of tasks. While the needed programming language tools will vary from company to company and even from job to job, having this skill set as a data analyst is clearly an advantage for job seekers.

“Python and R are the most common programming language tools in data analysis, though Stata and SAS are also used in some industries. These tools can be used to perform automation, statistical modeling, forecasting, and visualization.”
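As one small illustration of the forecasting use case, the sketch below fits a straight trend line to monthly sales with ordinary least squares and projects the next month. The sales figures and the `fit_line` helper are hypothetical, and a real forecast would usually rely on a statistics library and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


months = [1, 2, 3, 4, 5]
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]  # hypothetical figures

slope, intercept = fit_line(months, monthly_sales)
forecast = slope * 6 + intercept  # projected sales for month 6
print(f"Trend: {slope:.1f} per month, month 6 forecast: {forecast:.1f}")
```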

Type #3: Data Visualization Tools

Since data analysts are frequently tasked with presenting results to stakeholders, a good data visualization tool is essential. Brems recommends Tableau and Microsoft Power BI.

“While you can visualize data using programming languages, Tableau and Power BI are two standalone tools that are used almost exclusively for the purposes of building static data visualizations and dashboards.”

A Note on Research 

When it comes to research, the most common data analytics tool is SQL — no surprise there. But once you get into more niche industries, that can vary, says Brems.

“In academia, R is probably the most prevalent data analysis tool, though Python is quickly gaining popularity. SAS and Stata are often used in specific industries, though their popularity is diminishing. (R and Python are open source tools, which means, among other things, that they are free.)”

Want to learn more about Matt?

https://betavector.com/

https://www.linkedin.com/in/matthewbrems

5 High-Paying Careers That Require Data Analysis Skills

The term “big data” is everywhere these days, and with good reason. More products than ever before are connected to the Internet: phones, music players, DVRs, TVs, watches, video cameras…you name it. Almost every new electronic device created today is connected to the Internet in some way for some purpose.

The result of all those things connected to the Internet is data. Big, big data. What’s that mean for you? Simply put, it means if you can quickly, accurately, and intelligently sift through data and find trends, you are extremely valuable in today’s tech job market. More specifically, here are five job titles that require data analytics skills and expertise to get ahead. 

How is Python Used in Data Science?

Python is a popular programming language used by both developers and data scientists. But what makes it so popular and why are so many data scientists choosing Python over other programming languages? In this article, we’ll explore the advantages of Python programming and why it’s useful for data science.

What is Python?

No, we’re not talking about the giant, tropical snake. Python is a general-purpose, high-level programming language. It supports object-oriented, structured, and functional programming paradigms.
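To illustrate what supporting several paradigms means, here is the same small task, totaling a list of prices, sketched in each of the three styles. The function names, the `Cart` class, and the prices are hypothetical and only for illustration.

```python
from functools import reduce

prices = [4.0, 10.0, 6.0]  # hypothetical item prices


# Structured (procedural) style: an explicit loop and a running total.
def total_structured(values):
    total = 0.0
    for value in values:
        total += value
    return total


# Object-oriented style: data and behavior bundled together in a class.
class Cart:
    def __init__(self):
        self.items = []

    def add(self, price):
        self.items.append(price)

    def total(self):
        return sum(self.items)


# Functional style: combine the values with a function, no mutation.
def total_functional(values):
    return reduce(lambda acc, value: acc + value, values, 0.0)


cart = Cart()
for price in prices:
    cart.add(price)

print(total_structured(prices), cart.total(), total_functional(prices))
```

All three approaches give the same total, so you can pick whichever style fits the problem.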

Python was created in the late 1980s by the Dutch programmer Guido van Rossum who wanted a project to fill his time over the holiday break. His goal was to create a programming language that was a descendant of the ABC programming language but would appeal to Unix/C hackers. Van Rossum writes that he chose the name Python for this language, “being in a slightly irreverent mood (and a big fan of Monty Python’s Flying Circus).”

Python went through many updates and iterations, and Python 3.0 was released in 2008. This release was designed to fix many of the design flaws in the language, with an emphasis on removing redundant features. While the update had some growing pains because it was not backward compatible, it made way for Python as we know it today. Python continues to be well maintained and supported as a popular, open source programming language.

In “The Zen of Python,” developer Tim Peters summarizes van Rossum’s guiding principles for writing code in Python:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one– and preferably only one –obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!
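
These principles ship with the language itself: the standard library includes a small module named this, which prints the full text when imported. The short sketch below captures that printout so the interpreter's output stays quiet, then decodes the module's ROT13-encoded string to read the lines one by one. The variable names are only illustrative.

```python
import codecs
import contextlib
import io

# Importing `this` prints the Zen of Python the first time it is imported.
# Capture that printout so the script only shows what we choose to print.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import this

# The module stores the text ROT13-encoded in the attribute `s`.
zen_text = codecs.decode(this.s, "rot13")
zen_lines = zen_text.splitlines()

print(zen_lines[0])  # the title line
for line in zen_lines[2:5]:  # the first three principles
    print(line)
```

In an interactive session, simply typing import this is enough to see the whole list.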

These principles touch on some of the advantages of Python in data science. Python is designed to be readable, simple, explicit, and explainable. Even the first principle states that Python code should be beautiful. In general, Python is a great programming language for many tasks and is becoming increasingly popular among developers. But now you may be wondering, why learn Python for data science?

Why Python for Data Science?

The first of many benefits of Python in data science is its simplicity. While some data scientists come from a computer science background or know other programming languages, many come from backgrounds in statistics, mathematics, or other technical fields and may not have as much coding experience when they enter the field of data science. Python syntax is easy to follow and write, which makes it a simple programming language to get started with and learn quickly. 
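
To give a rough sense of what that simplicity looks like, here is a minimal sketch that summarizes a short list of numbers using only built-in Python. The scores are made up for illustration, and the cutoff of 80 is an arbitrary assumption.

```python
# Hypothetical exam scores, invented for this illustration.
scores = [88, 92, 79, 95, 67]

# Built-in functions cover the basics: no imports are needed.
average = sum(scores) / len(scores)

# A list comprehension reads almost like the sentence describing it:
# "keep each score if the score is at least 80".
passing = [score for score in scores if score >= 80]

print(f"Average score: {average:.1f}")
print(f"Scores of 80 or above: {passing}")
```

Even without prior coding experience, most readers can follow what each line does, which is a large part of why the language is approachable for people coming from statistics or mathematics.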

In addition, there are plenty of free resources available online to learn Python and get help if you get stuck. Python is an open source language, meaning the language is open to the public and freely available. This is beneficial for data scientists looking to learn a new language because there is no up-front cost to start learning Python. This also means that there are a lot of data scientists already using Python, so there is a strong community of both developers and data scientists who use and love Python.

The Python community is large, thriving, and welcoming. Python is the fourth most popular language among all developers based on a 2020 Stack Overflow survey of nearly 65,000 developers. Python is especially popular among data scientists. According to SlashData, there are 8.2 million active Python users with “a whopping 69% of machine learning developers and data scientists now us[ing] Python (compared to 24% of them using R).”4 A large community brings a wealth of available resources to Python users. Not only are there numerous books and tutorials available, there are also conferences such as PyCon where Python users across the world can come together to share knowledge and connect. Python has created a supportive and welcoming community of data scientists willing to share new ideas and help one another. 

If the sheer number of people using Python doesn’t convince you of the importance of Python for data science, maybe the libraries available to make your data science coding easier will. A library in Python is a collection of modules with pre-built code to help with common tasks. Libraries essentially allow us to benefit from and build on top of the work of others. Without them, many data science tasks would be cumbersome and time-consuming to code from scratch. Python has a large collection of libraries, such as NumPy, Pandas, and Matplotlib, that make data cleaning, data analysis, data visualization, and machine learning tasks easier. Some of the most popular libraries include:

  • NumPy: NumPy is a Python library that provides support for many mathematical tasks on large, multidimensional arrays and matrices.
  • Pandas: The Pandas library is one of the most popular and easy-to-use libraries available. It allows for easy manipulation of tabular data for data cleaning and data analysis.
  • Matplotlib: This library provides simple ways to create static or interactive boxplots, scatterplots, line graphs, and bar charts. It’s useful for simplifying your data visualization tasks.
  • Seaborn: Seaborn is another data visualization library built on top of Matplotlib that allows for visually appealing statistical graphs. It allows you to easily visualize beautiful confidence intervals, distributions, and other graphs.
  • Statsmodels: This statistical modeling library provides a wide range of statistical models and statistical tests, including linear regression, generalized linear models, and time series analysis models.
  • SciPy: SciPy is a library used for scientific computing that helps with linear algebra, optimization, and statistical tasks.
  • Requests: Requests is an HTTP library that is often the first step in scraping data from websites. It provides a user-friendly way to send HTTP requests and read the responses.
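
As a small illustration of how two of the libraries above fit together, the sketch below uses NumPy for array arithmetic and Pandas to summarize a table. The sales records are invented for this example, and the column names are arbitrary choices rather than anything the libraries require.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records, invented for this illustration.
sales = pd.DataFrame(
    {
        "region": ["East", "West", "East", "West", "East"],
        "units": [10, 7, 12, 9, 8],
        "price": [2.5, 3.0, 2.5, 3.0, 2.5],
    }
)

# NumPy: multiply two whole arrays element by element, with no explicit loop.
sales["revenue"] = np.multiply(
    sales["units"].to_numpy(), sales["price"].to_numpy()
)

# Pandas: group the rows by region and total the revenue in each group.
revenue_by_region = sales.groupby("region")["revenue"].sum()
print(revenue_by_region)
```

Writing the same grouping and totaling logic from scratch would take noticeably more code, which is the point the list above is making.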

In addition to the general data manipulation libraries available in Python, a major advantage of Python in data science is the availability of powerful machine learning libraries. These robust, open source libraries make data scientists’ lives easier by providing ready-made implementations of most widely used machine learning algorithms. They offer simplicity without sacrificing much performance, so you can build and train a neural network in relatively few lines of code. Some of the most popular machine learning and deep learning libraries in Python include:

  • Scikit-learn: This popular machine learning library is a one-stop shop for most classical machine learning needs, with support for both supervised and unsupervised tasks. Some of the machine learning algorithms available are logistic regression, k-nearest neighbors, support vector machine, random forest, gradient boosting, k-means, DBSCAN, and principal component analysis.
  • TensorFlow: TensorFlow is a library for building neural networks. Since its core is mostly written in C++, it gives us the simplicity of Python without sacrificing power and performance. However, working with raw TensorFlow is not well suited to beginners.
  • Keras: Keras is a popular high-level API that acts as an interface for the TensorFlow library. It’s an extremely user-friendly tool for building neural networks on a TensorFlow backend, and it is easy to get started with.
  • PyTorch: PyTorch is another deep learning framework, created by Facebook’s AI research group. It is often described as more flexible and faster than Keras, but since its API is lower level, it is more complex and may be a little less beginner friendly.
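
To show roughly what working with one of these libraries looks like, here is a minimal Scikit-learn sketch that trains a logistic regression classifier. The data is synthetic (two clusters of random points generated with NumPy), and the cluster centers, sample sizes, and random seeds are arbitrary assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, generated for illustration: two clusters of 2-D points.
rng = np.random.default_rng(seed=0)
class_0 = rng.normal(loc=-2.0, scale=1.0, size=(100, 2))
class_1 = rng.normal(loc=2.0, scale=1.0, size=(100, 2))
features = np.vstack([class_0, class_1])
labels = np.array([0] * 100 + [1] * 100)

# Hold out a quarter of the points to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

# Fit the classifier and measure accuracy on the held-out points.
model = LogisticRegression()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Most Scikit-learn estimators follow the same fit and score pattern, so swapping in a random forest or a support vector machine usually means changing only the import and the line that creates the model.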

What Other Programming Languages are Used for Data Science?

Python is the most popular programming language for data science. If you’re looking for a new job as a data scientist, you’ll find that Python is also required in most job postings for data science roles. Jeff Hale, a General Assembly data science instructor, scraped job postings from popular job posting sites to see what was required for jobs with the title of “Data Scientist.” Hale found that Python appears in nearly 75% of all job postings. Python libraries including TensorFlow, Scikit-learn, Pandas, Keras, PyTorch, and NumPy also appear in many data science job postings.

Image source: The Most In-Demand Tech Skills for Data Scientists by Jeff Hale

R, another popular programming language for data science, appeared in roughly 55% of the job postings. While R is a useful tool for data science and has many strengths, including data cleaning, data visualization, and statistical analysis, Python continues to become more popular and preferred among data scientists for a majority of tasks. In fact, the share of job postings requiring R dropped by about 7% between 2018 and 2019, while the share requiring Python increased. This isn’t to say that learning R is a waste of time; data scientists who know both languages can draw on the strengths of each for different purposes. However, since Python is becoming increasingly popular, there’s a high chance that your team uses Python, and it’s important to use the language that your team is comfortable with and prefers.

What is the Future of Python for Data Science?

As Python continues to grow in popularity and the number of data scientists continues to increase, the use of Python for data science will likely keep growing. As machine learning, deep learning, and other areas of data science advance, we’ll likely see those advancements made available as Python libraries. Python has been well maintained and has grown in popularity for years, and many of the top companies use it today. With its continued popularity and growing support, Python will likely be used in the industry for years to come.

Whether you’ve been a data scientist for years or you are just beginning your data science journey, you can benefit from learning Python for data science. The simplicity, readability, support, community, and popularity of the language — as well as the libraries available for data cleaning, visualization, and machine learning — all set Python apart from other programming languages. If you aren’t already using Python for your work, give it a try and see how it can simplify your data science workflow.
