Designing a Dashboard in Tableau for Business Intelligence

Tableau is a data visualization platform that focuses on business intelligence. It has become very popular in recent years because of its flexibility and beautiful visualizations. Clients love the way Tableau presents data and how easy it makes performing analyses. It is one of my favorite analytical tools to work with.

A simple way to define a Tableau dashboard is as an at-a-glance view of a company’s key performance indicators, or KPIs. There are different kinds of dashboards available; which one you build depends on the business questions being asked and the end user. Is this for an operational team (like one at a distribution center) that needs to see the number of orders by hour and whether sales goals are being achieved? Or is this for a CEO who would like to measure the productivity of different departments and products against forecast? The first case will require the data to be updated every 10 minutes, almost in real time. The second doesn’t require the same cadence, and once a day will be enough to track the company’s performance.

Over the past few years, I’ve built many dashboards for different types of users, including department heads, business analysts, and directors, and helped many mid-level managers with data analysis. Here are some best practices for creating Tableau dashboards I’ve learned throughout my career.

First Things First: Why Use a Data Visualization?

A data visualization tool is one of the most effective ways to analyze data from any business process (sales, returns, purchase orders, warehouse operations, customer shopping behavior, etc.).

Below we have a grid report and bar chart that contain the same information. Which is easier to interpret?

Grid report vs. bar chart.

That’s right — it’s quicker to identify the category with the lowest sales, Tops, using the chart.

Many companies used to use grid reports to operate and make decisions, and many departments still do today, especially in retail. I once went to a trading meeting on a Monday morning where team members printed pages of Excel reports with rows and rows of sales and stock data by product and took them to a meeting room with a ruler and a highlighter to analyze sales trends. Some of these reports took at least two hours to prepare and required combining data from different data sources with VLOOKUPs — a function that allows users to search through columns in Excel. After the meeting, they threw the papers away (what a waste of paper and ink!) and then the following Monday it all started again.

Wouldn’t it be better to have an effective dashboard and reporting tool in which the company’s KPIs were updated on a daily basis and presented in an interactive dashboard that could be viewed on tablets/laptops and digitally sliced and diced? That’s where tools like Tableau dashboards come in. You can drill down into details and answer questions raised in the meeting in real time — something you couldn’t do with paper copies.

How to Design a Dashboard in Tableau

Step 1: Identify who will use the dashboard and with what frequency.

Tableau dashboards can be used for many different purposes, such as measuring different KPIs, and therefore will be designed differently for each circumstance. This means that, before you can begin designing a dashboard, you need to know who is going to use it and how often.

Step 2: Define your topic.

The stakeholder (e.g., director, sales manager, CEO, business analyst, buyer) should be able to tell you what kind of business questions need to be answered and the decisions that will be made based on the dashboard.

Here, I am going to use data from a fictional retail company to report on monthly sales.

The commercial director would like to know 1) the countries to which the company’s products have been shipped, 2) which categories are performing well, and 3) sales by product. The option of browsing products is a plus, so the Tableau dashboard should include as much detail as possible.

Step 3: Make sure you have all of the necessary data available to answer the questions specified.

Clarify how often you will get the data, the format in which you will receive it (inside a database or in loose files), how clean it is, and whether there are any data quality issues. You need to evaluate all of this before you promise a delivery date.

Step 4: Create your dashboard.

When it comes to dashboard design, it’s best practice to present data from top to bottom. The story should go from left to right, like a comic book, where you start at the top left and finish at the bottom right.

Let’s start by adding the data set to Tableau. For this demo, the data is contained in an Excel file generated by software I developed myself. It’s all dummy data.

To connect to an Excel file from Tableau, select “Excel” from the Connect menu. The tables are on separate Excel sheets, so we’re going to use Tableau to join them, as shown in the image below. Once the tables are joined, go to the bottom and select Sheet 1 to create your first visualization.

Joining Excel sheet in Tableau.

We have two columns in the Order Details table: Quantity and Unit Price. The sales amount is Quantity x Unit Price, so we’re going to create the new metric, “Sales Amount”. Right-click on the measures and select Create > Calculated Field.
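In Tableau’s calculation editor, a formula like this is typically written with bracketed field names, for example `[Quantity] * [Unit Price]`. To illustrate what the calculated field does on each row, here is a minimal Python sketch. The sample rows and the `add_sales_amount` helper are hypothetical and are not part of the demo workbook.

```python
# Hypothetical Order Details rows; the numbers are made up for illustration.
order_details = [
    {"order_id": 1, "quantity": 3, "unit_price": 10.0},
    {"order_id": 2, "quantity": 1, "unit_price": 25.5},
    {"order_id": 3, "quantity": 4, "unit_price": 2.5},
]


def add_sales_amount(rows):
    """Return copies of the rows with sales_amount = quantity * unit_price."""
    return [
        {**row, "sales_amount": row["quantity"] * row["unit_price"]}
        for row in rows
    ]


# Print each order with its computed sales amount.
for row in add_sales_amount(order_details):
    print(row["order_id"], row["sales_amount"])
```

The helper returns new rows instead of modifying the input, much as a calculated field adds a metric without changing the underlying data source.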

Creating a Map in Tableau

We can use maps to visualize data with a geographical component and compare values across geographical regions. To answer our first question (“To which countries have the company’s products been shipped?”), we’ll create a map view of sales by country.

1. Add Ship Country to the rows and Sales Amount to the columns.

2. Change the view to a map.

Visualizing data across geographical regions.

3. Add Sales Amount to the color pane. Darker colors mean higher sales amounts aggregated by country.

4. You can choose to make the size of the bubbles proportional to the Sales Amount. To do this, drag the Sales Amount measure to the Size area.

5. Finally, rename the sheet “Sales by Country”.

Creating a Bar Chart in Tableau

Now, let’s visualize the second request, “Which categories are performing well?” We’ll need to create a second sheet. The best way to analyze this data is with a bar chart, as bar charts are designed to compare data across categories. Pie charts work in a similar way, but in this case we have too many categories (more than four) so they wouldn’t be effective.

1. To create a bar chart, add Category Name to the rows and Sales Amount to the columns.

2. Change the visualization to a bar chart.

3. Switch columns and rows, sort in descending order, and show the values so users can see the exact value that the size of each rectangle represents.

4. Drag the category name to “Color”.

5. Now, rename the sheet to “Sales by Category”.

Our Sales by Category breakdown.
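Behind the bar chart, Tableau is summing Sales Amount per category and ordering the bars from highest to lowest. The sketch below shows that aggregation in plain Python; the category names and amounts are made up, and `sales_by_category` is a hypothetical helper, not a Tableau function.

```python
from collections import defaultdict

# Hypothetical (category, sales amount) pairs; the values are made up.
sales_rows = [
    ("Dresses", 120.0),
    ("Tops", 40.0),
    ("Shoes", 250.0),
    ("Dresses", 80.0),
    ("Tops", 10.0),
]


def sales_by_category(rows):
    """Sum sales per category and sort from highest to lowest total."""
    totals = defaultdict(float)
    for category, amount in rows:
        totals[category] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Print one line per category, largest total first.
for category, total in sales_by_category(sales_rows):
    print(f"{category}: {total:.2f}")
```

Sorting in descending order puts the best performers first and makes the weakest category easy to spot at the end of the list.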

Assembling a Dashboard in Tableau

Finally, the commercial director would like to see the details of the products sold by each category.

Our last page will be the product detail page. Add Product Name and Image to the rows and Sales Amount to the columns. Rename the sheet as “Products”.

We are now ready to create our first dashboard! Rearrange the charts on the dashboard so that the layout looks similar to the example below. To display the images, drag the Web Page object next to the Products grid.

Assembling our dashboard.

Additional Actions in Tableau

Now, we’re going to add some actions on the dashboard such that, when we click on a country, we’ll see both the categories of products and a list of individual products sold.

1. Go to Dashboard > Actions.

2. Add Action > Filter.

3. Our “Sales by Country” chart is going to filter Sales by Category and Products.

4. Add a second action. Sales by Category will filter Products.

5. Add a third action, this time selecting URL.

6. Select Products, <Image> on URL, and click on the Test Link to test the image’s URL.

What we have now is an interactive dashboard with a worldwide sales view. To analyze a specific country, we click on the corresponding bubble on the map and Sales by Category will be filtered to what was sold in that country.

When we select a category, we can see the list of products sold for that category. And, when we hover on a product, we can see an image of it.
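Conceptually, the two filter actions narrow the same set of order rows step by step: first by the selected country, then by the selected category. The following Python sketch mimics that cascade under simplified assumptions; the sample rows and the `apply_filters` helper are hypothetical, and Tableau’s own action handling is more involved.

```python
# Hypothetical order lines: (ship_country, category, product, sales_amount).
orders = [
    ("UK", "Tops", "White T-shirt", 50.0),
    ("UK", "Shoes", "Running shoe", 120.0),
    ("France", "Tops", "Striped top", 30.0),
    ("France", "Dresses", "Summer dress", 90.0),
    ("UK", "Tops", "Striped top", 20.0),
]


def apply_filters(rows, country=None, category=None):
    """Keep rows matching the selected country and/or category.

    A value of None means nothing is selected, so that filter is skipped.
    """
    return [
        row
        for row in rows
        if (country is None or row[0] == country)
        and (category is None or row[1] == category)
    ]


# Clicking a country on the map: which categories and products remain?
uk_rows = apply_filters(orders, country="UK")
uk_categories = sorted({row[1] for row in uk_rows})
uk_products = sorted({row[2] for row in uk_rows})

# Then selecting a category narrows the product list further.
uk_tops = sorted({row[2] for row in apply_filters(orders, country="UK", category="Tops")})

print(uk_categories)
print(uk_products)
print(uk_tops)
```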

In just a few steps, we have created a simple dashboard from which any head of department would benefit.

The final product.

Dashboards in Tableau at General Assembly

In GA’s Data Analytics course, students get hands-on training with the versatile Tableau platform. Students will learn the ins and outs of the data visualization tool and create dashboards to solve real-world problems in 1-week accelerated or 10-week part-time course formats, on campus and online. You can also get a taste in our interactive Tableau training with these classes and workshops.

Meet Our Expert

Samanta Dal Pont is a business intelligence and data analytics expert in retail, eCommerce, and online media. With an educational background in software engineering and statistics, her great passion is transforming businesses to make the most of their data. Responsible for the analytics, reporting, and visualization in a global organization, Samanta has been an instructor for Data Analytics courses and SQL bootcamps at General Assembly London since 2016.

Samanta Dal Pont, Data Analytics Instructor, General Assembly London

8 Simple Ways to Turn Your Skills into a Profitable Side Business

In the US alone, there are over 28 million small businesses. Of those, an estimated 22 million consist of a single operating member—solopreneurs as I like to call them.

Many of these small business owners started their businesses as nothing more than the intersection of passion and skills that combined to create a business idea with the ability to earn extra money and scale into something truly sustainable.

As someone who’s successfully launched four profitable side businesses over the past four years, I’ve learned a lot about how to turn your skills into a healthy side income. From building physical products to selling my consultative services, and building my own suite of digital products, I’ve been able to generate thousands in extra income each month.

If you’re ready to build a foundation for one day becoming gainfully self-employed, here are my top eight ways to get started with a profitable side business today.

Improving Diversity, Equity, & Inclusion Within Your Organization

Systemic racism has been a critical problem for generations, and the Black Lives Matter (BLM) movement has brought centuries of injustice to the spotlight. Over the last six months, following the deaths of Ahmaud Arbery, Breonna Taylor, George Floyd, and so many others, individuals worldwide have taken a stand to fight oppression and discrimination against Black, Indigenous, and People of Color (BIPOC).

It’s an inflamed and sensitive time that calls for radical change. Diverse companies not only outperform their less diverse peers, but they also forge stronger connections with their customers. 77% of U.S. consumers said it was “deeply important that companies respond to racial injustice to earn or keep their trust.” As consumer bases diversify and consumers change their spending habits, companies need to ensure that their content, messaging, product, design, and data align with these shifts. While organizations know they need stronger commitments to diversity, equity, and inclusion (DEI), many don’t know where to start — individual companies often take action but lack coordinated guidance. 

Our Standards Boards were established to increase the clarity of and access to careers in marketing, AI & data science, product management, and UX design. To date, the Boards have primarily focused on providing clarity on the skills needed within specific fields by publishing career frameworks and certifications. Now, it’s time to connect to the access portion of their work. Together, the Standards Boards have crafted DEI principles that guide organizations on how to provide equitable access to skills and career paths for their employees. 

Improving Diversity, Equity, & Inclusion: A Practical Guide

To create a meaningful guide to DEI, our Standards Board Members reflected on what diversity, equity, and inclusion meant as individuals, employees, and leaders of organizations. With this in mind, we focused on improving the current DEI practices each member saw being used and creating a practical playbook that could be applied across companies and disciplines. Ultimately, we hope this playbook serves as a starting point for conversations around DEI that lead to career paths for diverse talent and helps leaders create work environments in which all can succeed. 

Our Standards Board DEI task force drafted a playbook of seven overarching principles that have been refined through feedback from colleagues, DEI experts, GA instructors and staff, and others. These principles were designed to guide any organization’s DEI strategy, regardless of function, industry, geography, or company size.

These principles were created by leaders in various industries who have a real conviction for driving change. Below, you’ll find a few principles our board members stand behind; they hope you’ll use these to drive conversations and assess how your organization is implementing DEI.

Marla Kaplowitz, president and CEO at 4A’s, noted:

“We all recognize a critical need to address systemic issues with diversity, equity, and inclusion through actions — not just words. These principles were created to support action plans for every company to ensure a culture of belonging for all employees, at all levels throughout the organization.” 

We hope these principles spark conversations at your organizations that lead to tactical activities such as revisiting policies, analyzing pay equity, and tracking diversity data. While some of these principles are being implemented across board member organizations, some aren’t. Our intention is to enable organizations to implement DEI policies across every level of an organization through actions, not just words.

The Actions We Are Taking 

It’s essential that these aforementioned principles are put into action. Across the Standards Boards, we’ll be incorporating DEI into career frameworks, assessments, and products. We’ll also be actively recruiting more board members in 2021 to ensure our boards are representative of the talent in their industries.

Within GA, we’re also committed to aligning these principles with our work. We’re actively promoting equity and justice by using our platform to discuss why we should all be angry, and we’re making real commitments to ensure we’re not idle in the face of systemic racism. We’re cultivating conversations about our diversity story and creating a culture of dissent through creating an Inclusion Committee as well as a Fireside Chat series that brings employees and executives together for candid conversations on D&I (both started in 2019). 

We’re cultivating our future employee base by updating our policies to require a diverse slate of interview candidates for all leadership-level positions, revisiting internal promotion criteria, and launching a mentorship program (Code Grow) so our Black, Indigenous, and People of Color (BIPOC) staff has formal avenues to develop their careers. To attract diverse talent, we are utilizing outlier career-search platforms like AngelList, Underdog.io, Vettery, c0ffe3, Black Creatives, and more.

We’re transparent about the areas of difference we’re cultivating by reformalizing our Employee Resource Groups (ERGs) with dedicated executive sponsors. And we’re tying outcomes to actions by measuring all our people metrics and making plans to improve the experiences of underrepresented groups in our organization. We’re also ensuring DEI is central in our product development.

The principles set forward by the Standards Boards are essential to capturing many voices across multiple sectors because they encapsulate what has been learned on our individual and collective journeys. We look forward to evolving these principles and integrating them into GA’s courses, continuing the hard work and commitment to DEI at GA, and further developing organizational behaviors, along with the willingness of our Standards Board partners to do the same.

The list below notes the leaders that have signed on. If you’re a leader who is ready to join us and adopt these principles, you can sign on here.

Participating Leaders:

Shri Bhupathi, Founder and Technical Fellow, MILL5
Gideon Bullock
Andrea Chesleigh
Chad Evans, SVP, Product and Platform, NBA
Stephen Gates
Benjamin Harrell, Chief Marketing Officer, Priceline
Marla Kaplowitz, President and CEO, 4A’s
Willy Lai
Louis Lecat
Kevin Lyons, SVP of Technology, Nielsen
Francisco Martin, Head of Business Development, Thrive Global
Marilyn McDonald, SVP of B2B Experiences, Mastercard
Kristof Neirynck, CMO of Global Brands, Walgreens Boots Alliance
Gretchen O’Hara, VP of AI & Sustainability, Strategy & Partnership, Microsoft
Michelle Onvural, CEO, Bonobos
Seth Rogin, CEO, Magnolia Media Partners
Nick Perugini
Adam Powers
Professor Andrew Stephen, Associate Dean of Research & L’Oréal Professor of Marketing
Linda Tong, General Manager, AppDynamics (a Cisco Company) 
Sang Valte, UX Director, Jellyfish

It’s a new world that calls for moral bravery and clear actions. We welcome all feedback on these principles and look forward to hearing how your organization implements these and other DEI initiatives. 

 

5 High-Paying Careers That Require Data Analysis Skills

The term “big data” is everywhere these days, and with good reason. More products than ever before are connected to the Internet: phones, music players, DVRs, TVs, watches, video cameras…you name it. Almost every new electronic device created today is connected to the Internet in some way for some purpose.

The result of all those things connected to the Internet is data. Big, big data. What does that mean for you? Simply put, it means if you can quickly, accurately, and intelligently sift through data and find trends, you are extremely valuable in today’s tech job market. More specifically, here are five job titles that require data analytics skills and expertise to get ahead.

The Difference Between a Startup and a Small Business

If you work in the technology industry, or live in a tech hub such as Silicon Valley, Hong Kong, or New York, it’s likely that you or someone you know is in the process of conceptualizing or even launching his or her own startup venture.

A startup venture is often mistaken for simply a new small business. The truth is, there is a significant difference between a startup and a small business.

UX, Visual, or Graphic: Which Type of Design Is Right for You?

  • CC Image Courtesy of Thomas Brasington on Flickr

You can be pardoned for sometimes feeling confused about all the terminology and job titles floating around in the design world. What is the difference between graphic design, visual design, and user experience design? Does each of the three roles provide a different service? For visual and graphic designers, the difference may lie mainly in the job title and salary expectations. However, a user experience designer has very different end goals and responsibilities from a visual or graphic designer. Below is a breakdown of what each of these designers does within the design industry, to help you decide what type of design is right for you.

Computer Science vs. Data Science: What is the Difference?

Maybe you want to learn more about data science since you’ve heard it’s “the sexiest job of the 21st century.” Or maybe your software engineer friend is trying to talk you into learning computer science. Either way, both data science and computer science skills are in demand. In this article, we will cover the major differences between data science and computer science to clarify the distinction between these two fields.

Before we dive into the differences, let’s define these two sciences:

Data Science vs. Computer Science

Data science is an interdisciplinary field that uses data to extract insights and inform decisions. It’s often referred to as a combination of statistics, business acumen, and computer science. Data scientists clean, explore, analyze, and model with data using programming languages such as Python and R along with techniques such as statistical models, machine learning, and deep learning.

While it’s one part of data science, computer science is its own broader field of study involving a range of both theoretical and practical topics like data structures and algorithms, hardware and software, and information processing. It has many applications in fields like machine learning, software engineering, and mathematics.

History

While many of the topics used in data science have been around for a while, data science as a field is in its infancy. In 1974, Peter Naur defined the term “data science” in his work, Concise Survey of Computer Methods. However, even Naur couldn’t have predicted the vast amount of data that our modern world would generate on a daily basis only a few decades later. It wasn’t until the early 2000s that data science was recognized as its own field. It gained popularity in the early 2010s, leading to the field as we know it today: a blend of statistics and computer science used to drive insights and make data-driven business decisions. “Data science,” “big data,” “artificial intelligence,” “machine learning,” and “deep learning” have all become buzzwords in today’s world. These are all components of data science and, while trendy, they can provide practical benefits to companies. Historically, we did not have the storage capacity to hold the amount of data that we are able to collect and store today. This is one reason that data science has become a popular field only recently. The emergence of big data and the advancements in technology have paved the way for individuals and businesses to harness the power of data. While many of the tools that data scientists use have been around for many years, we have not had the software or hardware capabilities to make use of these tools until recently.

Computer science, on the other hand, has been a field of study for centuries. This is one of the main differences between it and data science. Ada Lovelace is known for pioneering the field of computer science as the person who wrote the first computer algorithm in the 1840s. However, computing devices such as the abacus date back thousands of years. Computer science is a topic that has been formally researched for much longer than data science, and companies have been using computer science tools for decades. It’s an umbrella field that has numerous subdomains and applications. 

Applications

The applications of each of these fields in industry differ as well. Computer science skills are used in many different jobs including that of a data scientist. However, common roles involving computer science skills include software engineers, computer engineers, software developers, and web developers. Two roles that use computer science, front end engineer and Java developer, ranked first and second respectively on Glassdoor’s 50 Best Jobs in America for 2020 list. While these roles do not formally require degrees, many people in these jobs hold a degree or come from a background in computer science.

Common computer science job tasks include writing, testing, and debugging code, developing software, and designing applications. Individuals who use computer science in their roles often create new software and web applications. They need to have excellent problem-solving skills and be able to write code in programming languages such as Python, Ruby, JavaScript, Java, or C#. They also need to have a fundamental understanding of how these languages work, and be well-versed in object-oriented programming.

Data science is applied in job titles such as data scientist, data analyst, machine learning engineer, and data engineer. Data scientist and data engineer ranked third and sixth respectively on Glassdoor’s 50 Best Jobs in America for 2020. Individuals in these roles come from a variety of backgrounds including computer science, statistics, and mathematics. 

Common data science job tasks include cleaning and exploring data, extracting insights from data, and building and optimizing models. Data scientists analyze and reach conclusions based on data. They need to be well versed in statistics and mathematics topics including linear algebra and calculus as well as programming languages such as Python, R, and SQL. They also need to have excellent communication skills as they are often presenting insights, data visualizations, and recommendations to stakeholders.

Since computer science is one component of data science, there is often crossover in these roles and responsibilities. For example, computer science tasks like programming and debugging are used in both computer science jobs and data science jobs. Both of these fields are highly technical and require knowledge of data structures and algorithms. However, the depth of this knowledge required for computer science vs. data science varies. It’s often said that data scientists know more about statistics than a computer scientist but more about computer science than a statistician. This reinforces the interdisciplinary nature of data science.

The Use of Data

Data, or information such as numbers, text, and images, has applications in both computer science and data science. The study and use of data structures is a topic in computer science. Data structures are ways to organize, manage, and store data so that it can be used efficiently; as a sub-domain of computer science, the study of data structures allows us to store and access data in our computer’s memory. Data science benefits from data structures to access data, but the main goal of data science is to analyze and make decisions based on the data, often using statistics and machine learning.
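As a small illustration of why the choice of data structure matters, the sketch below stores the same records in a Python list and in a dictionary. The product records and the `find_in_list` helper are hypothetical examples, not taken from any particular system.

```python
# Hypothetical product records: (product_id, name, price). Values are made up.
products = [
    ("P001", "White T-shirt", 12.0),
    ("P002", "Running shoe", 60.0),
    ("P003", "Summer dress", 45.0),
]


def find_in_list(records, product_id):
    """Linear search: check the records one by one until the id matches."""
    for record in records:
        if record[0] == product_id:
            return record
    return None


# A dictionary keyed by product id finds a record without scanning the list.
products_by_id = {record[0]: record for record in products}

print(find_in_list(products, "P003"))
print(products_by_id["P003"])
```

Both lookups return the same record. The list search takes longer as the list grows, while the dictionary lookup stays fast on average, at the cost of building and storing the index.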

The Future of Computer Science and Data Science

Today, all companies and industries can benefit from both of these fields. Computer scientists drive business value by developing software and tools while data scientists drive business value by answering questions and making decisions based on data. As software continues to integrate with our lives and daily routines, computer science skills will continue to be critical and in demand. As we continue to create and store vast amounts of data on a daily basis, data science skills will also continue to be critical and in demand. Both fields are constantly evolving as technology advances and both computer scientists and data scientists need to stay current with the latest tools, methods, and technologies.

The field of data science would not exist without computer science. Today, the two fields complement each other to further applications of artificial intelligence, machine learning, and personalized recommendations. Many of the luxuries that we have today — a favorite streaming service that recommends new movies, the ability to unlock our phones with facial recognition technology, or virtual home assistants that let us play our favorite music just by speaking — are made possible by computer science and made better by data science. As long as bright, motivated individuals continue to learn data science and computer science, these two fields will continue to advance technology and improve the quality of our lives.

A Guide to Startup Compensation

If you’re pursuing a job at a startup company, one of the most important factors you’ll need to consider is compensation, which is commonly structured differently than at a mature company. This is largely dependent on the life stage of a company, which can greatly impact compensation, as well as work-life balance, risk, and upside.

Compensation at a startup company is largely made up of three components: salary, benefits, and equity. The value of each depends on the stage of a company’s growth, the role, and an employee’s previous experience. A good rule of thumb, though, is this: The earlier a stage the company is in, the lower the salary and benefits will be, but the higher the equity will be. As the company matures, the scales start to tip in the other direction. Let’s talk in a bit more detail about each of these.

Salary

As mentioned above, salary is largely contingent on the company’s stage, the role, and the employee’s previous experience. There is no one-size-fits-all here. At an earlier-stage company, you can almost certainly expect a lower base salary than the industry norm, regardless of your previous experience. As the company matures, the salaries of all positions start to get closer and closer to market rate. If you’re curious what to expect, we recommend playing with the salaries and equity tool by AngelList or researching salary ranges at specific companies on Glassdoor.

Benefits

Benefits at a startup are also largely dependent on stage. If good benefits are important to you, then an early-stage startup is likely the wrong place to work. However, as a startup grows, its benefits often become an extension of its culture and are used in all recruiting efforts. Take, for instance, Airbnb, which offers a $2,000 travel stipend to all employees. Other startups may allow pets at the office, or offer gym and other discounts, catered lunches, generous vacation policies, or flexible remote-working options.

Equity: Stock and Vesting Schedules

Equity is often the most confusing and intriguing part of a compensation package at a startup. Equity refers to ownership of the company, and this can be extremely valuable if the company ever sells or goes public (learn more about startup fundraising here and in our eBook, How to Get a Job at a Startup).

What’s important to know here is that no employee is ever “given” equity. Instead, employees often receive stock options, which are the option to purchase equity in the company at a heavily discounted price. You also are not given all of your stock options up front; rather, you earn an increasing number of options over a four-year period, often referred to as a vesting schedule. The typical vesting schedule gives you one-fourth of your options at the end of your first year, and then 1/48th every month after that. Once your options vest, you have the right to exercise them, that is, to purchase the underlying shares (or not).
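The typical schedule described above can be worked through with a few lines of arithmetic. This is a minimal sketch that assumes a one-year cliff, 48 months of total vesting, and rounding down to whole options; the `vested_options` function and the 4,800-option grant are hypothetical, and real option agreements vary.

```python
def vested_options(total_options, months_worked, cliff_months=12, vesting_months=48):
    """Options vested under a cliff followed by monthly vesting.

    Nothing vests before the cliff. At the cliff, cliff_months / vesting_months
    of the grant vests at once (one-fourth with the defaults). After that,
    1 / vesting_months of the grant vests each month until fully vested.
    Results are rounded down to whole options (an assumption of this sketch).
    """
    if months_worked < cliff_months:
        return 0
    months_counted = min(months_worked, vesting_months)
    return total_options * months_counted // vesting_months


# Hypothetical grant of 4,800 options, checked at a few points in time.
for months in (11, 12, 13, 24, 48, 60):
    print(months, vested_options(4800, months))
```

With this grant, nothing is vested at month 11, one-fourth (1,200 options) vests at month 12, and each later month adds 100 options until all 4,800 are vested at month 48.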

Getting into a company early has a big impact on the number of stock options you receive and at what price. If you join a company early, you are often rewarded with a higher number of options at a much lower price. As the company matures, the risk gets lower and its ability to pay market-rate salaries improves, so you will typically receive fewer stock options and at a higher purchase price.

The benefit of purchasing your options is that eventually — fingers crossed — the company will sell or go public and you will get a big payday. For example, early Instagram employees turned their stock options into an average profit of nearly $8 million! And there’s the famous example of the Facebook muralist who was compensated in stock options that were eventually worth north of $200 million. Of course, these examples are far on the ludicrous side of the scale, and many people don’t make any money from stock options — but risky or not, they’re part of what makes joining a startup so exciting.

How to Negotiate Your Startup Offer

There are special considerations to make when negotiating your compensation at a startup. Macia Batista, a career coach at General Assembly’s New York campus, walks you through essential steps for building your ideal job offer.

  • Know your minimum number. Leverage sites like PayScale and Glassdoor to learn what employers in your city are paying for similar roles and industries. Do your research ahead of time to fully understand the fair market value for the position, taking into account background and experience. Know your worth!
  • Provide a salary range. Determine a range for yourself, then ask for the upper half of it, so you can negotiate down if needed. Giving a range demonstrates flexibility. It gives you the opportunity to ask for more when an offer is presented, and negotiate other variables, like 401k contribution, remote work options, or vacation days. Tell the hiring manager, “I’m targeting roles with a range of X, but I’m focused on the entirety of the package including culture, growth, and mission.”
  • Consider the whole package — not just salary. Compensation goes beyond your paycheck. When weighing a job offer, look at factors like bonuses, equity, health care and retirement plans, transportation costs, schedule flexibility (e.g., working from home and vacation time), and potential for growth at the company.
  • Ensure your pay increases with funding. If you’re joining an early-stage startup, equity (stock options) is oftentimes part of the compensation package, since these offers often fall below market salary. However, you should be earning a fair market-value salary as soon as the company raises real money. I recommend signing a written agreement with your employer to guarantee a pay increase once the company has more capital.

How to Land Your Dream Startup Job

Working in the startup world can be one of the most challenging, exhilarating, sometimes heartbreaking, and oftentimes fulfilling journeys of your life. But before you find your first startup job, there are terms to learn, steps to take, and skills to grow to make you a candidate who stands out from the crowd.

In our eBook, How to Get a Job at a Startup, we’ll help you find your dream startup job through the knowledge of startup job-hunters, founders, and employers. Get firsthand tips on how to break into a startup career, clear up confusing industry jargon, and learn about important resources that will aid you on your journey as a startup employee.

General Assembly believes that everyone should be empowered to pursue work they love. We hope you’ll find this book to be a helpful first step in getting there yourself.

How is Python Used in Data Science?

Python is a popular programming language used by both developers and data scientists. But what makes it so popular and why are so many data scientists choosing Python over other programming languages? In this article, we’ll explore the advantages of Python programming and why it’s useful for data science.

What is Python?

No, we’re not talking about the giant, tropical snake. Python is a general-purpose, high-level programming language. It supports object-oriented, structured, and functional programming paradigms.

Python was created in the late 1980s by the Dutch programmer Guido van Rossum who wanted a project to fill his time over the holiday break. His goal was to create a programming language that was a descendant of the ABC programming language but would appeal to Unix/C hackers. Van Rossum writes that he chose the name Python for this language, “being in a slightly irreverent mood (and a big fan of Monty Python’s Flying Circus).”

Python went through many updates and iterations, and in 2008, Python 3.0 was released. This release was designed to fix many of the design flaws in the language, with an emphasis on removing redundant features. While this update had some growing pains because it was not backward compatible, the changes made way for Python as we know it today. It continues to be well-maintained and supported as a popular, open source programming language.

In “The Zen of Python,” developer Tim Peters summarizes van Rossum’s guiding principles for writing code in Python:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one– and preferably only one –obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!

These principles touch on some of the advantages of Python in data science. Python is designed to be readable, simple, explicit, and explainable. Even the first principle states that Python code should be beautiful. In general, Python is a great programming language for many tasks and is becoming increasingly popular for developers. But now you may be wondering, why learn Python for data science?

Why Python for Data Science?

The first of many benefits of Python in data science is its simplicity. While some data scientists come from a computer science background or know other programming languages, many come from backgrounds in statistics, mathematics, or other technical fields and may not have as much coding experience when they enter the field of data science. Python syntax is easy to follow and write, which makes it a simple programming language to get started with and learn quickly. 
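As a small illustration of that readability, the sketch below summarizes a list of numbers using only Python’s standard library. The order counts and the summarize function are made up for this example; the point is that the code reads close to plain English even for someone with little programming experience.

```python
from statistics import mean, median

# Hypothetical daily order counts for one week
daily_orders = [120, 135, 98, 143, 160, 87, 110]


def summarize(values):
    """Return a few basic summary statistics as a dictionary."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
    }


summary = summarize(daily_orders)
for name, value in summary.items():
    print(f"{name}: {value}")
```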

In addition, there are plenty of free resources available online to learn Python and get help if you get stuck. Python is an open source language, meaning the language is open to the public and freely available. This is beneficial for data scientists looking to learn a new language because there is no up-front cost to start learning Python. This also means that there are a lot of data scientists already using Python, so there is a strong community of both developers and data scientists who use and love Python.

The Python community is large, thriving, and welcoming. Python is the fourth most popular language among all developers based on a 2020 Stack Overflow survey of nearly 65,000 developers. Python is especially popular among data scientists. According to SlashData, there are 8.2 million active Python users with “a whopping 69% of machine learning developers and data scientists now us[ing] Python (compared to 24% of them using R).” A large community brings a wealth of available resources to Python users. Not only are there numerous books and tutorials available, there are also conferences such as PyCon where Python users across the world can come together to share knowledge and connect. Python has created a supportive and welcoming community of data scientists willing to share new ideas and help one another.

If the sheer number of people using Python doesn’t convince you of the importance of Python for data science, maybe the libraries available to make your data science coding easier will. A library in Python is a collection of modules with pre-built code to help with common tasks. They essentially allow us to benefit from and build on top of the work of others. In other languages, some data science tasks would be cumbersome and time consuming to code from scratch. There are countless libraries like NumPy, Pandas, and Matplotlib available in Python to make data cleaning, data analysis, data visualization, and machine learning tasks easier. Some of the most popular libraries include:

  • NumPy: NumPy is a Python library that provides support for many mathematical tasks on large, multidimensional arrays and matrices.
  • Pandas: The Pandas library is one of the most popular and easy-to-use libraries available. It allows for easy manipulation of tabular data for data cleaning and data analysis.
  • Matplotlib: This library provides simple ways to create static or interactive boxplots, scatterplots, line graphs, and bar charts. It’s useful for simplifying your data visualization tasks.
  • Seaborn: Seaborn is another data visualization library built on top of Matplotlib that allows for visually appealing statistical graphs. It allows you to easily visualize beautiful confidence intervals, distributions, and other graphs.
  • Statsmodels: This statistical modeling library supports a wide range of statistical models and tests, including linear regression, generalized linear models, and time series analysis models.
  • SciPy: SciPy is a library used for scientific computing that helps with linear algebra, optimization, and statistical tasks.
  • Requests: This is a useful library for scraping data from websites. It provides a user-friendly and responsive way to configure HTTP requests.
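To give a feel for how these libraries save work, here is a minimal sketch using NumPy, the first library in the list above. It assumes NumPy is installed (for example via pip install numpy), and the sales figures and prices are invented for illustration. Operations that would otherwise need nested loops become single array expressions.

```python
import numpy as np

# Hypothetical data: units sold for 3 products (rows) over 4 weeks (columns)
units = np.array([
    [12, 15, 11, 18],
    [30, 28, 35, 40],
    [7, 9, 6, 10],
])
prices = np.array([2.5, 1.0, 4.0])  # assumed price per unit for each product

# Total units sold per week, summed across products (down each column)
weekly_totals = units.sum(axis=0)

# Broadcasting: multiply every row of units by that product's price
revenue = units * prices[:, np.newaxis]

# Total revenue per product, summed across weeks (along each row)
revenue_per_product = revenue.sum(axis=1)

print("Units per week:", weekly_totals)
print("Revenue per product:", revenue_per_product)
```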

In addition to all of the general data manipulation libraries available in Python, a major advantage of Python in data science is the availability of powerful machine learning libraries. These machine learning libraries make data scientists’ lives easier by providing robust, open source implementations of most common machine learning algorithms. These libraries offer simplicity without sacrificing performance. With these frameworks, you can build a powerful neural network in relatively few lines of code. Some of the most popular machine learning and deep learning libraries in Python include:

  • Scikit-learn: This popular machine learning library is a one-stop-shop for all of your machine learning needs with support for both supervised and unsupervised tasks. Some of the machine learning algorithms available are logistic regression, k-nearest neighbors, support vector machine, random forest, gradient boosting, k-means, DBSCAN, and principal component analysis.
  • TensorFlow: TensorFlow is an open source library developed by Google for building neural networks. Since its core is mostly written in C++, this library provides us with the simplicity of Python without sacrificing power and performance. However, working with raw TensorFlow is not suited for beginners.
  • Keras: Keras is a popular high-level API that acts as an interface for the TensorFlow library. It’s a tool for building neural networks using a TensorFlow backend that’s extremely user friendly and easy to get started with.
  • PyTorch: PyTorch is another framework for deep learning created by Facebook’s AI research group. It provides more flexibility and speed than Keras, but since it has a lower-level API, it is more complex and may be a little less beginner friendly than Keras.

What Other Programming Languages are Used for Data Science?

Python is the most popular programming language for data science. If you’re looking for a new job as a data scientist, you’ll find that Python is also required in most job postings for data science roles. Jeff Hale, a General Assembly data science instructor, scraped job postings from popular job posting sites to see what was required for jobs with the title of “Data Scientist.” Hale found that Python appears in nearly 75% of all job postings. Python libraries including TensorFlow, Scikit-learn, Pandas, Keras, PyTorch, and NumPy also appear in many data science job postings.

Image source: The Most In-Demand Tech Skills for Data Scientists by Jeff Hale

R, another popular programming language for data science, appeared in roughly 55% of the job postings. While R is a useful tool for data science and has many strengths, including data cleaning, data visualization, and statistical analysis, Python continues to become more popular and preferred among data scientists for a majority of tasks. In fact, the average percentage of job postings requiring R dropped by about 7% between 2018 and 2019, while the share of postings requiring Python increased. This isn’t to say that learning R is a waste of time; data scientists who know both languages can draw on the strengths of each for different purposes. However, since Python is becoming increasingly popular, there’s a high chance that your team uses Python, and it’s important to use the language that your team is comfortable with and prefers.

What is the Future of Python for Data Science?

As Python continues to grow in popularity and as the number of data scientists continues to increase, the use of Python for data science will inevitably continue to grow. As we advance machine learning, deep learning, and other data science tasks, we’ll likely see these advancements available for our use as libraries in Python. Python has been well-maintained and continuously growing in popularity for years, and many of the top companies use Python today. With its continued popularity and growing support, Python will be used in the industry for years to come.

Whether you’ve been a data scientist for years or you are just beginning your data science journey, you can benefit from learning Python for data science. The simplicity, readability, support, community, and popularity of the language — as well as the libraries available for data cleaning, visualization, and machine learning — all set Python apart from other programming languages. If you aren’t already using Python for your work, give it a try and see how it can simplify your data science workflow.

Understanding the Difference Between Data Analytics and Data Science

Data analytics and data science are two key terms thrown around in the tech and business world. What do they mean, and what’s the difference between the two? Data analytics is concerned with performing statistical analysis on existing datasets to solve problems and answer current questions. Data science focuses on creating actionable insights and predictions from raw and structured data, often in large quantities.

This article will discuss the critical differences between data analytics and data science. First, we’ll explain what big data is, followed by a little more information on each role: data analyst and data scientist.

What is Big Data?

Big data can often be challenging to comprehend. Big data is usually more extensive and more complex than other datasets and may contain multiple sources. Put simply, big data is too large to process and understand using traditional data processing methods. This is where data analysts and data scientists come in — their job is to interpret this data and present it to their company or organization.

The original definition of big data, put forward by Gartner (2001), is as follows: “Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”

The Three V’s of Big Data

To better understand big data, whether as a data analyst, scientist, or curious individual, we must apply the three V’s: Volume, Velocity, and Variety.

Volume

When it comes to understanding big data, volume matters significantly. Big data requires you to process large volumes of unstructured data, e.g. Twitter feeds, sensor-enabled equipment, forum responses, or comments and reviews on webpages or mobile apps. This data can be difficult to comprehend; however, it’s crucial that there’s a lot of it in order to make valid claims. The volume of big data depends on the organization’s size and the questions that are being asked.

Velocity

With regard to big data, velocity is the speed at which data is received and then interpreted. Some pieces of software can do this automatically, depending on the complexity and structure of the data. However, this is not always possible, and velocity is much slower when the work is done manually by a data analyst or data scientist.

Variety

Finally, we have variety. This refers to the different types of data that are available, both structured and unstructured. For example, data types may include audio, text, video, comments on forums, reviews, and other metadata. In the last few years, we’ve seen a rise in unstructured data, such as audio recordings and video interviews, which then need to be transcribed.

Value & Veracity

Although the three V’s mentioned above are the go-to for big data, more recently, two new V’s have been introduced: value and veracity. For example, all data contains an intrinsic value, but this value cannot be understood until the data is understandable. Some data contains more intrinsic value than others, and this is determined by the data source and the truthfulness of the data, e.g. can you rely on the data source?

Big data is becoming more and more mainstream, especially for large tech companies (and others that deal in large quantities of data) to better understand their users and their products. For instance, companies such as Apple use big data to understand and map user experience and intentions, and to help create new products that customers will actually be interested in — solutions to problems that others don’t yet recognize as obstacles.

Data Analytics vs. Data Science

As mentioned previously, data analytics and data science are similar and often confused. To eliminate this confusion and help you understand the difference, we’ve provided a brief description of each role below.

What does a data analyst do?

A data analyst’s job consists of sorting through data to uncover insights in a dataset and presenting them in visual and written reports. These datasets could be on any topic, whether crime, government funding, or sports performance. Many data scientists first work as data analysts, learning the ropes and better understanding data as a whole.

What does a data scientist do?

A data scientist’s role is to collect and analyze data to gather valuable insights, later sharing these with their organization or company. Similar to a data analyst, the role of a data scientist exists across many different industries.

Unlike data analysts, who provide insights via representations of data, data scientists are more deeply involved: they create their own experiments, clean data, find patterns, build algorithms, and finally share their data and newly found insights with their team in an easy-to-understand way.

What is the difference between data analytics and data science?

This next section will explain several key differences between data analytics and data science to help you better understand each role in more detail.

1.   Data science is multidisciplinary

One of the main differences between data analytics and data science is that data science incorporates numerous disciplines, including data analytics, data engineering, machine learning, and software engineering, to name a few. In particular, data science relies heavily on machine learning and data analytics. Without traditional data analytics, whether performed by an analyst or a data scientist, it would be difficult, if not nearly impossible, to understand big data.

Ultimately, a data scientist’s role is to understand and re-structure big data, identify patterns, and educate business leaders and decision-makers on their findings to adjust current practices for better, more effective results.

2.   The unknown vs. the known

A data scientist’s role is to predict future events or further data by analyzing past data patterns. On the other hand, a data analyst looks at current data and perspectives to better understand current events. This fundamental difference is paramount, and a critical distinction between the two sets of expertise. Essentially, data scientists focus on the future, and data analysts center their attention on the now.
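The contrast between looking at the present and predicting the future can be sketched in a few lines of Python. The monthly sales figures below are invented, and the hand-written least-squares line stands in for whatever model a data scientist might actually choose; it is an illustration under those assumptions, not a recommended forecasting method.

```python
from statistics import mean

# Hypothetical monthly sales figures: month number and units sold
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# Analyst-style question: what happened? Summarize the data we already have.
average_sales = mean(sales)


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Scientist-style question: what is likely to happen next?
slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept

print(f"Average monthly sales so far: {average_sales}")
print(f"Forecast for month 7: {forecast_month_7}")
```

The first number describes data that already exists, while the second is an estimate about a month that has not happened yet and is only as good as the assumption that the trend continues.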

3.   Hands-on machine learning experience

Data analysts are not expected or required to have hands-on machine learning experience. Similarly, those within this role are not likely to build statistical models or conduct advanced experiments to better understand big data.

Data scientists, on the other hand, are expected to have hands-on machine learning experience and are required to build their own statistical models and conduct their own experiments. As you can see, the roles are somewhat similar, but a data scientist’s role is more advanced and a step up from a data analyst. This is why many data scientists start out as data analysts.

4.   Addressing vs. formulating questions

Generally, data analysts are given questions to address by their business or organization. The request usually has to do with understanding a specific dataset to better benefit the business and their regular operations, e.g. cutting costs, increasing footfall, or understanding sales trends of distinct products or services.

Conversely, data scientists formulate these questions themselves and provide solutions that will benefit the business. Usually, these questions are about events that haven’t happened yet, with a greater focus on predicting the future as opposed to understanding current data and events.

5.   Multiple sources vs. single sources

Data analysts typically use and interpret data from a single source, such as a CRM system, while data scientists collect and gain insights from multiple data sources — sources that are often disconnected and more complex to understand. This is why processes such as machine learning and statistical models are used to better understand this big data.
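As a rough illustration of what combining disconnected sources involves, the sketch below joins customer records from one system with orders from another using a shared customer ID. Both datasets, the field names, and the total_spend_by_customer function are hypothetical; real sources rarely line up this cleanly and usually need cleaning first.

```python
# Source 1 (hypothetical): customer records exported from a CRM
crm_customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Linus"},
]

# Source 2 (hypothetical): orders from a separate e-commerce system
web_orders = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 1, "amount": 15.5},
    {"customer_id": 3, "amount": 99.0},
]


def total_spend_by_customer(customers, orders):
    """Join the two sources on customer_id and total each customer's spend."""
    totals = {customer["customer_id"]: 0.0 for customer in customers}
    for order in orders:
        # Skip orders whose customer is unknown to the CRM
        if order["customer_id"] in totals:
            totals[order["customer_id"]] += order["amount"]
    return {c["name"]: totals[c["customer_id"]] for c in customers}


spend = total_spend_by_customer(crm_customers, web_orders)
print(spend)
```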

6.   Visualization skills

Data analysts are not always required to possess business acumen or exceptional data visualization skills. Instead, their role is to interpret the data in an easy-to-understand fashion, not to implement changes to a business setting or real-world scenario. By comparison, data scientists are required to show business acumen and advanced data visualization skills, putting newly understood data to work in a business setting and contextualizing potential impacts on a business and its current decisions and processes.

Frequently Asked Questions

Can a data analyst become a data scientist?

Yes, data analysts can become data scientists. Many data scientists start as data analysts, learning the ropes of the big data world and the various methods involved in interpreting and making sense of data. An advanced degree is not necessary but may support you during the transition.

Which is better for business: analytics or data science?

Business analytics is concerned with the analysis of data to make key business decisions, while data science uses statistics and various other methods to complement and inform business decisions. While there’s no correct answer, if you think you’d like to be more involved in a business decision, then a business analyst role is probably for you.

Data analyst vs. data scientist salary — which is better?

According to Glassdoor, the average salary for a data analyst ranges from $83,000 to $115,000, while data scientists earn, on average, upwards of $168,000 a year.

To conclude

Data analytics and data science play different roles within the same industry, although they’re somewhat similar. As we’ve discussed, data analysts focus on sorting through current datasets to provide insights and visualizations in response to a business or organization’s question or current problem. On the other hand, data scientists formulate their own questions as well as the subsequent answers and solutions that will benefit the business, typically focusing on events that have not yet happened.

Many data scientists work as data analysts first, which helps them better understand big data and the many processes involved in its analysis. Think of a data scientist as a more advanced data analyst: they ask questions, use machine learning, build statistical models, and conduct experiments. However, both roles share the critical goal of better understanding big data.
