
Why Should You Become a Data Scientist?


Data is everywhere

The amount of data captured and recorded in 2020 was approximately 50 zettabytes, i.e., 50 followed by 21 zeros(!), and it's constantly growing. Beyond what social media platforms capture, as individuals we increasingly use devices that measure our health, tracking footsteps, heart rate, sleep, and other physiological signals. Data analytics has helped us discover patterns in our day-to-day activities and gently nudges us toward better health via everyday exercise and improved sleep. Just as we track our health, internet-connected sensors on everyday devices such as refrigerators, washing machines, routers, and lights let us not only operate them remotely but also monitor their functional health and provide analytics that help with troubleshooting in case of failure.

Organizations are capturing data to better understand their products and help their consumers. Industrial plants today are fitted with a variety of sensors (accelerometers, thermistors, pressure gauges) that constantly monitor high-value equipment in order to track performance and better predict downtime. As internet users, we've experienced the convenience that results from capturing our browsing data — better search results on search engines, personalized recommendations on ecommerce websites, structured and organized inboxes, and more. Each of these features is an outcome of data science techniques of information retrieval and machine learning applied to big data.

On the enterprise side, digital transformation, such as digital payments and the ubiquitous use of software and apps, has propelled data generation. With a smart computer in every palm and a plethora of sensors at both commercial and industrial scale, the amount of data generated and captured will continue to explode. This constant generation of data drives new and innovative possibilities for organizations and their consumers through approaches and toolsets rooted in data science.

Data science drives new possibilities

Data science is the study of data aimed at making informed decisions.

On the one hand, monitoring health data and data analytics guides individuals to make better decisions toward their health goals. On the other hand, aggregating health data at the community level in a convenient and accessible way sets the stage for interdisciplinary research on questions like: Does the amount of physical activity relate to our heart health? Can changes in heart rate over a period of time help predict heart disorders? Is weight loss connected with the quality of our sleep? In the past, it was unimaginable to support such research with significant numbers of data points. Today, however, a decade's worth of such big data enables us to drive research on the parameters connected to different aspects of our health. Significantly, this research is not restricted to laboratories and academic institutions but is instead driven by collaborative efforts between industry and academia.

Due to the infusion of such data, many traditional industries, like insurance, are being disrupted. Previously, insurance premiums were calculated based on age and a single medical test performed at sign-up. Now, life insurance providers are making efforts to lower premiums through regular monitoring of their customers' fitness trackers. With access to this big data, insurance providers are trying to understand and quantify health risks. The research efforts described above would yield quantifiable ways to measure overall health risk by fusing a variety of health metrics. All these new products will rely heavily on advanced analytics that uses artificial intelligence and machine learning (AI/ML) techniques to develop models that predict personalized premiums. In order to drive these new possibilities for insights, the application of data science toolsets and approaches goes through a rigorous process.

Data science is an interdisciplinary process

A data science process typically starts with a business problem. The data required to solve the problem can come from multiple sources. Social media data, such as text and images from platforms like Facebook and Instagram, would normally be kept separate from enterprise data, such as customer information and transactions. However, depending on the problem to be solved, all relevant data can be collected and fused across the social media and enterprise domains to gain unique insights that solve the business problem.

A data science generalist works with different data formats and systematically analyzes the data to extract insights from it. Data science can be subdivided into several specialized areas based on the data format used to extract insights: (1) computer vision, i.e., the study of image data; (2) natural language processing, i.e., the analysis of textual data; and (3) time-series processing, i.e., the analysis of data varying in time, such as stock market or sensor data.

A data science specialist is capable of applying advanced machine learning techniques to convert unstructured data to a structured format by extracting the relevant attributes of an entity with great accuracy. No area has seen the impact of the data science generalist or specialist more than the product development lifecycle, across organizations of all sizes.

Data scientist as a unifier in the product development lifecycle

The role of a data scientist spans multiple stages of the product development process. Typically, product development goes through the stages of envisioning, choosing which features to build, and finally, designing those specific features. In the modern world, a data scientist is a unifier across all of these stages. Even during envisioning, analysis of marketing data informs the decision on which features to build, based on what the largest number of customers need and on the competitive landscape.

Once the feature list has been decided, the next step is designing those specific features. Typically, such design activities have been in the realm of designers and, to a lesser extent, developers. Traditionally, the designer designs features and then makes a judgment call based on user experience studies with a small sample size. However, what is a good design for 10 users might not be a good design for the other 90. In such situations, the designer's judgment cannot necessarily address the entire user base.

Organizations run different experiments to gather systematic data that audits the progress of the product. With data science toolsets, deriving the ground truth no longer needs to be constrained by such traditional design approaches. Depending on the nature of the feature design, data from A/B testing can provide input to developers and designers alike on the design options and product decisions that are optimal for the user base.

Data science is the future

The spectrum of the data scientist’s role and contribution is vast. On one end, the data scientist can drive new possibilities through data-backed insights in areas like healthcare, suggest personalization options for users based on their needs, etc. On the other end, the data scientist can drive a cost-based discussion on which feature to design or what optimal option to choose. Data scientists are now the voices of customers throughout the product development process, and the unifiers through an interdisciplinary approach.

Just as making presentations, editing documents, and composing emails have become ubiquitous skills today, data science skills will be used pervasively across different functional roles to make business decisions. With the explosion in the amount of data, the demand for data scientists, data analysts, and big data engineers in the job market will only rise. Organizations are constantly looking for data professionals who can convert data into insights to make better decisions. A career in data science is stimulating — the dynamic and ever-evolving nature of the field, tied closely to current research, keeps one young!


What is Data Science?


It’s been anointed “the sexiest job of the 21st century”, companies are rushing to invest billions of dollars into it, and it’s going to change the world — but what do people mean when they mention “data science”? There’s been a lot of hype about data science and deservedly so, but the excitement has helped obfuscate the fundamental identity of the field. Anyone looking to involve themselves in data science needs to understand what it actually is and is not.

In this article, we'll lay out a deep definition of the field, complete descriptions of the data science workflow, and the data science tasks used in the real world. We hope that any would-be entrants into this line of work will come away from this article with a nuanced understanding of data science that can help them decide whether to enter and how to navigate this exciting line of work.

So What Actually is Data Science?

A quick definition of data science might be: an interdisciplinary field that primarily uses statistics and computer programming to derive insights from, and base decisions on, a collection of information represented as numerical figures. The "science" part of the name is quite apt, because data science very much follows a scientific process: formulating a hypothesis and using a specific toolset to confirm or dispel it. At the end of the day, data science is about turning a problem into a question and a question into an answer and/or solution.

Tackling the meaning of data science also means interrogating the meaning of data. Data can be easily described as "information encoded as numbers," but that doesn't tell us why it's important. The value of data stems from the notion that data is a tangible manifestation of the intangible. Data provides solid support to aid our interpretations of the world. For example, a weather app can tell you it's cold outside, but telling you that the temperature is 38 degrees Fahrenheit gives you a stronger and more specific understanding of the weather.

Data comes in two forms: qualitative and quantitative.

Qualitative data is categorical data that does not naturally come in the form of numbers, such as demographic labels that you can select on a census form to indicate gender, state, and ethnicity.

Quantitative data is numerical data that can be processed through mathematical functions; for example stock prices, sports stats, and biometric information.

Quantitative data can be subdivided into smaller categories such as ordinal, discrete, and continuous.

Ordinal: A sort of qualitative and quantitative hybrid variable in which the values have a hierarchical ranking. Any sort of star rating system of reviews is a perfect example of this; we know that a four-star review is greater than a three-star review, but we can't say for sure that a four-star review is twice as good as a two-star review.

Discrete: These are countable values that often appear in the form of integers. Examples include the number of franchises owned by a company and the number of votes cast in an election. What makes a variable discrete is that its possible values can be counted, not measured on a continuum.

Continuous: Unlike discrete variables, continuous variables can appear in decimal form and have an infinite range of possibilities. Things like company profit, temperature, and weight can all be described as continuous.
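
To make these distinctions concrete, here's a minimal sketch in Python (assuming the pandas library is available; the rows and column names are hypothetical):

```python
import pandas as pd

# Hypothetical records mixing the data types described above.
df = pd.DataFrame({
    "state": ["CA", "NY", "CA"],         # qualitative: a categorical label
    "stars": [4, 2, 5],                  # ordinal: ranked, but gaps aren't comparable
    "franchises": [12, 3, 7],            # discrete: countable integers
    "profit": [1.25e6, -3.1e4, 8.9e5],   # continuous: decimals, can be negative
})

# Encoding the star rating as an ordered categorical tells pandas that
# 2 < 4 < 5 without implying that a 4-star review is "twice" a 2-star one.
df["stars"] = pd.Categorical(df["stars"], categories=[1, 2, 3, 4, 5], ordered=True)

print(df.dtypes)
print(df["stars"].max())  # ordered comparisons work; arithmetic does not
```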

What Does Data Science Look Like?

Now that we’ve established a base understanding of data science, it’s time to delve into what data science actually looks like. To answer this question, we need to go over the data science workflow, which encapsulates what a data science project looks like from start to finish. We’ll touch on typical questions at the heart of data science projects and then examine an example data science workflow to see how data science was used to achieve success.

The Data Science Checklist

A good data science project is one that satisfies the following criteria:

Specificity: Derive a hypothesis and/or question that’s specific and to the point. Having a vague approach can often lead to a waste of time with no end product.

Attainability: Can your questions be answered? Do you have access to the required data? It’s easy to come up with an interesting question but if it can’t be answered then it has no value. The same goes for data, which is only useful if you can get your hands on it.

Measurability: Can the phenomenon you're applying data science to be quantified? Can the problem you're addressing be represented in numerical form? Are there quantifiable benchmarks for success?

As previously mentioned, a core aspect of data science is the process of deriving a question, especially one that is specific and achievable. Typical data science questions ask things like: Does X predict Y? What are the distinct groups in our data? To get a sense of data science questions, let's take a look at some business-world-appropriate ones:

  • What is the likelihood that a customer will buy this product?
  • Did we observe an increase in sales after implementing a new policy?
  • Is this a good or bad review?
  • How much demand will there be for my service tomorrow?
  • Is this the cheapest way to deliver our goods?
  • Is there a better way to segment our marketing strategies?
  • What groups of products are customers purchasing together?
  • Can we automate this simple yes/no decision?

All eight of these questions are excellent examples of how businesses use data science to advance themselves. Each question addresses a problem or issue in a way that can be answered using data science.

The Data Science Workflow

Once we've established our hypothesis and questions, we can move on to what I like to call the data science workflow: a step-by-step description of a typical data science project process.

After asking a question, the next steps are:

  1. Get and Understand the Data. We obviously need to acquire data for our project, but sometimes that can be more difficult than expected if you need to scrape for it or if privacy issues are involved. Make sure you understand how the data was sampled and the population it represents. This will be crucial in interpreting your results.
  2. Data Cleaning and Exploration. The dirty secret of data science is that data is often quite dirty, so you can expect to do significant cleaning, which often involves constructing your variables in a way that makes your project doable. Get to know your data through exploratory data analysis. Establish a base understanding of the patterns in your dataset through charts and graphs.
  3. Modeling. This represents the main course of the data science process; it's where you get to use the fancy, powerful tools. In this part, you build a model that can help you answer a question, such as whether you can predict future sales of a product from your dataset (a minimal sketch follows this list).
  4. Presentation. Now it's time to present the results of your findings. Did you confirm or dispel your hypothesis? What are the answers to the questions you started off with? How do your results advance our understanding of the issue at hand? Articulate your project in a clear and concise manner that makes it digestible for your audience, which could be another team in your company or your company's executives.
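
To make those four steps concrete, here's a minimal end-to-end sketch in Python, using synthetic data and assuming pandas and scikit-learn are available (every name and figure below is hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# 1. Get the data (a synthetic stand-in for historical product sales).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ad_spend": rng.uniform(1_000, 10_000, 200),
    "price": rng.uniform(5, 50, 200),
})
df["sales"] = 30 * df["ad_spend"] - 400 * df["price"] + rng.normal(0, 5_000, 200)
df.loc[rng.choice(200, 10, replace=False), "price"] = np.nan  # simulate dirty data

# 2. Clean and explore.
df = df.dropna()
print(df.describe())  # a base understanding of the dataset's patterns

# 3. Model: can ad spend and price predict sales?
X_train, X_test, y_train, y_test = train_test_split(
    df[["ad_spend", "price"]], df["sales"], random_state=0)
model = LinearRegression().fit(X_train, y_train)

# 4. Present: report how well the model explains data it has never seen.
print(f"R^2 on held-out data: {r2_score(y_test, model.predict(X_test)):.2f}")
```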

Data Science Workflow Example: Predicting Neonatal Infection

Now let’s parse out an example of how data science can affect meaningful real-world impact, taken from the book Big Data: A Revolution That Will Transform How We Live, Work, and Think.

We start with a problem: Children born prematurely are at high risk of developing infections, many of which are not detected until after a child is sick.

Then we turn that problem into a question: Can we detect patterns in the data that accurately predict infection before it occurs?

Next, we gather relevant data: variables such as heart rate, respiration rate, blood pressure, and more.

Then we decide on the appropriate tool: a machine learning model that uses past data to predict future outcomes.

Finally, what impact do our methods have? The model is able to predict the onset of infection before symptoms appear, thus allowing doctors to administer treatment earlier in the infection process and increasing the chances of survival for patients.

This is a fantastic example of data science in action because every step in the process has a clear and easily understandable function towards a beneficial outcome.

Data Science Tasks

Data scientists are basically Swiss Army knives, in that they possess a wide range of abilities — it’s why they’re so valuable. Let’s go over the specific tasks that data scientists typically perform on the job.

Data acquisition: For data scientists, this usually involves querying databases set up by their companies to provide easy access to reams of data. Data scientists frequently write SQL queries to retrieve data. Outside of querying databases, data scientists can use APIs or web scraping to acquire data.
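
As a sketch of what acquisition can look like in code, here's a toy query using Python's built-in sqlite3 module (the table and columns are invented for illustration; a real company warehouse would be far larger):

```python
import sqlite3

# A tiny in-memory database standing in for a company data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "West", 120.0), (2, "East", 80.5), (3, "West", 200.0)])

# A typical acquisition query: total order value per region.
query = "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY 2 DESC"
for region, total in conn.execute(query):
    print(region, total)
```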

Data cleaning: We touched on this before, but it can't be emphasized enough that data cleaning will take up the vast majority of your time. Cleaning often means dealing with null values, dropping irrelevant variables, and feature engineering, which means transforming data so that it can be processed by a model.
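
A small pandas sketch of what that cleaning often looks like (the raw extract and its columns are hypothetical):

```python
import numpy as np
import pandas as pd

# A hypothetical raw extract with the usual problems: nulls and junk columns.
raw = pd.DataFrame({
    "age": [34, np.nan, 29, 41],
    "income": [52_000, 61_000, np.nan, 75_000],
    "internal_id": ["a1", "b2", "c3", "d4"],  # irrelevant to the analysis
})

clean = (
    raw.drop(columns=["internal_id"])             # drop the irrelevant variable
       .fillna(raw[["age", "income"]].median())   # fill nulls with column medians
)

# Feature engineering: derive a model-friendly variable from the raw fields.
clean["income_per_year_of_age"] = clean["income"] / clean["age"]
print(clean)
```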

Data visualization: Crafting and presenting visually appealing and understandable charts is a hugely valuable skill. Visualization has an uncanny ability to communicate important bits of information from a mass of data. Good data scientists will use data visualization to help themselves and their audiences better understand what’s going on.
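
Even a few lines of matplotlib can turn a column of numbers into something an audience grasps instantly (the sales figures below are made up for illustration):

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures distilled into a single chart.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [14_200, 15_800, 13_100, 17_400, 19_000, 18_300]

fig, ax = plt.subplots()
ax.bar(months, sales)
ax.set_title("Monthly Sales")
ax.set_ylabel("Revenue ($)")
plt.show()  # six numbers become an immediately readable trend
```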

Statistical analysis: Statistical tests are used to confirm and/or dispel a data scientist's hypothesis. A t-test or a chi-square test can be used to evaluate the existence of certain relationships. A/B testing is a popular use case of statistical analysis; if a team wants to know which of two website designs leads to more clicks, then an A/B test is the right solution.
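
Here's what that website-design A/B test might look like as a chi-square test in Python, assuming scipy is available (the click counts are invented for illustration):

```python
from scipy.stats import chi2_contingency

# Hypothetical A/B results: clicks vs. non-clicks for two website designs.
#            clicked  did not click
design_a = [     310,          4690]
design_b = [     370,          4630]

chi2, p_value, dof, expected = chi2_contingency([design_a, design_b])
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("The difference in click rates is unlikely to be chance alone.")
else:
    print("No statistically significant difference between the designs.")
```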

Machine learning: This is where data scientists use models that make predictions based on past observations. If a bank wants to know which customers are likely to pay back loans, then they can use a machine learning model trained on past loans to answer that question.
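
A sketch of that loan example with scikit-learn, trained on synthetic "past loans" rather than real bank data (the features and decision rule are invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic past loans: [annual income, debt-to-income ratio] and repayment outcome.
rng = np.random.default_rng(1)
X = rng.uniform([20_000, 0.0], [150_000, 1.0], size=(500, 2))
y = (X[:, 0] / 50_000 - X[:, 1] + rng.normal(0, 0.5, 500) > 0.8).astype(int)

# Scale the features, then fit a logistic regression on the past observations.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# Score a new applicant: the model's estimated probability of repayment.
applicant = [[65_000, 0.35]]
print(f"Estimated repayment probability: {model.predict_proba(applicant)[0, 1]:.0%}")
```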

Computer science: Data scientists need adequate computer programming skills because many of the tasks they undertake involve writing code. In addition, some data science roles require data scientists to function as software engineers, implementing their methodologies in their company's backend systems.

Communication: You can be a math and computer whiz, but if you can’t explain your work to a novice audience, your talents might as well be useless. A great data scientist can distill digestible insights from complex analyses for a non-technical audience, translating how a p-value or correlation score is relevant to a part of the company’s business. If your company is going to make a potentially costly or lucrative decision based on your data science work, then it’s incumbent on you to make sure they understand your process and results as much as possible.

Conclusion

We hope this article helped to demystify this exciting and increasingly important line of work. It's important that anyone who's curious about data science — whether a college student or an executive thinking about hiring a data science team — understands what this field is about and what it can and cannot do.


Designing a Dashboard in Tableau for Business Intelligence


Tableau is a data visualization platform that focuses on business intelligence. It has become very popular in recent years because of its flexibility and beautiful visualizations. Clients love the way Tableau presents data and how easy it makes performing analyses. It is one of my favorite analytical tools to work with.

A simple way to define a Tableau dashboard is as an at-a-glance view of a company's key performance indicators, or KPIs. There are different kinds of dashboards available — it all depends on the business questions being asked and the end user. Is this for an operational team (like one at a distribution center) that needs to see the number of orders by hour and whether sales goals are being met? Or is this for a CEO who would like to measure the productivity of different departments and products against forecast? The first case requires the data to be updated every 10 minutes, almost in real time. The second doesn't require the same cadence; once a day is enough to track company performance.

Over the past few years, I’ve built many dashboards for different types of users, including department heads, business analysts, and directors, and helped many mid-level managers with data analysis. Here are some best practices for creating Tableau dashboards I’ve learned throughout my career.

First Things First: Why Use Data Visualization?

A data visualization tool is one of the most effective ways to analyze data from any business process (sales, returns, purchase orders, warehouse operations, customer shopping behavior, etc.).

Below we have a grid report and bar chart that contain the same information. Which is easier to interpret?

Grid report vs. bar chart.

That’s right — it’s quicker to identify the category with the lowest sales, Tops, using the chart.

Many companies used to operate and make decisions with grid reports, and many departments still do today, especially in retail. I once attended a Monday-morning trading meeting where team members printed pages of Excel reports with rows and rows of sales and stock data by product and took them to a meeting room with a ruler and a highlighter to analyze sales trends. Some of these reports took at least two hours to prepare and required combining data from different sources with VLOOKUPs — a function that allows users to search through columns in Excel. After the meeting, they threw the papers away (what a waste of paper and ink!), and the following Monday it all started again.

Wouldn’t it be better to have an effective dashboard and reporting tool in which the company’s KPIs were updated on a daily basis and presented in an interactive dashboard that could be viewed on tablets/laptops and digitally sliced and diced? That’s where tools like Tableau dashboards come in. You can drill down into details and answer questions raised in the meeting in real time — something you couldn’t do with paper copies.

How to Design a Dashboard in Tableau

Step 1: Identify who will use the dashboard and with what frequency.

Tableau dashboards can be used for many different purposes, such as measuring different KPIs, and therefore will be designed differently for each circumstance. This means that, before you can begin designing a dashboard, you need to know who is going to use it and how often.

Step 2: Define your topic.

The stakeholder (i.e., director, sales manager, CEO, business analyst, buyer) should be able to tell you what kind of business questions need to be answered and the decisions that will be made based on the dashboard.

Here, I am going to use data from a fictional retail company to report on monthly sales.

The commercial director would like to know 1) the countries to which the company's products have been shipped, 2) which categories are performing well, and 3) sales by product. The option of browsing products is a plus, so the Tableau dashboard should include as much detail as possible.

Step 3: Make sure you have all of the necessary data available to answer the specified questions.

Clarify how often you will get the data, the format in which you will receive the data (inside a database or in loose files), the cleanliness of the data, and if there are any data quality issues. You need to evaluate all of this before you promise a delivery date.

Step 4: Create your dashboard.

When it comes to dashboard design, it’s best practice to present data from top to bottom. The story should go from left to right, like a comic book, where you start at the top left and finish at the bottom right.

Let's start by adding the data set to Tableau. For this demo, the data is contained in an Excel file generated by software I developed myself. It's all dummy data.

To connect to an Excel file from Tableau, select “Excel” from the Connect menu. The tables are on separate Excel sheets, so we’re going to use Tableau to join them, as shown in the image below. Once the tables are joined, go to the bottom and select Sheet 1 to create your first visualization.

Joining Excel sheets in Tableau.

We have two columns in the Order Details table: Quantity and Unit Price. The sales amount is Quantity x Unit Price, so we're going to create a new metric, "Sales Amount". Right-click on the measures and select Create > Calculated Field.
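
The calculated field itself just multiplies the two columns row by row. For readers who think in code, here's the same logic as a minimal pandas sketch (the rows below are dummy data in the same spirit as the demo file):

```python
import pandas as pd

# A hypothetical slice of the Order Details table described above.
order_details = pd.DataFrame({
    "Quantity": [3, 1, 4],
    "Unit Price": [19.99, 250.00, 7.50],
})

# The new "Sales Amount" metric is a row-wise product of the two columns.
order_details["Sales Amount"] = order_details["Quantity"] * order_details["Unit Price"]
print(order_details)
```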

Creating a Map in Tableau

We can use maps to visualize data with a geographical component and compare values across geographical regions. To answer our first question — "To which countries have the company's products been shipped?" — we'll create a map view of sales by country.

1. Add Ship Country to the rows and Sales Amount to the columns.

2. Change the view to a map.

Visualizing data across geographical regions.

3. Add Sales Amount to Color on the Marks card. Darker colors mean higher sales amounts aggregated by country.

4. You can choose to make the size of the bubbles proportional to the Sales Amount. To do this, drag the Sales Amount measure to the Size area.

5. Finally, rename the sheet “Sales by Country”.

Creating a Bar Chart in Tableau

Now, let's visualize the second request: "Which categories are performing well?" We'll need to create a second sheet. The best way to analyze this data is with bar charts, as they are well suited to comparing data across categories. Pie charts work in a similar way, but in this case we have too many categories (more than four), so a pie chart wouldn't be effective.

1. To create a bar chart, add Category Name to the rows and Sales Amount to the columns.

2. Change the visualization to a bar chart.

3. Swap columns and rows, sort in descending order, and show the values so users can see the exact value each bar represents.

4. Drag the category name to “Color”.

5. Now, rename the sheet to “Sales by Category”.

Our Sales by Category breakdown.

Assembling a Dashboard in Tableau

Finally, the commercial director would like to see the details of the products sold by each category.

Our last page will be the product detail page. Add Product Name and Image to the rows and Sales Amount to the columns. Rename the sheet as “Products”.

We are now ready to create our first dashboard! Rearrange the chart on the dashboard so that it appears similar to the example below. To display the images, drag the Web Page object next to the Products grid.

Assembling our dashboard.

Additional Actions in Tableau

Now, we’re going to add some actions on the dashboard such that, when we click on a country, we’ll see both the categories of products and a list of individual products sold.

1. Go to Dashboard > Actions.

2. Add Action > Filter.

3. Our “Sales by Country” chart is going to filter Sales by Category and Products.

4. Add a second action. Sales by Category will filter Products.

5. Add a third action, this time selecting URL.

6. Select Products, <Image> on URL, and click on the Test Link to test the image’s URL.

What we have now is an interactive dashboard with a worldwide sales view. To analyze a specific country, we click on the corresponding bubble on the map and Sales by Category will be filtered to what was sold in that country.

When we select a category, we can see the list of products sold for that category. And, when we hover on a product, we can see an image of it.

In just a few steps, we have created a simple dashboard from which any head of department would benefit.

The final product.

Dashboards in Tableau at General Assembly

In GA's Data Analytics course, students get hands-on training with the versatile Tableau platform. Students learn the ins and outs of the data visualization tool and create dashboards to solve real-world problems in a 1-week accelerated or 10-week part-time format — on campus and online. You can also get a taste in our interactive Tableau training classes and workshops.


Meet Our Expert

Samanta Dal Pont is a business intelligence and data analytics expert in retail, eCommerce, and online media. With an educational background in software engineering and statistics, her great passion is transforming businesses to make the most of their data. Responsible for analytics, reporting, and visualization in a global organization, Samanta has been an instructor for Data Analytics courses and SQL bootcamps at General Assembly London since 2016.

Samanta Dal Pont, Data Analytics Instructor, General Assembly London

5 High-Paying Careers That Require Data Analysis Skills



The term “big data” is everywhere these days, and with good reason. More products than ever before are connected to the Internet: phones, music players, DVRs, TVs, watches, video cameras…you name it. Almost every new electronic device created today is connected to the Internet in some way for some purpose.

The result of all those things connected to the Internet is data. Big, big data. What does that mean for you? Simply put, if you can quickly, accurately, and intelligently sift through data and find trends, you are extremely valuable in today's tech job market. More specifically, here are five job titles that require data analytics skills and expertise to get ahead.


SQL: Using Data to Boost Business and Increase Efficiency


In today’s digital age, we’re constantly bombarded with information about new apps, transformative technologies, and the latest and greatest artificial intelligence system. While these technologies may serve very different purposes in our life, all of them share one thing in common: They rely on data. More specifically, they all use databases to capture, store, retrieve, and aggregate data. This begs the question: How do we actually interact with databases to accomplish all of this? The answer: We use Structured Query Language, or SQL (pronounced “sequel” or “ess-que-el”).

Put simply, SQL is the language of data — it’s a programming language that enables us to efficiently create, alter, request, and aggregate data from those mysterious things called databases. It gives us the ability to make connections between different pieces of information, even when we’re dealing with huge data sets. Modern applications are able to use SQL to deliver really valuable pieces of information that would otherwise be difficult for humans to keep track of independently. In fact, pretty much every app that stores any sort of information uses a database. This ubiquity means that developers use SQL to log, record, alter, and present data within the application, while analysts use SQL to interrogate that same data set in order to find deeper insights.

Finding SQL in Everyday Life

Think about the last time you looked up the name of a movie on IMDB. I’ll bet you quickly noticed an actress on the cast list and thought something like, “I didn’t realize she was in that,” then clicked a link to read her bio. As you were navigating through that app, SQL was responsible for returning the information you “requested” each time you clicked a link. This sort of capability is something we’ve come to take for granted these days.

Let’s look at another example that truly is cutting-edge, this time at the intersection of local government and small business. Many metropolitan cities are supporting open data initiatives in which public data is made easily accessible through access to the databases that store this information. As an example, let’s look at Los Angeles building permit data, business listings, and census data.

Imagine you work at a real estate investment firm and are trying to find the next up-and-coming neighborhood. You could use SQL to combine the permit, business, and census data in order to identify areas that are undergoing a lot of construction, have high populations, and contain a relatively low number of businesses. This might be a great opportunity to purchase property in a soon-to-be thriving neighborhood! For the first time in history, it’s easy for a small business to leverage quantitative data from the government in order to make a highly informed business decision.
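
Here's a minimal sketch of that analysis in Python using the built-in sqlite3 module (the tables, columns, and figures below are invented stand-ins for the real open data sets):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE permits  (zip TEXT, permits_issued INTEGER);
CREATE TABLE business (zip TEXT, business_count INTEGER);
CREATE TABLE census   (zip TEXT, population INTEGER);
INSERT INTO permits  VALUES ('90012', 140), ('90210', 25);
INSERT INTO business VALUES ('90012', 80),  ('90210', 400);
INSERT INTO census   VALUES ('90012', 32000), ('90210', 21000);
""")

# Combine the three data sets: lots of construction, a large population,
# and relatively few existing businesses suggest an up-and-coming area.
query = """
SELECT p.zip, p.permits_issued, c.population, b.business_count
FROM permits p
JOIN business b ON b.zip = p.zip
JOIN census   c ON c.zip = p.zip
WHERE c.population > 20000
ORDER BY p.permits_issued DESC, b.business_count ASC;
"""
for row in conn.execute(query):
    print(row)
```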

Leveraging SQL to Boost Your Business and Career

There are many ways to harness SQL’s power to supercharge your business and career, in marketing and sales roles, and beyond. Here are just a few:

  • Increase sales: A sales manager could use SQL to compare the performance of various lead-generation programs and double down on those that are working.
  • Track ads: A marketing manager responsible for understanding the efficacy of an ad campaign could use SQL to compare the increase in sales before and after running the ad.
  • Streamline processes: A business manager could use SQL to compare the resources used by various departments in order to determine which are operating efficiently.

SQL at General Assembly

At General Assembly, we know businesses are striving to transform their data from raw facts into actionable insights. The primary goal of our data analytics curriculum, from workshops to full-time courses, is to empower people to access this data in order to answer their own business questions in ways that were never possible before.

To accomplish this, we give students the opportunity to use SQL to explore real-world data such as Firefox usage statistics, Iowa liquor sales, or Zillow's real estate prices. Our full-time Data Science Immersive and part-time Data Analytics courses help students build the analytical skills needed to turn the results of those queries into clear and effective business recommendations. On a more introductory level, after just a couple of hours in one of our SQL workshops, students are able to query multiple data sets with millions of rows.


Meet Our Expert

Michael Larner is a passionate leader in the analytics space who specializes in using techniques like predictive modeling and machine learning to deliver data-driven impact. A Los Angeles native, he has spent the last decade consulting with hundreds of clients, including 50-plus Fortune 500 companies, to answer some of their most challenging business questions. Additionally, Michael empowers others to become successful analysts by leading trainings and workshops for corporate clients and universities, including General Assembly’s part-time Data Analytics course and SQL/Excel workshops in Los Angeles.

“In today’s fast-paced, technology-driven world, data has never been more accessible. That makes it the perfect time — and incredibly important — to be a great data analyst.”

– Michael Larner, Data Analytics Instructor, General Assembly Los Angeles

Excel: Building the Foundation for Understanding Data Analytics


If learning data analytics is like trying to ride a bike, then learning Excel is like having a good set of training wheels. Although some people may want to jump right ahead without them, they’ll end up with fewer bruises and a smoother journey if they begin practicing with them on. Indeed, Excel provides an excellent foundation for understanding data analytics.

What exactly is data analytics? It’s more than just simply “crunching numbers,” for one. Data analytics is the art of analyzing and communicating insights from data in order to influence decision-making.

In the age of increasingly sophisticated analytical tools like Python and R, some seasoned analytics professionals may scoff at Excel, first released by Microsoft in 1987, as mere spreadsheet software. But most people only scratch the surface when it comes to fully leveraging this ubiquitous program's power as a stepping stone into analytics.

Using Excel for Data Analysis: Management, Cleaning, Aggregation, and More

I refer to Excel as the gateway into analytics. Once you’ve learned the platform inside and out, throughout your data analytics journey you’ll continually say to yourself, “I used to do this in Excel. How do I do it in X or Y?” In today’s digital age, it may seem like there are new analytical tools and software packages coming out every day. As a result, many roles in data analytics today require an understanding of how to leverage and continuously learn multiple tools and packages across various platforms. Thankfully, learning Excel and its fundamentals will provide a strong bedrock of knowledge that you’ll find yourself frequently referring back to when learning newer, more sophisticated programs.

Excel is a robust tool that provides foundational knowledge for performing tasks such as:

  • Database management. Understanding the architecture of any data set is one of the first steps of the data analytics workflow. In Excel, each worksheet can be thought of as a table in a database. Each row in a worksheet can then be considered a record, while each column can be considered an attribute. As you continue to work with multiple worksheets and tables in Excel, you'll learn that functions such as "VLOOKUP" and "INDEX/MATCH" are similar to the "JOIN" clauses seen in SQL.
  • Data cleaning. Cleaning data is often one of the most crucial and time-intensive components of the data analytics workflow. Excel can be used to clean a data set using various string functions such as “TRIM”, “MID”, or “SUBSTITUTE”. Many of these functions cut across various programs and will look familiar when you learn similar functions in SQL and Tableau.
  • Data aggregation. Once the data's been cleaned, you'll need to summarize and compile it. Excel's aggregation functions such as "COUNT", "SUM", "MIN", or "MAX" can be used to summarize the data. Furthermore, Excel's Pivot Tables can be leveraged to aggregate and filter data quickly and efficiently. As you continue to manipulate and aggregate data, you'll begin to understand the underlying SQL queries behind each Pivot Table (see the sketch after this list).
  • Statistics. Descriptive statistics and inferential statistics can be applied through Excel’s functions and add-ons to better understand our data. Descriptive statistics such as the “AVERAGE”, “MEDIAN”, or “STDEV” functions tell us about the central tendency and variability of our data. Additionally, inferential statistics such as correlation and regression can help to identify meaningful patterns in the data which can be further analyzed to make predictions and forecasts.
  • Dashboarding and visualization. One of the final steps of the data analytics workflow involves telling a story with your data. The combination of Excel's Pivot Tables, Pivot Charts, and slicers offers the underlying tools and flexibility to construct dynamic dashboards with visualizations that convey your story to your audience. As you build dashboards in Excel, you'll begin to uncover how the Pivot Table fields in Excel are the common denominator in almost any visualization software and are no different from the "shelves" used in Tableau to create visualizations.
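
To see that connection for yourself, here's what the aggregation behind a simple Pivot Table looks like in pandas (an illustrative sketch; the transactions are made up):

```python
import pandas as pd

# Hypothetical transactions, the kind you'd summarize with an Excel Pivot Table.
sales = pd.DataFrame({
    "Region":   ["East", "East", "West", "West", "West"],
    "Category": ["Tops", "Shoes", "Tops", "Shoes", "Tops"],
    "Amount":   [120.0, 80.0, 200.0, 150.0, 90.0],
})

# The pandas equivalent of a Pivot Table: rows = Region, columns = Category,
# values = SUM of Amount. Under the hood, this mirrors a SQL GROUP BY.
pivot = sales.pivot_table(index="Region", columns="Category",
                          values="Amount", aggfunc="sum", fill_value=0)
print(pivot)
```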

If you want to jump into Excel but don’t have a data set to work with, why not analyze your own personal data? You could leverage Excel to keep track of your monthly budget and create a dashboard to see what your spending trends look like over time. Or if you have a fitness tracker, you could export the data from the device and create a dashboard to show your progress over time and identify any trends or areas for improvement. The best way to jump into Excel is to use data that’s personal and relevant — so your own health or finances can be a great start.

Excel at General Assembly

In GA's part-time Data Analytics course and online Data Analysis course, Excel is the starting point for leveraging other analytical tools such as SQL and Tableau. Throughout the course, you'll continually have "data déjà vu" as you tell yourself, "Oh, this looks familiar." Students come to understand why Excel is considered a jack-of-all-trades: it provides a great foundation in database management, statistics, and dashboard creation. However, as the saying goes, "a jack-of-all-trades is a master of none." As such, students will also recognize the limitations of Excel and the point at which tools like SQL and Tableau offer greater functionality.

At GA, we use Excel to clean and analyze data from sources like the U.S. Census and Airbnb to formulate data-driven business decisions. During final capstone projects, students are encouraged to use data from their own line of work to leverage the skills they’ve learned. We partner with students to ensure that they are able to connect the dots along the way and “excel” in their data analytics journey.

Having a foundation in Excel will also benefit students in GA’s full-time Data Science Immersive program as they learn to leverage Python, machine learning, visualizations, and beyond, and those in our part-time Data Science course, who learn skills like statistics, data modeling, and natural language processing. GA also offers day-long Excel bootcamps across our campuses, during which students learn how to simplify complex tasks including math functions, data organization, formatting, and more.


Meet Our Expert

Mathu A. Kumarasamy is a self-proclaimed analytics evangelist and aspiring data scientist. A believer in the saying that “data is the new oil,” Mathu leverages analytics to find, extract, refine, and distribute data in order to help clients make confident, evidence-based decisions. He is especially passionate about leveraging data analytics, technology, and insights from the field of behavioral economics to help establish a culture of evidence-based, value-driven health care in the United States. Mathu enjoys converting others into analytics geeks while teaching General Assembly’s part-time Data Analytics course in Atlanta.

Mathu A. Kumarasamy, Data Analytics Instructor, GA Atlanta

The Skills and Tools Every Data Scientist Must Master


Photo by WOC in Tech.

“Data scientist” is one of today’s hottest jobs.

In fact, Glassdoor calls it the best job of 2017, with a median base salary of $110,000. This fact shouldn’t be big news. In 2011, McKinsey predicted there would be a shortage of 1.5 million managers and analysts “with the know-how to use the analysis of big data to make effective decisions.” Today, there are more than 38,000 data scientist positions listed on Glassdoor.com.

It makes perfect sense that this job is both new and popular, since every move you make online creates data somewhere for something. Someone has to make sense of that data, discover trends in it, and determine whether it's useful. That is the job of the data scientist. But how does the data scientist go about that job? Here are the three skills and three tools that every data scientist should master.


Announcing General Assembly’s New Data Science Immersive



Data science is one of the hottest and best-paid professions in the U.S. More than ever, companies need analytical minds who can compile data, analyze it, and drive everything from marketing forecasts to product launches with compelling predictions. Their work drives the core strategies of modern business — so much so that, by 2018, data-related job openings will total 1.5 million. That's why we've worked hard to develop classes, workshops, and courses to confront the data science skills gap. The latest addition to our proud family of data education is the new Data Science Immersive program.

Launching for the first time in San Francisco and Washington, D.C. on April 11, this full-time Immersive program will equip you with the tools and techniques you need to become a data pro in just 12 weeks.


The Best Topical Data Visualizations of 2015 (So Far)



Data visualization is a form of visual communication in which data is presented in a pictorial or graphical format. By presenting complex data sets visually, people can comprehend and analyze the information faster and more clearly.
