data analytics Tag Archives - General Assembly Blog

A Beginner’s Guide To Tableau

Featuring Insights From Iun Chen & Vish Srivastava

Read: 2 Minutes

Tableau is a powerful data analysis and data visualization tool that anyone can use. It can be used by beginners to create simple charts and by advanced practitioners to solve complex business problems. It is user-friendly, easy to learn quickly, and includes a portfolio of business intelligence tools with the potential to give a wide range of roles the advantage of professionally analyzing data.

Simply put, if you can present data in a clear, compelling format, you gain a competitive advantage in today’s data-driven marketplace.

“Tableau enables you to quickly connect disparate data sources and utilize a drag-and-drop interface to analyze data and create dashboards,” says Vish Srivastava, who leads our Data Visualization & Intro to Tableau workshop. As a product leader at Evidation Health, he relies on Tableau to turn around fast data analysis. “For example, product teams use it to analyze user growth and analytics, BizOps teams use it to analyze operational data, and sales teams use it to analyze customer and revenue data.”

Businesses survive and thrive on data. The amount of data available to businesses today is impressive. To keep organizations on a successful path, analysts need to provide the key insights needed to make important decisions.

Here’s where Tableau comes in.

Tableau takes business intelligence to the next level, making it fast and efficient to analyze large amounts of data and create beautiful, presentation-ready visualizations that generate insights.

Data is the lifeblood of modern teams. Being able to quickly answer ad hoc questions and integrate data analysis into your day-to-day decision-making will make you an MVP. Though not all data analysts use Tableau, they do need some way to quickly create data visualizations.

Tableau is the data viz tool of choice.

Tableau is so popular in part because it is easy and fast to learn. In Iun Chen’s Intro to Data Analytics course, students learn the life-changing basics of Tableau in an afternoon. Aspiring analysts come to understand the power of data and the impact their numbers can have. As more data becomes available, there are more opportunities for data to be misused, a risk that every data scientist soon realizes. To quote the Nobel laureate and economist Ronald Coase, “If you torture the data long enough, it will confess.”

The ethics of data form the foundation of Chen’s syllabus so pitfalls are avoided from the start. “Overanalyzing and manipulating data too deeply can always give you the information you want,” says Chen. “Unfortunately, this is all too common in professional settings, though it’s usually unintentional.”

Tableau is a powerful tool.

Business insights are only as good as the data behind them, and the best data analysts understand that the human choices they make matter.

“Data is the perfect example of garbage in, garbage out,” says Srivastava, who defines good data as data that is ethically collected, complete, objective, and thoroughly analyzed. “The double-edged sword of using powerful data analysis and visualization tools is that beautiful charts can create a false precision and obfuscate data integrity issues.”

To delve deeper into this topic, Chen recommends How Charts Lie, by Alberto Cairo, an exploration of how data can be altered:

“This book details how the use of data and data visualizations in journalism can be distorted and misleading, without the audience even realizing it, due to the urgency to present findings in a timely manner to the public.”

Want to learn more about Iun?
https://www.linkedin.com/in/iunchen 

Want to learn more about Vish? https://www.linkedin.com/in/vishrutps

7 Tips to Learn Tableau Fast

Featuring Insights From Iun Chen & Vish Srivastava

Read: 2 Minutes

Let’s get it straight: How difficult is it to learn Tableau for a complete beginner? Are there shortcuts to learning Tableau? Any tips, tricks, or time-saving work-arounds? Thankfully, the answer is yes. Try these top tips, approved by our expert instructors, and start data viz now.

“It’s a little overwhelming at first but as soon as you understand the basics, like what are dimensions and measures, everything falls into place pretty quickly,” says Vish Srivastava, product leader at Evidation Health and GA instructor.

“In essence, you need to understand two things: The basics on how data works — for example, what are common formats of data and what is a primary key? And a basic understanding of data visualization in a business setting. Can you answer the question: When is a time series vs. a pie chart valuable for decision making?”
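For readers who have not met these terms before, here is a minimal Python sketch of a primary key, a dimension, and a measure. The rows and field names are made up for illustration and do not come from Tableau or from any real dataset.

```python
from collections import defaultdict

# Hypothetical rows; every field name and value here is invented for illustration.
rows = [
    {"order_id": 1, "category": "Furniture", "sales": 250.0},
    {"order_id": 2, "category": "Technology", "sales": 400.0},
    {"order_id": 3, "category": "Furniture", "sales": 150.0},
]

# A primary key uniquely identifies each row, so no value may repeat.
order_ids = [row["order_id"] for row in rows]
is_valid_primary_key = len(order_ids) == len(set(order_ids))

# "category" plays the role of a dimension (it groups rows);
# "sales" plays the role of a measure (it gets aggregated).
sales_by_category = defaultdict(float)
for row in rows:
    sales_by_category[row["category"]] += row["sales"]

print("order_id works as a primary key:", is_valid_primary_key)
for category, total in sales_by_category.items():
    print(f"{category}: {total:.2f}")
```

Dragging a dimension and a measure onto a Tableau view produces roughly this kind of grouped total, without any code.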

But can you really learn the basics of Tableau in an afternoon?

“The best way to learn is to download a sample dataset and dive right in and start creating data visualizations. To keep going from there, check out various portfolios online to get inspiration, and try to build those.”

According to Iun Chen, who conducts internal Tableau training at LinkedIn, Tableau is easy to learn, but hard to master.

“The basic concepts of charting and color theory are easy to pick up and can take just a few weeks. However, if you are looking to be a subject matter expert, this can take years to perfect,” she says. 

Chen preps students in her Intro to Data Analytics course to achieve close-to-mastery in these key areas.

  1. Can they quickly prep and analyze large volumes of data?
  2. Can they identify key information and determine the best visual method to present it?
  3. Can they take business questions and determine which visualizations to use?
  4. Can they translate raw datasets into storylines with a beginning, middle, and end?
  5. Can they format charts, graphs, titles, text, and images for a polished deliverable?
  6. Can they articulate best practices on design and visualization techniques?
  7. Can they provide feedback on ineffective visualizations and how to improve them?

This checklist is the closest thing to a Tableau cheat sheet you’ll find. Prioritize these skills, and you’ll waste no time learning Tableau. Now that you know what you need to succeed, you can choose whether to take our Data Analytics course fast or slow. Learn Tableau, along with the data analytics tools SQL and Excel, in a 1-week accelerated format or over 10 weeks in the evening.

Chen sums it up perfectly: “As long as you are actively learning, applying your learnings, and ensuring innovation of your work, you will be a data visualization expert in no time.”

Top 3 Reasons to Learn Tableau

Featuring Insights From GA instructor Candace Pereira-Roberts

Read: 2 Minutes

Do you communicate data? Do you want to create more effective visualizations? Tableau is the data analytics tool you’re looking for. Here are the top three reasons why you should learn how to use Tableau, the popular data viz software focused on business intelligence.

#1 Tableau Is Easy

Data can be complicated. Tableau makes it easy. Tableau is a visualization tool that takes data and presents it in a user-friendly format of charts and graphs. And here’s the best part: No coding is required. You’ll easily master the end-to-end cycle of data analytics.


Need to showcase trends or surface findings? Tableau will make you an expert. Proficiency in business intelligence is a transferable skill that is quickly becoming the lifeblood of organizations. 

“I see students who are new to analytics learn Tableau desktop and be able to develop Tableau worksheets, dashboards, and story points in a couple of weeks — essentially a complete analysis project,” says Candace Pereira-Roberts, FinServ data engineer and one of our Data Analytics course instructors. She adds, “I like to share knowledge and watch people grow. I learn from my students as well.” 

#2 Tableau Is Tremendously Useful

Would you rather tell visual stories with data? Or present the same old boring reports and tables? Is that even a question?

“Anyone who works in data should learn tools that help tell data stories with quality visualizations,” says Pereira-Roberts. Full stop.

Data analysts and data engineers were quick to adopt and use Tableau, and it has given those roles a key competitive advantage in the recent data-related hiring frenzy. But their secret is out. And the advantages go beyond the usual tech roles. Having a working knowledge of data, and specifically knowing how to use Tableau, can help many more tech professionals become more attractive to recruiters and hiring managers.

Plus, it has a built-in career boost. Tableau’s visualizations are so elegant, you’ll be confident presenting the business intelligence and actionable insights to key stakeholders. Improving your presentation skills is par for the course.

#3 Tableau Data Analysts Are in Demand

As more and more businesses discover the value of data, the demand for analysts is growing. One advantage of Tableau is that it is so visually pleasing and easy for busy executives — and even the tech-averse — to use and understand. Tableau presents complicated and sophisticated data in a simple visualization format. In other words, CEOs love it.

Think of Tableau as your secret weapon. Once you learn it, you can easily surface critical information to stakeholders in a visually compelling format. That will make you a rockstar in any organization. 

“Tableau helps organizations leverage business intelligence to become more data-driven in their decision-making process.” Pereira-Roberts recommends participating in Makeover Monday to take your skills to an even higher level. 

Want to learn more about Candace? Check out her thoughts on how to become a business intelligence analyst or connect with her on LinkedIn.

What Is Data Visualization?

An Interview With Iun Chen

Read: 4 Minutes

Data is big, and it’s getting bigger. How do you parse and understand data when the sheer amount of information can be overwhelming? The answer is data visualization. The discipline of data visualization, or data viz, is essentially the graphic representation of data, drawing on design theory concepts such as color and layout. We called on one of our data viz experts, Iun Chen, to break it down further.

Let’s start with an introduction and how you came to the world of data viz.

IC: I’m Iun (pronounced ‘yoon’), and I work in the data analytics space focusing on business intelligence tools and building scalable resources for LinkedIn. I also teach the 10-week Intro to Data Analytics course for GA, which includes the professional skills of SQL, Tableau, and Excel.

In college, I was a business major with a specialization in marketing and advertising. I became more interested in how the ad business model worked behind the scenes and in how software and systems worked. As a result, I worked at many major media companies in a quantitative capacity — revenue planning, ad pricing, finance, ad sales strategy. That led me into a formalized analytics route.

How do you define data visualization?

IC: Data visualization is the idea of communicating information graphically. It’s the science of information design, in which you take massive amounts of data in whatever format it comes in and use it to surface high-level insights and findings in a visually compelling way so audiences can easily understand the main points.

How does data visualization differ from data analytics?

IC: Data analytics is the process of cleaning, prepping, analyzing, and presenting data. Data visualization is part of the presenting data step and is defined as the act of visually organizing data through the use of charts, graphs, and dashboards. Concepts of data visualization are closely aligned with concepts of design theory: color, font, scale, layout, organization.

Why is data viz important?

IC: Data visualization is easy to learn but hard to master. In my classes, I heavily emphasize the design element of data visualization. It’s easy to whip together a quick bar or pie chart, but is it the best way to communicate the point you are trying to make? The goal of collecting massive amounts of data is to be able to quickly translate it into insights that can help make smart business decisions. The final form of this translation is often a chart or graph, which is why the ability to design and visualize these massive amounts of data becomes more important as we collect more of it.

What is a data narrative?

IC: People think in stories and narratives, not in black and white figures. Just like you would share a story with a friend using a beginning, middle, and endpoint, you would do the same when sharing details about data analysis. Here’s a simple example.

  1. Beginning: Sales are down year-over-year; identify the symptoms.
  2. Middle: Furniture sales — our largest segment — are doing poorly in the last six months; conduct the analysis to investigate reasons and uncover root causes.
  3. End: Review retail store reports and conduct manufacturer visits; recommend next steps.

The key point of any data narrative is that it should present a compelling business case and surface unrealized insights to the audience. The business challenges, rationale, and next steps should be clearly presented, and people in the room should be able to walk away knowing what action to take.
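The beginning and middle of the narrative above can be sketched in a few lines of Python. This is only an illustration: the segment names and sales figures are invented, and a real analysis would pull them from actual reports.

```python
# Hypothetical yearly sales by segment (made-up numbers, for illustration only).
sales = {
    "Furniture": {"last_year": 500_000, "this_year": 410_000},
    "Office Supplies": {"last_year": 300_000, "this_year": 305_000},
    "Technology": {"last_year": 200_000, "this_year": 210_000},
}


def yoy_change(last_year, this_year):
    """Year-over-year change as a fraction of last year's value."""
    return (this_year - last_year) / last_year


# Beginning: are total sales down year-over-year?
total_last = sum(s["last_year"] for s in sales.values())
total_this = sum(s["this_year"] for s in sales.values())
total_change = yoy_change(total_last, total_this)

# Middle: which segment explains most of the decline?
segment_changes = {
    name: yoy_change(s["last_year"], s["this_year"]) for name, s in sales.items()
}
worst_segment = min(segment_changes, key=segment_changes.get)

print(f"Total sales change: {total_change:.1%}")
for name, change in sorted(segment_changes.items(), key=lambda kv: kv[1]):
    print(f"  {name}: {change:.1%}")
print(f"Investigate first: {worst_segment}")
```

The end of the story, the recommendation, still has to come from the analyst. The numbers only show where to look.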

Which tech roles use data visualization?

IC: Data visualization, like data analytics, is a skill set that can be applied to any job. But if you are looking for a job that has data visualization skills as part of the function and responsibilities, look for roles like business analyst, data analyst, business intelligence analyst, data scientist, and data engineer. Keep in mind that the formal skill of data visualization is still relatively new, so depending on the maturity of the company, those functions may not be fully established yet. However, with the increase of data in the world, there’s a growing need for experts who understand data visualization techniques.

Check out this Medium post which details how Spotify’s business has evolved with the creation of their data visualization roles.

What’s the future of data visualization?

IC: As we continue to collect more and more data, the need for people with the skills to analyze and present it becomes ever more critical in the workplace. More companies will need to generate insights quickly to keep up with advances and competition in their respective industries. The skill of data visualization will become more and more attractive as teams and organizations seek to translate their data into insights more efficiently and effectively. The ability to work with data is increasingly critical to the success of any company in any job function.

Iun Chen’s Recommended Data Viz Reading List

FlowingData

StorytellingWithData

InformationIsBeautiful

Tableau Public Gallery

New York Times Data Journalism

The WSJ Guide to Information Graphics

Storytelling with Data: A Data Visualization Guide for Business Professionals 

Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations

Edward Tufte’s The Visual Display of Quantitative Information

Business Analytics Vs Data Analytics: What’s the Difference?

Featuring Insights From Iun Chen & Vish Srivastava

Read: 4 Minutes

Data analytics and business analytics are often confused, understandably, because both data analysts and business analysts work with data. What matters — and differentiates these two roles — is what the data is intended to do.

When comparing the roles of business analyst and data analyst, one must consider the audience. Who will be taking action based on the analyses?

Business analysts use data to improve business metrics.

Business analysts work directly with stakeholders to steer company objectives and keep the business on a successful path. They set and maintain key performance indicators for the organization. A business analyst may recommend strategies or business plans to executives, sometimes when a company is at a critical juncture, say quarterly or during a turnaround. Stakes can be high, but so can the rewards. (Think McKinsey analysts or other coveted consultancy jobs.) Business analysts are more likely to use presentation skills as they’ll need to present findings to executives and give recommendations in high-level meetings. 

Data analysts collect, extract, and analyze data.

Data analysts are more technically focused. They are responsible for getting the data and analyzing it, working with datasets and tables. For example, a data analyst at an eCommerce company may analyze customer information, aggregate email marketing lists, or use data to identify demographics for new customer acquisition plans. Data analysts are more likely to work in teams alongside marketing partners or with other technology roles such as programmers or product managers, depending on the size of the company. They also work with business partners across entire organizations, including business analysts, as needed for tasks and projects. 

Different roles mean different salaries.

Both business analysts and data analysts solve business problems. As such, they are in high demand. According to Glassdoor, the average salary for a data analyst in the U.S. is $72K. Compensation for business analysts is a bit more, averaging $79K. Of course, exact amounts depend on location and will vary from country to country. While a business analyst can command a higher salary, there is wider latitude for data analysts to carve out their niche in practically any industry. Since the function of data is increasingly integral to every enterprise, there is more flexibility for data analysts to dig into areas of the business where they can make the most difference, with more potential for creativity.  

In GA’s Intro to Data Analytics course, Iun Chen teaches SQL, Tableau, and Excel, business intelligence tools she uses in her professional role as a data analyst at LinkedIn.

“My formal job function is to build data tools for internal colleagues so they can successfully grow our business,” she says. “I create dashboards, reports, and anything else to ensure revenue keeps going up and anticipated risks go down for the company. In my experience, the skill set and mindset of the individual can define the role of a data analyst in any organization, large or small. Everyone uses data in their day to day so being able to clean, prep, analyze, and report data — regardless of what your actual job title is — is critical to not only the company’s success but your personal success as well.”

Both business analysts and data analysts are storytellers. 

Whether a business analyst’s more strategic and decision-making role is for you, or the technical, numbers-crunching, team-playing data analyst sounds more your speed, know that the two roles share one crucial skill: They use data to tell stories. Those stories lend insights that factor into decisions that affect the bottom line. Translating raw data into digestible and human narratives can be one of the most challenging skills for analysts to master, according to Vish Srivastava, who’s led multidisciplinary teams across tech sectors. So how does an analyst develop this multifaceted skill and set their career on the path for success?

“My recommendation is twofold,” he says. “One, always start your analysis with a hypothesis that you’re testing. You need to know right out of the gate why your analysis is going to matter. Two, after you’ve spent some time with your data, step away and write down your presentation storyline in three to five bullets. The final bullet should be your recommended next step. Of course, make sure you have the analysis and charts to back up your storyline and fill in the gaps as needed.”

When it comes to storytelling with data, the difference between a boring story and a compelling one can come down to data visualization. The tools at your disposal and your proficiency with them can make or break a presentation. Communicating the insights for business intelligence hinges on clear and impactful data viz, whether we’re talking business analytics or data analytics.

One classic example of data visualization’s power is the cholera map by John Snow, an early pioneer of disease mapping. “This is a beautiful example of how collecting data and visually presenting it can generate amazing insight,” says Srivastava. “In this case, the insight was that the sewer systems were spreading disease. This informed public policy and saved so many lives.”

The future of business intelligence will be determined by the democratization of data.

The prevalence of data and its part in tech careers is changing. To hear Srivastava tell it, future conversations on business intelligence will center less on the specificities of data analysis vs. business analysis and more on how data is creeping into even more roles.

“We’ve come a long way, but there is still far to go for data analysis skills to be deeply embedded in all functions across a company. In the future, I think we will see fewer dedicated teams for business analysis and data analysis; instead, all professionals will have these skills and utilize them daily. This democratization of data analysis will be incredibly powerful. It will create even more emphasis on making high-quality data available across every enterprise.”

How to Become a Data Analyst

Featuring Insights From Matt Brems & Vish Srivastava

Read: 4 Minutes

So, you want to be a data analyst? GA instructors Matt Brems and Vish Srivastava are data experts with deep experience across a wide range of industries. If anyone can start you on the path to your dream job of data analyst, it’s these two. Read on for advice and guidance on how to get there, what it takes and where to look for the jobs of the future.

Tell us about your experience in data — what brought you to the field?

Matt Brems: I was attracted to statistics and data science because I felt that too many decisions were made based on a gut feeling instead of data. In my experience, that usually meant that the loudest person in the room would make the decisions. Using statistics and data science to analyze evidence helps us to make better, more informed decisions that are less susceptible to bias.

Vish Srivastava: As a product manager, I’m faced with a deluge of data and need to make sense of what matters and how to use data to inform decision-making around the product. Quickly visualizing data and creating dashboards that get widely distributed are both critical skills to be an effective product manager.

What qualifications do you need to be considered for a data analyst job?

MB: Jobs that involve data analysis have many different titles. Some companies call these “data analysts,” other terms include “business intelligence analyst,” “marketing analyst,” “data scientist,” and others. Since different companies will have different names for the job, it’s not surprising that the qualifications for a data analyst (or similar) role will vary wildly from company to company.

The most common qualification is to know SQL, or Structured Query Language.

Most data analyst roles will expect some experience with data visualization. Having some background in statistics — even one or two courses at the college level or having online certifications — is often expected.

What traits make a data analyst successful?

VS: Simplicity, integrity, empathy, and patience. Simplicity, because it can be very easy to overcomplicate an issue when you are in the weeds. Data analysts must always resist this urge and instead create clarity and simplicity for stakeholders.

Integrity, because data analysts make a lot of crucial decisions like which outlier to remove and which insights to show vs. which ones to leave behind. You can easily tell a story that isn’t really true if you wanted to, and data analysts must hold themselves accountable to the truth. Mark Twain wasn’t kidding when he said that “there are lies, damned lies, and statistics.”

Empathy, because data analysts must always look at deliverables from the perspective of an audience. Will this be helpful to them, is it immediately legible, what questions will they have that you can preemptively address?

And lastly, patience. Data analysis involves a thankless job that people don’t really talk about — data cleaning!
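To see why the choice of which outlier to remove is an integrity question, consider this small Python sketch. The order values are invented for illustration; the point is only that one cleaning decision can change the headline number.

```python
from statistics import mean, median

# Hypothetical order values; the last one is an unusually large order.
order_values = [120, 135, 110, 140, 125, 130, 2_500]

mean_with_outlier = mean(order_values)
mean_without_outlier = mean(order_values[:-1])

# The median barely moves, which is one reason analysts often report it
# alongside the mean when outliers are present.
median_with_outlier = median(order_values)
median_without_outlier = median(order_values[:-1])

print(f"Mean with outlier:      {mean_with_outlier:.2f}")
print(f"Mean without outlier:   {mean_without_outlier:.2f}")
print(f"Median with outlier:    {median_with_outlier:.2f}")
print(f"Median without outlier: {median_without_outlier:.2f}")
```

Neither average is automatically the "right" one. What matters is that the analyst states which rows were dropped and why.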

How does someone start a career in data analytics?

MB: I would start by searching data analyst roles at companies in which you’re interested. If you have most of the qualifications listed, go ahead and start applying! If you feel like you lack most of the qualifications, start exploring resources and courses that can help you close that gap. The three most important skills to know are likely to be SQL, statistics, and experience with a data visualization tool. There are lots of tutorials and courses available to teach you each of these.

How long does it take to become a data analyst?

MB: Data analyst positions are, in many cases, entry-level roles. New graduates from bootcamps, colleges, or universities are often hired into data analyst roles. Some roles may require up to a few years of experience.

What’s the next big disruption? Where should candidates look for the jobs of the future?

VS: Every sector has been transformed by data, and this will continue to happen. But I think a very interesting one to watch is healthcare. Data in healthcare is trapped in various silos, like hospital systems (e.g., EHRs), insurance companies (e.g., claims data), clinical data (e.g., lab tests), and even personal health and fitness data (e.g., Apple Watch). This fragmentation of data, along with various pieces of regulation that govern how and when data is shared and used, means there is so much value that is currently untapped. As the industry moves to more interoperability and hopefully does so in a way that respects patient privacy and patient safety, we will see new opportunities quickly emerge.

Matt Brems teaches our Data Science Immersive, a bootcamp where students become fully-fledged data scientists in 12 weeks. He runs the consultancy BetaVector, where he solves data problems with Fortune 500 companies and startups alike. 

Vish Srivastava teaches our Data Analytics course. He has led multidisciplinary teams across many different tech sectors. He is currently a product leader at Evidation Health and is obsessed with building products to make the world a better place. 

Tableau vs. Power BI

Featuring Insights From Matt Brems

Read: 2 Minutes

Tableau and Power BI are powerful tools for business intelligence, with capabilities to take loads of big data and create elegant visualizations that convey key insights to stakeholders in easily digestible presentations. Both help organizations leverage business intelligence to become more data-driven in their decision-making process. So which tool is better? We asked a few industry experts their thoughts on the data analysis tools Tableau and Power BI. Here’s what they had to say.

Candace Pereira-Roberts, Data Engineer & GA Data Analytics Instructor

“Anyone who works in data should learn tools that help tell data stories with quality visualizations. Tableau is a wonderful tool for the technical and nontechnical to build these visualizations. I love how we teach the Tableau unit in the Data Analytics bootcamp. I see students who are new to analytics learn Tableau desktop and be able to develop Tableau worksheets, dashboards, and story points in a couple of weeks to do a complete analysis project.”

Iun Chen, GA Instructor & Data Analyst at LinkedIn 

“In my professional capacity, I lead data visualization workshops to share best practices on charting and design theory, with a focus on Tableau. But with the growth of big data analytics, there are more players in the data viz space. Looker, Qlik, Domo, and MicroStrategy are a few with out-of-the-box solutions. Check out other marketplace BI and analytics leaders and their reviews at Gartner.

Alternatively, if you are up for the challenge you can start from scratch and build out completely customized solutions through coding packages, such as with Python plotting libraries Matplotlib, Pandas, and Seaborn.”

Matt Brems, GA Instructor & Data Consultant at BetaVector 

“Most data analyst roles will expect some experience with data visualization. They may prefer your visualization experience be tied to a certain tool like Tableau or Power BI or simply want you to have experience designing graphics or dashboards. As with any platform, the human element is key. A good data analyst is curious and detail-oriented. Diving into the data and spotting anomalies or identifying patterns requires curiosity. Looking at large datasets for long periods of time can invite mistakes, so being detail-oriented ensures you’re interpreting the data correctly.” 

Vish Srivastava, GA Instructor & Product Leader at Evidation Health

“Most teams I’ve seen are not comparing Tableau and Power BI. Instead, it’s more about whether to adopt a business intelligence tool at all, or whether to use Tableau or Power BI in place of Excel. Tableau is a great option when you need to quickly create data visualizations. Tableau is incredibly powerful because it’s designed for nontechnical users, meaning business users can set up and tweak dashboards and charts without the support of engineering or data science teams.”

When it comes to research, the most common data analytics tool is SQL — no surprise there. But once you get into more niche industries, that can vary, says Brems.

“In academia, R is probably the most prevalent data analysis tool, though Python is quickly gaining popularity. SAS and Stata are often used in specific industries, though their popularity is diminishing. (R and Python are open source tools, which means, among other things, that they are free.)”

Want to learn more about Candace?
https://www.coursereport.com/blog/how-to-become-a-business-intelligence-analyst
https://generalassemb.ly/instructors/candace-roberts/13840
www.linkedin.com/in/candaceproberts

Want to learn more about Matt?
https://betavector.com/
https://www.linkedin.com/in/matthewbrems

Today’s Best Data Analytics Tools

Featuring Insights From Matt Brems

Read: 3 Minutes

Our Data-Driven World

We live in a world of data — swimming in statistics, numbers, information — and the amount of data seems to be growing faster than we can keep up. More people are using data points to make decisions large and small. From which restaurant has the highest Yelp rating to which city has the lowest rates of COVID-19, using data to navigate everyday life is now the norm. Indeed, the pandemic has only increased our reliance on data. We have come to expect this tsunami of data to explain, and in some cases solve, many of the most vexing problems faced by society today. But finding key insights takes careful analysis of a staggering amount of data. No small feat.

It’s true that more data is released than ever before. In the U.S., there are currently over 290,000 datasets on data.gov alone. Clearly, there’s a growing need for data analysts and the data analytics tools that help us understand these numbers. From small businesses to the highest levels of governments, decisions turn on interpretations of data. Big data can have big consequences.
 

So how do data analysts find the insights lurking in a database? And what are the best tools to analyze all those numbers? Read on to discover the best data analytics tools in the market.

Data scientist and GA instructor since 2016, Matt Brems currently runs a data science consultancy called BetaVector. We asked him to share his go-to data analysis tools. “People who want to analyze data use many different tools; I like to break these down into three different types,” he says.

Let’s get to it.

Type #1: Tabular Data Tools

Data analysts need to get data out of databases and analyze that information. And to do that, they use tabular data tools. According to Brems, the most important ones to know are Microsoft Excel, Google Sheets, and SQL, or Structured Query Language. Generally considered the best data analysis tool for research, SQL is the most common qualification found in job descriptions for a data analyst.

“Most data that data analysts analyze comes in the form of a table, called tabular data. This just means that data is organized into rows and columns, like a spreadsheet. Most data analysts will use a spreadsheet tool like Microsoft Excel or Google Sheets. When working with significant amounts of data (large tables, many tables, or both), organizations will often use a database. In order to interact with most databases, SQL is by far the language of choice.”
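To make the idea of tabular data and SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and every value in it are hypothetical, invented purely for illustration; a real organization's database would look different.

```python
import sqlite3

# Hypothetical table and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Ana", "East", 120.0),
        ("Ben", "West", 80.0),
        ("Ana", "East", 60.0),
        ("Cy", "West", 40.0),
    ],
)

# Each row is one order; each column is one attribute of that order.
# The query totals the order amounts per region, largest total first.
query = """
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
for region, total in totals:
    print(region, total)

conn.close()
```

The same GROUP BY question could be answered with a pivot table in Excel or Google Sheets; SQL tends to become the more practical choice once the tables grow large or numerous.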

Type #2: Programming Language Tools

Proficiency in a few programming tools, while not a prerequisite for basic data analysis, can give analysts the ability to perform a wide variety of tasks. While the needed programming language tools will vary from company to company and even from job to job, having this skill set as a data analyst is clearly an advantage for job seekers.

“Python and R are the most common programming language tools in data analysis, though Stata and SAS are also used in some industries. These tools can be used to perform automation, statistical modeling, forecasting, and visualization.”

Type #3: Data Visualization Tools

Since data analysts are frequently tasked with presenting results to stakeholders, a good data visualization tool is essential. Brems recommends Tableau and Microsoft Power BI.

“While you can visualize data using programming languages, Tableau and Power BI are two standalone tools that are used almost exclusively for the purposes of building static data visualizations and dashboards.”

A Note on Research 

When it comes to research, the most common data analytics tool is SQL — no surprise there. But once you get into more niche industries, that can vary, says Brems.

“In academia, R is probably the most prevalent data analysis tool, though Python is quickly gaining popularity. SAS and Stata are often used in specific industries, though their popularity is diminishing. (R and Python are open source tools, which means, among other things, that they are free.)”

Want to learn more about Matt?

https://betavector.com/

https://www.linkedin.com/in/matthewbrems

Why Should You Become a Data Scientist?

Data is everywhere

The amount of data captured and recorded in 2020 was approximately 50 zettabytes (that's 50 followed by 21 zeros, in bytes!), and it's constantly growing. Beyond the data captured by social media platforms, as individuals we constantly use devices that monitor our health by tracking footsteps, heart rate, sleep, and other physiological signals. Data analytics has helped greatly to discover patterns in our day-to-day activities and gently nudge us towards better health via everyday exercise and improved sleep quality. Just as we track our own health, internet-connected sensors are built into everyday devices such as refrigerators, washing machines, internet routers, and lights, not only to operate them remotely but also to monitor their functional health and provide analytics that help with troubleshooting in case of failure.

Organizations are capturing data to better understand their products and help their consumers. Industrial plants today are fitted with a variety of sensors (accelerometers, thermistors, pressure gauges) that constantly monitor high-value equipment in order to track its performance and better predict downtime. As internet users, we’ve experienced the convenience that results from capturing our browsing data: better search results on search engines, personalized recommendations on ecommerce websites, structured and organized inboxes, etc. Each of these features is an outcome of the data science techniques of information retrieval and machine learning applied to big data.

On the enterprise side, digital transformation such as digital payments and ubiquitous use of software and apps has propelled data generation. With a smart computer in every palm and a plethora of sensors both on commercial and industrial scale, the amount of data generated and captured will continue to explode. This constant generation of data drives new and innovative possibilities for organizations and their consumers through approaches and toolsets rooted in data science. 

Data science drives new possibilities

Data science is the study of data aimed towards making informed decisions.

On the one hand, health monitoring and data analytics are guiding individuals to make better decisions towards their health goals. On the other hand, aggregation of health data at the community level in a convenient and accessible way sets the stage for interdisciplinary research towards answering questions like: Does the amount of physical activity relate to our heart health? Can changes in heart rate over a period of time help predict heart disorders? Is weight loss connected with the quality of our sleep? In the past it was unimaginable to support such research with significant data points. However, today, a decade’s worth of such big data enables us to drive research on the parameters connected to different aspects of our health. It’s significant that this research is not restricted to laboratories and academic institutions but is instead driven by collaborative efforts between industry and academia.

Due to the infusion of such data, many traditional industries, like insurance, are being disrupted. Previously, insurance premiums were calculated based on age and a single medical test performed at sign-up. Now, life insurance providers are making efforts to lower premiums through regular monitoring of their customers’ fitness trackers. With access to this big data, insurance providers are trying to understand and quantify health risks. The research efforts described above would drive quantifiable ways to measure overall health risk by fusing a variety of health metrics. All these new products will rely heavily on advanced analytics that use artificial intelligence and machine learning (AI/ML) techniques to develop models that predict personalized premiums. To drive these new possibilities for insights, the application of data science toolsets and approaches goes through a rigorous process.

Data science is an interdisciplinary process

A data science process typically starts with a business problem. Data required to solve the problem can come from multiple sources. Social media data, such as text and images from platforms like Facebook and Instagram, is usually kept separate from enterprise data, such as customer information and transactions. However, depending on the problem to be solved, all relevant data is collected and can be fused across social media and enterprise domains to gain unique insights that solve the business problem.

A data science generalist works with different data formats and systematically analyzes the data to extract insights from it. Data science can be subdivided into several specialized areas based on the data format used to extract insights: (1) computer vision, i.e., the study of image data; (2) natural language processing, i.e., the analysis of textual data; and (3) time-series processing, i.e., the analysis of data that varies over time, such as stock market or sensor data.

A data science specialist is capable of applying advanced machine learning techniques to convert unstructured data into a structured format by extracting the relevant attributes of an entity with great accuracy. Nowhere has the impact of the data science generalist or specialist been greater than in the product development lifecycle, across organizations of all sizes.

Data scientist as a unifier in the product development lifecycle

The role of a data scientist spans multiple stages of the product development process. Typically, product development goes through the stages of envisioning, choosing which features to build, and finally designing those specific features. A data scientist is a unifier across all of these stages in the modern world. Even during envisioning, analysis of marketing data informs the decision on which features to build, based on what the largest number of customers need and on the competitive landscape.

Once the feature list has been decided, the next step is designing those specific features. Typically, such design activities have been in the realm of designers and to a lesser extent developers. Traditionally, the designer designs features and then makes a judgment call based on user experience studies with a small sample size. However, what might be a good design for 10 users might not be a good design for 90 other users. In such situations, the designers’ judgment cannot necessarily address the entire user base. 

Organizations run different experiments to gather systematic data to audit the progress of the product. With data science toolsets, deriving the ground truth no longer needs to be constrained by such traditional design approaches. Based on the nature of the feature design, data from A/B experiment testing can provide input to both developers and designers alike on design options and product decisions that are optimal for the user base. 

Data science is the future

The spectrum of the data scientist’s role and contribution is vast. On one end, the data scientist can drive new possibilities through data-backed insights in areas like healthcare, suggest personalization options for users based on their needs, etc. On the other end, the data scientist can drive a cost-based discussion on which feature to design or what optimal option to choose. Data scientists are now the voices of customers throughout the product development process, and the unifiers through an interdisciplinary approach.

Just as making presentations, editing documents, and composing emails have become ubiquitous skills today, data science skills will be used pervasively across different functional roles to make business decisions. With the explosion in the amount of data, the demand for data scientists, data analysts, and big data engineers in the job market will only rise. Organizations are constantly looking for data professionals who can convert data into insights to make better decisions. A career in data science is stimulating: the dynamic and ever-evolving nature of the field, tied closely to current research, keeps one young!

What is Data Science?

It’s been anointed “the sexiest job of the 21st century”, companies are rushing to invest billions of dollars into it, and it’s going to change the world — but what do people mean when they mention “data science”? There’s been a lot of hype about data science and deservedly so, but the excitement has helped obfuscate the fundamental identity of the field. Anyone looking to involve themselves in data science needs to understand what it actually is and is not.

In this article, we’ll lay out a deep definition of the field, a complete description of the data science workflow, and the data science tasks used in the real world. We hope that any would-be entrants will come away from this article with a nuanced understanding of data science that can help them decide whether to enter, and how to navigate, this exciting line of work.

So What Actually is Data Science?

A quick definition of data science might be articulated as an interdisciplinary field that primarily uses statistics and computer programming to derive insights from, and base decisions on, a collection of information represented as numerical figures. The “science” part in data science is quite apt because data science very much follows a scientific process that involves formulating a hypothesis and using a specific toolset to confirm or dispel that hypothesis. At the end of the day, data science is about turning a problem into a question and a question into an answer and/or solution.

Tackling the meaning of data science also means interrogating the meaning of data. Data can be easily described as “information encoded as numbers,” but that doesn’t tell us why it’s important. The value of data stems from the notion that data is a tangible manifestation of the intangible. Data provides solid support to aid our interpretations of the world. For example, a weather app can tell you it’s cold outside, but telling you that the temperature is 38 degrees Fahrenheit gives you a stronger, more specific understanding of the weather.

Data comes in two forms: qualitative and quantitative.

Qualitative data is categorical data that does not naturally come in the form of numbers, such as demographic labels that you can select on a census form to indicate gender, state, and ethnicity.

Quantitative data is numerical data that can be processed through mathematical functions; for example stock prices, sports stats, and biometric information.

Quantitative data can be subdivided into smaller categories such as ordinal, discrete, and continuous.

Ordinal: A sort of qualitative and quantitative hybrid variable in which the values have a hierarchical ranking. Any sort of star rating system of reviews is a perfect example of this; we know that a four-star review is greater than a three-star review, but can’t say for sure that a four-star review is twice as good as a two-star review.

Discrete: These are countable values that usually appear as integers. Examples include the number of franchises owned by a company and the number of votes cast in an election. It’s important to remember that discrete variables take only separate, distinct values with nothing in between, and that counts like these can never be negative.

Continuous: Unlike discrete variables, continuous variables can appear in decimal form and can take any value within a range, so they have infinitely many possible values. Things like company profit, temperature, and weight can all be described as continuous.
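The type of a variable determines which summaries make sense for it. The short Python sketch below illustrates this with one summary per type; all of the variable names and values are hypothetical and invented for illustration.

```python
from statistics import mean, median

# Hypothetical values, invented for illustration only.
state = ["NY", "CA", "NY"]           # qualitative: labels with no numeric meaning
star_ratings = [4, 2, 5, 3, 4]        # ordinal: the order matters, the spacing does not
franchises_owned = [3, 0, 12]         # discrete: whole-number counts
temperature_f = [38.0, 41.5, 36.2]    # continuous: any value within a range

# Qualitative data can be counted, but not averaged.
most_common_state = max(set(state), key=state.count)

# Ordinal data supports ranking, so the median is a safe summary.
typical_rating = median(star_ratings)

# Discrete counts can be added up.
total_franchises = sum(franchises_owned)

# Continuous measurements support the full range of arithmetic, including the mean.
average_temp = mean(temperature_f)

print(most_common_state, typical_rating, total_franchises, round(average_temp, 1))
```

Taking the mean of the star ratings would run without error, but it would quietly assume that the gap between two and three stars equals the gap between four and five, which is exactly what ordinal data does not promise.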

What Does Data Science Look Like?

Now that we’ve established a base understanding of data science, it’s time to delve into what data science actually looks like. To answer this question, we need to go over the data science workflow, which encapsulates what a data science project looks like from start to finish. We’ll touch on typical questions at the heart of data science projects and then examine an example data science workflow to see how data science was used to achieve success.

The Data Science Checklist

A good data science project is one that satisfies the following criteria:

Specificity: Derive a hypothesis and/or question that’s specific and to the point. Having a vague approach can often lead to a waste of time with no end product.

Attainability: Can your questions be answered? Do you have access to the required data? It’s easy to come up with an interesting question but if it can’t be answered then it has no value. The same goes for data, which is only useful if you can get your hands on it.

Measurability: Can what you’re applying data science to be quantified? Can the problem you’re addressing be represented in numerical form? Are there quantifiable benchmarks for success? 

As previously mentioned, a core aspect of data science is the process of deriving a question, especially one that is specific and achievable. Typical data science questions ask things like “Does X predict Y?” and “What are the distinct groups in our data?” To get a sense of data science questions, let’s take a look at some business-world-appropriate ones:

  • What is the likelihood that a customer will buy this product?
  • Did we observe an increase in sales after implementing a new policy?
  • Is this a good or bad review?
  • How much demand will there be for my service tomorrow?
  • Is this the cheapest way to deliver our goods?
  • Is there a better way to segment our marketing strategies?
  • What groups of products are customers purchasing together?
  • Can we automate this simple yes/no decision?

All eight of these questions are excellent examples of how businesses use data science to advance themselves. Each question addresses a problem or issue in a way that can be answered using data science.

The Data Science Workflow

Once we’ve established our hypothesis and questions, we can move on to what I like to call the data science workflow, a step-by-step description of a typical data science project process.

After asking a question, the next steps are:

  1. Get and Understand the Data. We obviously need to acquire data for our project, but sometimes that can be more difficult than expected if you need to scrape for it or if privacy issues are involved. Make sure you understand how the data was sampled and the population it represents. This will be crucial in the interpretation of your results.
  2. Data Cleaning and Exploration. The dirty secret of data science is that data is often quite dirty, so you can expect to do significant cleaning, which often involves constructing your variables in a way that makes your project doable. Get to know your data through exploratory data analysis. Establish a base understanding of the patterns in your dataset through charts and graphs.
  3. Modeling. This represents the main course of the data science process; it’s where you get to use the fancy, powerful tools. In this part, you build a model that can help you answer a question such as “Can we predict future sales of a product from our dataset?”
  4. Presentation. Now it’s time to present the results of your findings. Did you confirm or dispel your hypothesis? What are the answers to the questions you started off with? How do your results advance our understanding of the issue at hand? Articulate your project in a clear and concise manner that makes it digestible for your audience, which could be another team in your company or your company’s executives.
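The four steps above can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions: the (ad spend, sales) records are hypothetical, and a straight-line least-squares fit stands in for whatever model a real project would call for.

```python
# 1. Get and understand the data.
# Hypothetical (ad_spend, sales) records; None marks a missing measurement.
raw = [(1.0, 12.0), (2.0, 14.0), (None, 15.0), (3.0, 16.0), (4.0, 18.0)]

# 2. Clean and explore: drop incomplete rows, then look at simple summaries.
clean = [(x, y) for x, y in raw if x is not None and y is not None]
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# 3. Model: fit a straight line by ordinary least squares.
slope = sum((x - x_bar) * (y - y_bar) for x, y in clean) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# 4. Present: turn the model into a statement the audience can act on.
forecast = intercept + slope * 5.0
print(f"Each extra unit of ad spend is associated with {slope:.1f} more sales.")
print(f"Forecast sales at an ad spend of 5.0: {forecast:.1f}")
```

In practice, most of the effort goes into steps 1 and 2, and the model in step 3 is chosen to fit the question rather than the other way around.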

Data Science Workflow Example: Predicting Neonatal Infection

Now let’s parse out an example of how data science can have a meaningful real-world impact, taken from the book Big Data: A Revolution That Will Transform How We Live, Work, and Think.

We start with a problem: Children born prematurely are at high risk of developing infections, many of which are not detected until after a child is sick.

Then we turn that problem into a question: Can we detect patterns in the data that accurately predict infection before it occurs?

Next, we gather relevant data: variables such as heart rate, respiration rate, blood pressure, and more.

Then we decide on the appropriate tool: a machine learning model that uses past data to predict future outcomes.

Finally, what impact do our methods have? The model is able to predict the onset of infection before symptoms appear, thus allowing doctors to administer treatment earlier in the infection process and increasing the chances of survival for patients.

This is a fantastic example of data science in action because every step in the process has a clear and easily understandable function towards a beneficial outcome.

Data Science Tasks

Data scientists are basically Swiss Army knives, in that they possess a wide range of abilities — it’s why they’re so valuable. Let’s go over the specific tasks that data scientists typically perform on the job.

Data acquisition: For data scientists, this usually involves querying databases set up by their companies to provide easy access to reams of data. Data scientists frequently write SQL queries to retrieve data. Outside of querying databases, data scientists can use APIs or web scraping to acquire data.

Data cleaning: We touched on this before, but it can’t be emphasized enough that data cleaning will take up the vast majority of your time. Cleaning often means dealing with null values, dropping irrelevant variables, and feature engineering, which means transforming data so that it can be processed by a model.
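Here is a minimal sketch of those three cleaning chores in plain Python. The records, field names, and the choice to fill missing ages with the median are all assumptions made for illustration; the right treatment of missing values depends on the dataset.

```python
from statistics import median

# Hypothetical customer records; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "income": 52000.0, "notes": "n/a"},
    {"id": 2, "age": None, "income": 61000.0, "notes": ""},
    {"id": 3, "age": 45, "income": None, "notes": "call back"},
    {"id": 4, "age": 29, "income": 48000.0, "notes": ""},
]

# Drop an irrelevant variable: the free-text notes column.
rows = [{k: v for k, v in r.items() if k != "notes"} for r in rows]

# Deal with null values: fill missing ages with the median of the known ages,
# and drop rows where income (the quantity of interest here) is missing.
known_ages = [r["age"] for r in rows if r["age"] is not None]
median_age = median(known_ages)
for r in rows:
    if r["age"] is None:
        r["age"] = median_age
rows = [r for r in rows if r["income"] is not None]

# Feature engineering: express income in thousands so it sits on a scale
# comparable to age.
for r in rows:
    r["income_k"] = r["income"] / 1000

for r in rows:
    print(r)
```

Libraries such as pandas offer the same operations on whole tables at once, but the decisions themselves (what to drop, what to fill, what to derive) remain the analyst's.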

Data visualization: Crafting and presenting visually appealing and understandable charts is a hugely valuable skill. Visualization has an uncanny ability to communicate important bits of information from a mass of data. Good data scientists will use data visualization to help themselves and their audiences better understand what’s going on.

Statistical analysis: Statistical tests are used to confirm and/or dispel a data scientist’s hypothesis. A t-test or chi-square test is used to evaluate the existence of certain relationships. A/B testing is a popular use case of statistical analysis; if a team wants to know which of two website designs leads to more clicks, then an A/B test is the right solution.
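As a rough sketch of the website example, the code below compares the click rates of two designs with a two-proportion z-test. That test is a choice made here for brevity (on a two-by-two table of clicks and non-clicks it is equivalent to a chi-square test without continuity correction), and the visitor and click counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(clicks_a, visitors_a, clicks_b, visitors_b):
    """Return the z statistic and two-sided p-value for a difference in click rates."""
    p_a = clicks_a / visitors_a
    p_b = clicks_b / visitors_b
    # Pooled click rate under the null hypothesis that both designs perform the same.
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical experiment: design A got 100 clicks from 1,000 visitors,
# design B got 130 clicks from 1,000 visitors.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A small p-value suggests the difference in click rates is unlikely to be due to chance alone; whether the difference is large enough to matter for the business is a separate judgment.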

Machine learning: This is where data scientists use models that make predictions based on past observations. If a bank wants to know which customers are likely to pay back loans, then they can use a machine learning model trained on past loans to answer that question.
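To give a feel for "a model trained on past loans," here is a toy sketch using k-nearest neighbors, one of the simplest machine learning methods. The method is chosen here only for brevity, and the loan records, features, and scaling are hypothetical; a real credit model would use far more data and careful validation.

```python
from collections import Counter
from math import dist

# Hypothetical past loans, invented for illustration.
# Features: (income in units of $10k, debt-to-income ratio times 10).
# Label: whether the loan was repaid.
past_loans = [
    ((8.0, 1.0), True),
    ((7.0, 2.0), True),
    ((6.5, 1.5), True),
    ((3.0, 6.0), False),
    ((2.5, 5.0), False),
    ((4.0, 7.0), False),
]


def predict_repaid(applicant, k=3):
    """Predict repayment by majority vote among the k most similar past loans."""
    nearest = sorted(past_loans, key=lambda loan: dist(loan[0], applicant))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict_repaid((7.5, 1.2)))  # applicant resembling the repaid loans
print(predict_repaid((3.0, 5.5)))  # applicant resembling the defaulted loans
```

The prediction for a new applicant is simply what happened with the most similar past customers, which is the core intuition behind many more sophisticated models.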

Computer science: Data scientists need adequate computer programming skills because many of the tasks they undertake involve writing code. In addition, some data science roles require data scientists to function as software engineers because data scientists have to implement their methodologies into their company’s backend servers.

Communication: You can be a math and computer whiz, but if you can’t explain your work to a novice audience, your talents may go to waste. A great data scientist can distill digestible insights from complex analyses for a non-technical audience, translating how a p-value or correlation score is relevant to a part of the company’s business. If your company is going to make a potentially costly or lucrative decision based on your data science work, then it’s incumbent on you to make sure they understand your process and results as much as possible.

Conclusion

We hope this article helped to demystify this exciting and increasingly important line of work. It’s important that anyone curious about data science, whether a college student or an executive thinking about hiring a data science team, understands what this field is about and what it can and cannot do.
