Data Category Archives - General Assembly Blog | Page 2

8 Tips for Learning Python Fast

It’s possible to learn Python fast. How fast depends on what you’d like to accomplish with it and how much time you can allocate to study and practice Python on a regular basis. Before we dive in further, I’d like to establish some assumptions I’ve made about you and your reasons for reading this article:

  • You have little to no prior experience learning Python. 
  • You have no Python programming background or coding experience.
  • You want to know how long it’s going to take to learn Python.
  • You’re interested in resources and strategies for learning Python.

First, I’ll address how quickly you should be able to learn Python. If you’re interested in learning the fundamentals of Python programming, it could take you as little as two weeks to learn, with routine practice. If you’re interested in mastering Python in order to complete complex tasks or projects or spur a career change, then it’s going to take much longer. In this article, I’ll provide tips and resources geared toward helping you gain Python programming knowledge in a short timeframe.

If you’re wondering how much it’s going to cost to learn Python, the answer there is also, “it depends”. There is a large selection of free resources available online, not to mention the various books, courses, and platforms that have been published for beginners.

Another question you might have is, “how hard is it going to be to learn Python?” That also depends. If you have any experience programming in another language such as R, Java, or C++, it’ll probably be easier to learn Python fast than someone who hasn’t programmed before. But learning a programming language like Python is similar to learning a natural language, and everyone’s done that before. You’ll start by memorizing basic vocabulary and learning the rules of the language. Over time, you’ll add new words to your repertoire and test out new ways to use them. Learning Python is no different.

By now you’re thinking, “Okay, this is great. I can learn Python fast, cheap, and easily. Just tell me what to read and point me on my way.” Not so fast. There’s a fourth thing you need to consider and that’s how to learn Python. Research on learning has identified that not all people learn the same way. Some learn best by reading, while others learn best by seeing and hearing. Some people enjoy learning through games rather than courses or lectures. As you review the curated list of resources below, consider your own learning preferences as you evaluate options.

Now let’s dig in. Below are my eight tips to help you learn Python fast.

1. Cover the following Python fundamentals.

At a bare minimum, you (and your resource) must cover the fundamentals. Without understanding them, you’ll have a hard time working through complex problems, projects or use cases. Examples of Python fundamentals include:

  • Variables and types
  • Lists, dictionaries, and sets
  • Basic operators
  • String formatting
  • Basic string operations
  • Conditions
  • Loops
  • Functions
  • List comprehensions
  • Classes and objects
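
If it helps to see them in one place, here is a quick sketch touching each of these fundamentals (all names and values are illustrative):

```python
# Variables and types
name = "Ada"

# Lists, dictionaries, and sets
langs = ["Python", "R", "Java"]
ages = {"Ada": 36, "Alan": 41}
unique = {1, 2, 2, 3}               # duplicates collapse: {1, 2, 3}

# String formatting and basic string operations
greeting = f"Hello, {name}!"
shout = greeting.upper()

# Conditions and loops
for lang in langs:
    if lang.startswith("P"):
        print(lang, "starts with P")

# Functions
def add(a, b):
    return a + b

# List comprehensions
squares = [n * n for n in range(5)]  # [0, 1, 4, 9, 16]

# Classes and objects
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hi, {self.name}"

print(Greeter("Ada").greet())        # prints "Hi, Ada"
```

Every resource below covers these same building blocks; the syntax above is what you should be able to read comfortably after your first two weeks.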

If you’re really pressed for time, all of these fundamentals can be quickly explored on a number of different websites: docs.python.org, RealPython.com, stavros.io, developers.google.com, pythonforbeginners.com. See the section below on “Websites” for more details.

2. Establish a goal for your study.

Before you start learning Python, establish a goal for your study. The challenges you face as you start learning will be easier to overcome when you keep your goal in mind. Additionally, you’ll know what learning material to focus on or skim through as it pertains to your goals. For example, if you’re interested in learning Python for data analysis, you’re going to want to complete exercises, write functions, and learn Python libraries that facilitate data analysis. The following are typical examples of goals for Python that might pertain to you:

  • Data analysis
  • Data science and machine learning
  • Mobile apps
  • Website development
  • Work automation

3. Select a resource (or resources) for learning Python fast.

Python resources can be grouped into three main categories: interactive resources, non-interactive resources, and video resources. In-person courses are also an option, but won’t be covered in this post.

Interactive resources have become common in recent years through the popularization of interactive online courses that provide practical coding challenges and explanations. If it feels like you’re coding, that’s because you actually are. Interactive resources are typically available for free or a nominal fee, or you can sign up for a free trial before you buy. 

Non-interactive resources are your most traditional and time-tested; they’re books (digital and paperback) and websites (“online tutorials”). Many first-time Python learners prefer them due to the familiar and convenient nature of these mediums. As you’ll see, there are many non-interactive resources for you to choose from, and most are free.

Video resources were popularized over the past 10 years by MOOCs (massive open online courses) and resembled university lectures captured on video. In fact, they were often supported or promoted by leading universities. Now, there’s an abundance of video resources for various subjects, including programming in Python. Some of these video resources are pre-recorded courses hosted on learning platforms, and others are live-streamed courses provided by online education providers. General Assembly produces a live course in Python that covers Python fundamentals in one week.

Below I’ve compiled a list of resources to help you get a jumpstart on learning Python fast. They fall into the categories laid out above, and at a bare minimum they cover Python basics. Throughout the list, I’ve indicated with an asterisk (*) which resources are free, to the best of my knowledge.

Interactive Resources: Tools and Lessons

  • Codecademy: One of the more popular online interactive platforms for learning Python fast. I know many Python programmers, myself included, who have taken Codecademy’s Python fundamentals course. It’s great for an absolute beginner, and you can knock it out in a week. It will get you excited about programming in Python. 
  • DataCamp: Short expert videos with immediate hands-on-keyboard exercises. It’s on par with the Codecademy courses. 
  • *PythonTutor.com: A tool that helps you write and visualize code step by step. I recommend pairing this tool with another learning resource. This tool makes learning Python fundamentals a lot easier because you can visualize what your code is doing. 

Non-Interactive Resources

Non-interactive resources fall into two sub-categories: books and websites.

Books

In researching books, I noticed a majority of them were actually catered to existing programmers interested in learning Python or to master Python programmers looking for reliable reference material (“cookbooks”) or specialized literature. Below, I’ve listed only the books I think are helpful for beginners.

  • Introducing Python, 2nd Edition: This book mixes tutorials with cookbook-style code recipes to explain fundamental Python concepts.
  • Learn Python 3 The Hard Way: 52 well-developed exercises for beginners to learn Python. 
  • Python Basics: A Practical Introduction to Python 3: The website says it all — this book is designed to take you from beginner to intermediate. 
  • Python Crash Course, 2nd Edition: This book provides a foundation in general programming concepts, Python fundamentals, and problem solving through real-world projects.

Websites

At first, my list started off with over 20 examples of websites covering Python fundamentals. Instead of sharing them all, I decided to only include ones that had a clear advantage in terms of convenience or curriculum. All of these resources are free.

  • *Google’s Python Class: Tutorials, videos, and programming exercises in Python for beginners, from a Python-friendly company. 
  • *Hitchhiker’s Guide to Python: This guide helps you learn and improve your Python code and also teaches you how to set up your coding environment. The site search is incredibly effective at helping you find what you need. I can’t recommend this site enough. 
  • *Python for Everybody: An online book that provides Python learning instruction for those interested in solving data analysis problems. Available in PDF format in Spanish, Italian, Portuguese, and Chinese. 
  • *Python For You and Me: An online book that covers beginner and advanced topics in Python concepts, in addition to introducing a popular Python framework for web applications.
  • *Python.org: The official Python documentation. The site also provides a beginner’s guide, a Python glossary, setup guides, and how-tos.
  • *Programiz in Python: Programiz has a lengthy tutorial on Python fundamentals that’s really well done. It shouldn’t be free, but it is.
  • *RealPython.com: A large collection of specialized Python tutorials, most come with video demonstrations. 
  • *Sololearn: 92 chapters, 275 related quizzes, and several projects covering Python fundamentals that can also be accessed through a mobile app.
  • *Tutorialspoint.com: A no-frills tutorial covering Python basics. 
  • *W3Schools for Python: Another no-nonsense tutorial from a respected web-developer resource. 

Video Resources

Video resources have become increasingly popular and with good reason: they’re convenient. Why read a textbook or tutorial when you can cover the same material in video format on your computer or mobile device? They fall into two sub-categories: pre-recorded video courses and live video courses.

Pre-Recorded Courses

  • Coursera: A large catalog of popular courses in Python for all levels. Most courses can be taken free, and paid courses come with certifications. You can also view courses on their mobile app.
  • EdX: Hosts university courses that focus on specific use cases for Python (data science, game development, AI) but also cover programming basics. EdX also has a mobile app.
  • Pluralsight: A catalog of videos covering Python fundamentals, as well as specialized topics like machine learning in Python.
  • RealPython.com: A collection of pre-recorded videos on Python fundamentals for beginners.
  • *TreeHouse: A library of videos of Python basics and intermediate material.
  • Envato Tuts+: 7.6 hours of pre-recorded videos on Python fundamentals, plus some intermediate content.
  • *Udacity: Provides a 5-week course on Python basics. Also covers popular modules in the Python Standard Library and other third-party libraries. 
  • Udemy: A library of popular Python courses for learners of all levels. It’s hard to single out a specific course. I recommend previewing multiple beginner Python courses until you find the one you like most. You can also view courses on their mobile app.

Live Courses

  • General Assembly: This live online course from General Assembly takes all of the guesswork out of learning Python. With General Assembly, you have a curated and comprehensive Python curriculum, a live instructor, a TA, and a network of peers and alumni you can connect with during and after the course.

4. Consider learning a Python library.

In addition to learning Python, it’s beneficial to learn one or two Python libraries. Libraries are collections of specialized functions that serve as “accelerators.” Without them, you’d have to write your own code to complete specialized tasks. For example, Pandas is a very popular library for manipulating tabular data. NumPy helps in performing mathematical and logical operations on arrays. Covering libraries would require another post — for now, review this Python.org page on standard Python libraries and this GitHub page on additional Python libraries.
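
To illustrate the “accelerator” idea without requiring a third-party install, here is a sketch using the standard library’s statistics module; Pandas and NumPy offer the same kind of shortcut for tabular and array work (the readings below are invented for illustration):

```python
import statistics

readings = [19.5, 21.0, 20.2, 22.8, 20.9]

# Without a library: write the mean calculation yourself.
manual_mean = sum(readings) / len(readings)

# With a library: one call each for mean and standard deviation.
library_mean = statistics.mean(readings)
library_stdev = statistics.stdev(readings)

print(round(library_mean, 2))  # prints 20.88
```

One line of library code replaces hand-rolled arithmetic, and the savings grow quickly for harder tasks like joins, grouping, or matrix math.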

5. Speed up the Python installation process with Anaconda.

You can go through the trouble of downloading the Python installer from the Python Software Foundation website, and then sourcing and downloading additional libraries; or you can download the Anaconda installer, which already comes with many of the packages you’ll routinely use, especially if you plan on using Python for data analysis or data science. 

6. Select and install an IDE.

You’ll want to install an integrated development environment (IDE), which is an application that lets you script, test, and run code in Python. 

When it comes to IDEs, the right one is the one that you enjoy using the most. According to various sources, the most popular Python IDEs/text editors are PyCharm, Spyder, Jupyter Notebook, Visual Studio, Atom, and Sublime. First, the good news: They’re all free, so try out a couple before you settle on one. Next, the “bad” news: Each IDE/text editor has a slightly different user interface and set of features, so it will take a bit of time to learn how to use each one.

For Python first-timers, I recommend coding in Jupyter Notebook. It has a simple design and a streamlined set of capabilities that won’t distract and will make it easy to practice and prototype in Python. It also comes with a dedicated display for dataframes and plots. If you download Anaconda, Jupyter Notebook comes pre-installed. Over time, I encourage you to try other IDEs that are better suited for development (PyCharm) or data science (Rodeo) and allow integrations (Sublime). 

Additionally, consider installing an error-handler or autocompleter to complement your IDE, especially if you end up working on lengthy projects. It will point out mistakes and help you write code quicker. Kite is a good option, plus it’s free and integrates with most IDEs.

7. When in doubt, use Google to troubleshoot code.

As you work on Python exercises, examples, and projects, one of the simplest ways to troubleshoot errors will be to learn from other Python developers. Just run a quick internet search and include keywords about your error. For example, “how to combine two lists in Python” or “Python how to convert to datetime” are perfectly acceptable searches to run, and will lead you to a few popular community-based forums such as StackOverFlow, Stack Exchange, Quora, Programiz, and GeeksforGeeks.
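
As a taste of what those two example searches turn up, here are both answers in a few lines:

```python
from datetime import datetime

# "How to combine two lists in Python" — the + operator concatenates.
evens = [2, 4, 6]
odds = [1, 3, 5]
combined = evens + odds            # [2, 4, 6, 1, 3, 5]

# "Python how to convert to datetime" — strptime parses a string
# using a format code (%Y-%m-%d matches "year-month-day").
stamp = datetime.strptime("2021-03-15", "%Y-%m-%d")
print(stamp.year)                  # prints 2021
```

Most beginner errors resolve this quickly: someone has hit the same wall before, and the accepted answer usually includes a snippet like the ones above.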

8. Schedule your Python learning and stick to it.

This is the part that most people skip, which results in setbacks or delays. Now, all you have left is to set up a schedule. I recommend that you establish a two-week schedule at a minimum to space out your studying and ensure you give yourself enough time to adequately review the Python fundamentals, practice coding in your IDE, and troubleshoot code. Part of the challenge (and fun) of learning Python or any programming language is troubleshooting errors. After your first two weeks, you’ll be amazed at how far you’ve come, and you’ll have enough practice under your belt to continue learning the more advanced material provided by your chosen resource. 

Concluding thoughts

By this point, we’ve established a minimum learning timeline, you know to select a learning goal for your study, you have a list of learning resources and learning methods to choose from, and you know what other coding considerations you’ll need to make. We hope you make the most of these tips to accelerate your Python learning!

Data Literacy for Leaders

For years, the importance of data has been echoed in boardroom discussions and listed on company roadmaps. Now, with 99% of businesses reporting active investment in big data and AI, it’s clear that all businesses are beginning to recognize the power of data to transform our world of work.

While all leaders recognize the needs and benefits of becoming data-driven, only 24% have successfully created a data-driven organization. That is because transformation is rarely considered holistically; instead, leaders focus on business, tools and technology, and talent in silos, usually leaving skill acquisition among leaders and the broader organization for last. It’s no wonder that 67% of leaders say they are not comfortable accessing or using data.

We’ve worked with businesses, such as Bloomberg, to help them gain the skills they need to successfully leverage data within their organizations, and we haven’t left leaders out of the conversation. In fact, we know that leaders are crucial to the success of data transformation efforts, and just like their teams, they need to be equipped with the skills to understand and communicate with data.

Why Should I Train My Leaders on Data?

When embarking on a data transformation, we always recommend that leaders be trained as the first step in company-wide skill acquisition. We recommend this approach for a few reasons:

  • Leaders Need to Understand Their Role in Data Transformation: Analytics can’t be something data team members do in a silo; it needs to be fully incorporated into the business, rather than an afterthought. However, businesses will struggle to make that change if every leader does not understand his or her responsibility in data transformation.
  • Leadership Training Shows a Commitment to Change: According to New Vantage Partners, 92% of data transformation failures are attributed to the inability of leaders to form a data-driven culture. In order for your employees to truly become data-driven, they have to be able to see a real commitment from leaders to organizational goals and operational change. Training your leaders first sends that message that data is here to stay. 
  • Leaders Need to Be Prepared to Work With Data-Driven Teams: Increasingly, leaders are expected to make data-driven decisions that impact the success of the organization. Without literacy, leaders will continue to feel uncomfortable communicating with and using data to make decisions. This discomfort will trickle down to employees and real change will never be felt. 

Just like your broader organization, leaders cannot be expected to understand the role they play or the importance of data transformation without proper training. 

What Does Data Literacy For Leaders Look Like? 

Leaders need to be able to readily identify opportunities to use data effectively. In order to get there, leaders need to:

Build a Data-Driven Mindset:

While every leader brings a wealth of experience to your org, many leaders are not data natives, and it can be a big leap to make this shift in thinking. Training leaders all at once gives you the opportunity to get your leaders on the same page and build a shared understanding and vocabulary.

So what does building a data-driven mindset look like in practice? To truly have a data-driven mindset leaders must be aware of the data landscape, as well as the opportunity of data, be mindful of biases inherent in data with an eye towards overcoming that bias, as well as being curious about how data can influence our decisions.

Leaders should walk away from training with a baseline understanding of key data concepts, a shared vocabulary, knowing how data flows through an organization and be able to pinpoint where data can have an impact in the org.

Understand the Data Life Cycle

Leaders are responsible for having oversight of every phase of the data life cycle and must be able to help teams weed out bias at any point. Without this foundation, leaders will have a hard time knowing where to invest in a data transformation and how to lead projects and teams.

All leaders should be equipped to think about and ask questions about each phase of the life cycle. For example:

  • Data Identification: What data do we have, and what form is it in? 
  • Data Generation: Where will the data come from and how reliable is the source? 
  • Data Acquisition: How will the data get from the source to us? 

It is not the role of the leader to know where all the data comes from or what gaps exist, but understanding what questions to ask is important to acquiring the insights needed to inform a sound business strategy.

Get to Know the Role of Data Within the Org

In an organization that’s undergoing a data transformation, there’s no shortage of projects that could command a leader’s attention and investment. Leaders must be equipped to understand where to invest to put their plans into action.

Based on existing structure, leaders need to understand the key data roles, such as data analysts or machine learning engineers, why they are important and how they differ. Once a leader has the knowledge of the data teams, they will be able to identify the opportunity of data within their team and role.

Make Better Data-Driven Decisions

Leaders who rely on intuition alone run the huge risk of being left behind by competitors that use data-driven insights. With more and more companies adjusting to this new world order, it’s imperative that leaders become more data literate in order to make important business-sustaining decisions moving forward. 


Getting Started With Leadership Training 

Including data training specifically for your leaders in your data transformation efforts is crucial. While leaders are busy tackling other important business initiatives, they, just like the rest of your organization, must be set up with the right skills to successfully meet the future of work. Investment in data skills for leaders will help you forge a truly data-driven culture and business.

To learn more about how GA equips leaders and organizations to take on data transformation get in touch with us here.

Five Ways to Build Organizational Data Literacy

Data is everywhere and in every part of your business; however, data is often left for technical teams to figure out. In recent years, data has been prioritized in digital transformation efforts, with an increasing number of businesses striving to be data-first. But businesses hoping to leverage new tools and technologies, and hiring data analysts and scientists, often overlook one essential fact: data is for everyone, and every employee can benefit from acquiring data skills.

Businesses that leave skills out of the equation in their data transformation efforts are further widening their skill gaps. In fact, according to Accenture, 74% of employees report feeling overwhelmed when working with data. And according to Deloitte, individual contributors aren’t the only ones: 67% of leaders say they are not comfortable accessing or using data. It’s time to change all of this.

Perhaps this anxiety and discomfort stem from businesses misunderstanding the role every employee has in leveraging data: 

  • Leaders set the vision and use data to ensure that they are making the right business decisions. 
  • Data practitioners solve complex problems with a blend of technical ability in analytics and data science. 
  • The broader organization uses data to understand impact, communicate results, and make decisions. 

All roles can benefit from upskilling to shift mindsets, gain fluency, and build efficiencies across the business, with building literacy across the broader organization being the most urgent priority.

What does data literacy look like?

Data literacy is the ability to create, read, and analyze data, and then communicate that information and use it effectively. To do this, people must understand how data is collected, where it comes from, what it shows, how it can be used, and why it’s important. 

Being data-literate means understanding:

  • Data Culture
    • Literacy Goal: Understand the data life cycle, data roles and responsibilities, and how data flows through an organization. 
  • Data Ethics & Privacy
    • Literacy Goal: Explain why ethics and privacy are essential and understand the role each employee has to play. 
  • Data Visualizations
    • Literacy Goal: Learn why common types of visualizations are chosen to promote certain comparisons and interpret the information. 
  • Statistics
    • Literacy Goal: Describe data and spot trends in visualizations. 
  • Artificial Intelligence (AI)
    • Literacy Goal: Identify opportunities to integrate AI and data science tools within your workflow.

Giving data skills to all employees will help businesses meet their loftiest data transformation goals. Training all employees comes with many benefits, such as higher decision quality and improved cross-functional communication. According to Deloitte, in companies where all employees train on analytics, 88% exceeded their business goals.

Five Ways to Build a Data-Literate Organization

1. Understand How Data is Being Used in Your Business

Shifting mindsets at the top of the org chart is essential to becoming a data-literate org. Modeling data skills yourself builds trust with your employees, and those skills will help you form a data-driven agenda. With the right skills, you’ll be able to prioritize projects with the most business impact. Data literacy also helps you communicate effectively with the data practitioners in your organization and focus your contributors on the data points that truly matter.

2. Define Preferred Data Usage in Your Business 

Data is plentiful, so narrowing that data down to only the most essential points is imperative to success. Understand what data you wish to collect and track, how that data will be used, and what tools and skills are needed to leverage that data successfully. 

3. Get Leadership Buy-in Across the Business

Getting buy-in from leaders across the business is essential to establishing a data-first culture. Any strategic initiative starts at the top, and leaders who understand the power of a strong data culture will be willing to make the tools, training, and people investments necessary to build one. 

4. Create a Training Plan

Once you know what data you wish to use, consider which skills would be the most beneficial. Remember, everyone can benefit from training. We recommend building literacy skills where there are definite gaps among leaders and across the broader organization.

5. Put New Skills Into Practice

Your plan is in place! Now, give your teams learning opportunities and explain why these skills will matter to the business’s success. After training, provide team members opportunities to practice their new skills by giving them goals directly related to using data, communicating with data, and becoming more data-proficient.

Continue to offer learning opportunities for those employees who wish to advance past literacy and into hard skills. Consider upskilling your data practitioners to become more efficient.

In an era of increased digitization, many businesses still don’t know how to use data to gain critical insights and information on goals and objectives. From the intern to the C-suite, it’s more important than ever for all business members to create, read, analyze, and communicate data pertaining to these objectives. Data literacy at all levels can and should be encouraged to future-proof the organization and support overall business goals. Investing in upskilling to ensure that everyone is comfortable bringing data to the table pays returns well beyond its cost. 

Thinking about building your teams’ data literacy? Learn more about how our data curriculum can help your business make this powerful pivot.

15 Data Science Projects to get you Started

When it comes to getting a job in data science, data scientists need to think like Creatives. Yes, that’s correct. Those looking to enter this field need to have a data science portfolio of previously completed data science projects, similar to those in Creative professions. What better way to prove to your future data science team that you’re capable of being a data scientist than proving you can do the work?

A common problem for data science entrants is that employers want candidates with experience, but how do you get experience without having access to experience? Suppose you’re looking to get that first foot in the door. It will behoove you to undertake a couple of data science projects to show future employers you’ve got what it takes to use big data to identify opportunities and succeed in the field.

The good news is that we live in a time of open and abundant data. Websites like Kaggle offer a treasure trove of free data for deep learning on everything from crime statistics to Pokemon to Bitcoin and more. However, the wealth of easily accessible data can be overwhelming, which is why we’ve taken it upon ourselves to present 15 data science projects you can execute in Python to showcase and improve your skills in data analytics. Our data science project ideas cover various topics, from Spotify songs to fake news to fraud detection and techniques such as clustering, regression, and natural language processing.

Before you dive in, be sure to adhere to these four guidelines no matter which data science project idea you choose:

1. Articulate the Problem and/or Scenario

It’s not enough to do a project where you use “X” to predict “Y”; you need to add some context to your work because data science does not occur in a vacuum. Tell us what you’re trying to solve and how data science can address that. Employers want to know if you can turn a problem into a question and a question into a solution. A good place to start is to depict a real-world scenario in which your data project would be useful.

2. Publish & Explain Your Work

Create a GitHub repository where you can upload your Jupyter Notebooks and data. Write a blog post in which you narrate your project from start to finish. Talk about the problem or question at the heart of the project, and explain your decision to clean the data in a certain way or why you decided to use a certain algorithm. Why all this? Potential employers need to understand your methodology.

3. Use Domain Expertise

If you’re trying to break into a specific field such as finance, health, or sports, use your knowledge of this area to enhance your project. This could mean deriving a useful question from a pressing problem or articulating a well-thought-out interpretation of your project’s results. For example, if you’re looking to become a data scientist in the finance sector, it would be worthwhile to show how your methods can generate a return on investment.

4. Be Creative & Different

Anyone can copy and paste code that trains a machine learning algorithm. If you want to stand out, review existing data science projects that use the same data and fill in the gaps left by them. If you’re working on a prediction project, try coming up with an unexpected variable that you think would be beneficial.

Data Science Projects

1. Titanic Data

Working on the Titanic dataset is a rite of passage in data science. It’s a useful dataset that beginners can work with to improve their feature engineering and classification skills. Try using a decision tree to visualize the relationships between the features and the probability of surviving the Titanic.
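
As a sketch of the idea, with made-up rows standing in for the real Titanic CSV, you can group outcomes by a feature and compare survival rates before reaching for a full decision tree:

```python
from collections import defaultdict

# Hypothetical rows standing in for the real dataset:
# (passenger class, sex, survived)
passengers = [
    (1, "female", 1), (1, "male", 0), (2, "female", 1),
    (3, "male", 0), (3, "female", 0), (1, "male", 1),
]

# Group survival outcomes by a single feature.
by_sex = defaultdict(list)
for pclass, sex, survived in passengers:
    by_sex[sex].append(survived)

# Survival rate per group: a first look at which features matter.
rates = {sex: sum(v) / len(v) for sex, v in by_sex.items()}
print(rates)
```

A decision tree automates exactly this kind of split, choosing the features that separate survivors from non-survivors best.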

2. Spotify Data

Spotify has an amazing API that provides access to rich data on their entire catalog of songs. You can grab cool attributes such as a song’s acoustics, danceability, and energy. The great thing about this data source is that the project possibilities are almost endless. You can use these features to try to predict genre or popularity. One fun idea would be to better understand your music by training a machine learning classifier on two sets of songs: songs you like and songs you do not.
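
As a minimal sketch of that last idea, here is a nearest-centroid classifier over two labeled sets; the (danceability, energy) values are invented for illustration, not real Spotify data:

```python
import math

# Hypothetical audio features (danceability, energy) for two labeled sets.
liked = [(0.8, 0.7), (0.9, 0.6), (0.7, 0.8)]
disliked = [(0.2, 0.3), (0.1, 0.4), (0.3, 0.2)]

def centroid(points):
    """Average each feature across a set of songs."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def classify(song, liked, disliked):
    """Label a song by whichever class centroid is closer."""
    d_like = math.dist(song, centroid(liked))
    d_dislike = math.dist(song, centroid(disliked))
    return "like" if d_like < d_dislike else "dislike"

print(classify((0.85, 0.65), liked, disliked))  # prints "like"
```

A real project would pull features through Spotify’s API and use a proper classifier, but the structure is the same: labeled examples in, predicted label out.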

3. Personality Data Clustering

You’ve probably heard the phrase, “There are X types of people.” Well, now you can actually find out how many types of people there really are. Using this dataset of almost 20k responses to the Big Five Personality Test, you can actually answer this question. Throw this data into a clustering algorithm such as KMeans and sort this into K number of groups. Once you decide on the optimal number of clusters, it’s incumbent on you to define each cluster. Come up with labels that add meaning to each group, and don’t be afraid to use plenty of charts and graphs to support your interpretation.
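
To see what KMeans does under the hood, here is a minimal pure-Python k-means sketch on made-up two-trait data; a real project would run scikit-learn’s KMeans on the full personality dataset:

```python
import math

def kmeans(points, k, iters=10):
    """Minimal k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = points[:k]  # naive init: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two obvious "personality groups" in an invented two-trait space.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # prints [3, 3]
```

Choosing K (the number of clusters) and then naming each cluster is the interpretive work the project description asks for; the algorithm only produces the groups.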

4. Fake News

If you are interested in natural language processing, building a classifier to differentiate between fake and real news is a great way to demonstrate that. Fake news is a problem that social media platforms have been struggling with for the past several years and a project that tackles this problem is a great way to show you care about solving real-world problems. Use your classifier to identify interesting insights about the patterns in fake versus real news; for example, tell us which words or phrases are most associated with fake news articles.

5. COVID-19 Dataset

There probably isn’t a more relevant use of data science than a project analyzing COVID-19. This dataset provides a wealth of information related to the pandemic and a great opportunity to show off your exploratory data analysis chops. Take a deep dive into this data and, through data visualization, unearth patterns in the rate of COVID-19 infection by county, state, and country.

6. Telco Customer Churn

If you’re looking for a straightforward project that is extremely applicable to the business world, then this one’s for you. Use this dataset to train a classifier that predicts customer churn. If you can show employers you know how to prevent customers from leaving their business, you’ll most definitely grab their attention. Pro tip: this is a great project to show your understanding of classification metrics besides accuracy, such as precision and recall.
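To see why accuracy alone can mislead on churn, here is a small worked example with hypothetical predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical churn labels and predictions: 1 = customer left, 0 = stayed.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # 0.8 -- looks fine on the surface
print(precision_score(y_true, y_pred))  # 0.5 -- half the churn alarms were false
print(recall_score(y_true, y_pred))     # 0.5 -- half the actual churners were missed
```

An 80% accurate model that misses half of the customers who actually leave is far less useful than the headline number suggests, which is exactly the point to make to an employer.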

7. Lending Club Loans

Like the Telco project, the Lending Club loan dataset is extremely relevant to the business world. Here you can train a classifier that predicts whether or not a Lending Club borrower will pay back a loan, using a wealth of information such as credit score, loan amount, and loan purpose. There are a lot of variables at your disposal, so I’d recommend starting with a handful of features and working your way up from there. See how far you can get with just the basics.

Also, this is a fairly untidy dataset that will require extensive cleaning and feature engineering, which is a good thing because that is often the case with real-world data. Be sure to explain your methodology behind preparing your dataset for the machine learning algorithm — this informs the audience of your domain expertise.

8. Breast Cancer Detection

This dataset provides a simpler classification scenario in which you can use health-related variables to predict instances of breast cancer. If you’re looking to apply your data science skills to the medical field, this is certainly worth a shot.

9. Housing Regression

If classification isn’t your thing, then might I recommend this ready-made regression project in which you can predict home prices using variables like square footage, number of bedrooms, and year built. A project such as this can help you understand the factors driving home sales and let you get creative in your feature engineering. Try to involve outside data that can serve as proxies for quality of life, education, and other things that might influence home prices. And if you want to show off your scraping skills, you can always create your dataset by scraping Zillow.
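A minimal sketch of this kind of regression on generated data (the column names and pricing rule are invented for illustration, not the real listings):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200
sqft = rng.uniform(800, 3500, n)
bedrooms = rng.integers(1, 6, n)
year_built = rng.integers(1950, 2020, n)

# Invented pricing rule plus noise, standing in for real sale prices.
price = (150 * sqft + 10_000 * bedrooms
         + 500 * (year_built - 1950) + rng.normal(0, 20_000, n))

X = np.column_stack([sqft, bedrooms, year_built])
model = LinearRegression().fit(X, price)
r2 = model.score(X, price)

# The fitted coefficients tell you how much each factor moves the price.
print(model.coef_, r2)
```

On real data the interesting work starts here: engineering proxy features (school ratings, commute times) and checking whether they improve the fit on held-out homes.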

10. Seeds Clustering

The seeds dataset from UCI provides a simple opportunity to use clustering. Use the seven attributes to sort the 210 seeds into K number of groups. If you’re looking to go beyond KMeans, try using hierarchical clustering, which can be useful for this dataset because the low number of samples can be easily visualized with a dendrogram.
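A sketch of the hierarchical approach on synthetic stand-in data (three planted varieties with seven attributes, mimicking the shape of the UCI seeds data):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Synthetic stand-in for the UCI seeds data: three varieties of 70
# samples each, described by seven attributes.
seeds = np.vstack([
    rng.normal(loc=c, scale=0.2, size=(70, 7)) for c in (1.0, 3.0, 5.0)
])

# Ward linkage builds the merge tree that a dendrogram would draw;
# scipy.cluster.hierarchy.dendrogram(Z) plots it directly.
Z = linkage(seeds, method="ward")

# Cut the tree into three flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
print(np.unique(labels, return_counts=True))
```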

11. Credit Card Fraud Detection

Another project idea for those of you intent on using business world data is to train a classifier to predict instances of credit card fraud. The value of this project to you comes from the fact that it’s an imbalanced dataset, meaning that one class vastly outweighs the other (in this case, non-fraudulent transactions versus fraudulent). Training a model that is 99% accurate is essentially useless, so it’s up to you to use non-accuracy metrics to demonstrate the success of your model.
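A quick demonstration of that accuracy trap on a simulated dataset with 1% fraud (mirroring the real dataset’s imbalance):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
# Simulated labels: roughly 1% fraud, like the real credit card data.
y_true = (rng.random(10_000) < 0.01).astype(int)

# A useless "model" that always predicts non-fraud.
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # ~0.99, yet the model catches nothing
print(recall_score(y_true, y_pred))    # 0.0: every fraud case is missed
```

This is why the project should be judged on recall, precision, or the precision-recall curve rather than raw accuracy.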

12. AutoMPG

This is a great beginner regression project in which you can use car features to predict their fuel efficiency. Given that this data is from the past, an interesting idea you can use is to see how well this model does on data from recent cars to show how car fuel efficiency has evolved over the years.

13. World Happiness

Using data science to unlock what’s behind happiness? Maybe you can with this dataset on world happiness rankings. You can go a number of ways with this project; you can use regression to predict happiness scores, cluster countries based on socio-economic characteristics, or visualize the change in happiness throughout the world from 2015 to 2019.

14. Political Identity

The Nationscape Data Set is an absolute goldmine of data on the demographics and political identities of Americans. If you’re a politics junkie, it’s sure to give you your fix. The most recent round of data features over 300,000 responses collected from extensive surveys of Americans. If you’re interested in using demographic information to predict political ideology or party identification, this is the dataset for you. This is an especially great project for flexing your domain expertise in study design, research, and drawing conclusions. Political analysis is replete with shoddy interpretations that lack empirical data analysis, and you could use this dataset to either confirm or dispel them. But be warned that this data will require plenty of cleaning, which you’ll need to get used to, given that cleaning is the majority of the job.

15. Box Office Prediction

If you’re a movie buff, then we’ve got you covered with the TMDB dataset. See if you can build a workable box office revenue prediction model trained on 5,000 movies’ worth of data. Does genre actually correlate with box office success? Can runtime and language help explain the variation in revenue? Find out the answers to those questions and more with this project.

Explore Data Workshops

Why Should You Become a Data Scientist?

By

Data is everywhere

The amount of data captured and recorded in 2020 is approximately 50 zettabytes, i.e., 50 followed by 21 zeros(!), and it’s constantly growing. Beyond data captured from social media platforms, as individuals we constantly use devices that measure our health by tracking steps, heart rate, sleep, and other physiological signals. Data analytics has helped greatly to discover patterns in our day-to-day activities and gently nudge us toward better health via everyday exercise and improved sleep quality. Just as we track our health, internet sensors are used on everyday devices such as refrigerators, washing machines, internet routers, and lights, not only to operate them remotely but also to monitor their functional health and provide analytics that help with troubleshooting in case of failure.

Organizations are capturing data to better understand their products and help their consumers. Industrial plants today are installed with a variety of sensors (accelerometers, thermistors, pressure gauges) that constantly monitor high-valued equipment in order to track their performance and better predict downtime.  As internet users, we’ve experienced the convenience that results from capturing our browsing data — better search results on search engines, personalized recommendation on ecommerce websites, structured and organized inboxes, etc. Each of these features is an outcome of data science techniques of information retrieval and machine learning applied on big data. 

On the enterprise side, digital transformation such as digital payments and ubiquitous use of software and apps has propelled data generation. With a smart computer in every palm and a plethora of sensors both on commercial and industrial scale, the amount of data generated and captured will continue to explode. This constant generation of data drives new and innovative possibilities for organizations and their consumers through approaches and toolsets rooted in data science. 

Data science drives new possibilities

Data science is the study of data aimed towards making informed decisions.

On the one hand, monitoring health data and data analytics is guiding individuals to make better decisions toward their health goals. On the other hand, aggregation of health data at the community level in a convenient and accessible way sets the stage for interdisciplinary research toward answering questions like: Does the amount of physical activity relate to our heart health? Can changes in heart rate over a period of time help predict heart disorders? Is weight loss connected with the quality of our sleep? In the past it was unimaginable to support such research with significant data points. Today, however, a decade’s worth of such big data enables us to drive research on the parameters connected to different aspects of our health. It’s significant that this research is not restricted to laboratories and academic institutions but is instead driven by collaborative efforts between industry and academia.

Due to the infusion of such data, many traditional industries like insurance are getting disrupted. Previously, insurance premiums were calculated based on age and a single medical test performed at sign-up. Now, life insurance providers are making efforts to lower premiums through regular monitoring of their customers’ fitness trackers. With access to this big data, insurance providers are trying to understand and quantify health risks. The research efforts described above would drive quantifiable ways to measure overall health risk by fusing a variety of health metrics. All of these new products will rely heavily on advanced analytics that uses artificial intelligence and machine learning (AI/ML) techniques to develop models that predict personalized premiums. In order to drive these new possibilities for insights, the application of data science toolsets and approaches goes through a rigorous process.

Data science is an interdisciplinary process

A data science process typically starts with a business problem. Data required to solve the problem can come from multiple sources. Social media data such as text and images from platforms like Facebook and Instagram would ordinarily be compartmentalized from enterprise data such as customer information and transactions. However, depending on the problem to be solved, all relevant data are collected and can be fused across social media and enterprise domains to gain unique insights.

A data science generalist works with different data formats and systematically analyzes the data to extract insights from it. Data science can be subdivided into several specialized areas based on the data format used to extract insights: (1) computer vision, i.e., the study of image data; (2) natural language processing, i.e., the analysis of textual data; and (3) time-series processing, i.e., the analysis of data varying in time, such as stock market prices and sensor data.

A data science specialist is capable of applying advanced machine learning techniques to convert unstructured data to a structured format, extracting the relevant attributes of an entity with great accuracy. No other area has seen the impact of the data science generalist or specialist more than the product development lifecycle, across organizations of all sizes.

Data scientist as a unifier in the product development lifecycle

The role of a data scientist spans multiple stages of the product development process. Typically, product development goes through the stages of envisioning, choosing which features to build, and finally designing those specific features. A data scientist is a unifier across all of these stages in the modern world. Even during the envisioning stage, analysis of marketing data informs decisions about which features to build, based on what the largest number of customers need and on the competitive landscape.

Once the feature list has been decided, the next step is designing those specific features. Typically, such design activities have been in the realm of designers and to a lesser extent developers. Traditionally, the designer designs features and then makes a judgment call based on user experience studies with a small sample size. However, what might be a good design for 10 users might not be a good design for 90 other users. In such situations, the designers’ judgment cannot necessarily address the entire user base. 

Organizations run different experiments to gather systematic data to audit the progress of the product. With data science toolsets, deriving the ground truth no longer needs to be constrained by such traditional design approaches. Based on the nature of the feature design, data from A/B experiment testing can provide input to both developers and designers alike on design options and product decisions that are optimal for the user base. 

Data science is the future

The spectrum of the data scientist’s role and contribution is vast. On one end, the data scientist can drive new possibilities through data-backed insights in areas like healthcare, suggest personalization options for users based on their needs, etc. On the other end, the data scientist can drive a cost-based discussion on which feature to design or what optimal option to choose. Data scientists are now the voices of customers throughout the product development process, and the unifiers through an interdisciplinary approach.

Just as making presentations, editing documents, and composing emails have become ubiquitous skills today, data science skills will be used pervasively across different functional roles to make business decisions. With the explosion in the amount of data, the demand for data scientists, data analysts, and big data engineers in the job market will only rise. Organizations are constantly looking for data professionals who can convert data into insights to make better decisions. A career in data science is stimulating — the dynamic and ever-evolving nature of the field, tied closely to current research, keeps one young!

Explore Data Workshops

How to Get a Job in Data Science Fast

By

You want to get a data science job fast. Obviously, no one wants to get a job slowly. But the time it takes to find a job is relative to you and your situation. When I was seeking my first data science job, I had normal bills and things to budget for, plus a growing family who was hoping I’d get a job fast. This was different from some of my classmates, while others had their own versions of why they needed a job fast, too. I believe that when writing a how-to guide on getting a data science job quickly, we should really acknowledge that we’re talking about getting you, the reader, a job faster. Throughout this article, we’ll discuss how to get a job as a data scientist faster than you might otherwise, all things considered.

Getting a job faster is not an easy task in any industry, and getting a job faster as a data scientist has additional encumbrances. Some jobs, extremely well-paying jobs, require a nebulous skill set that most adults could acquire after several years in the professional working world. Data science is not one of those jobs. For all the talk about what a data scientist actually does, there’s a definite understanding that the set of skills necessary to successfully execute any version of the job are markedly technical, a bit esoteric, and specialized. This has pros and cons, which we’ll discuss. The community of people who aspire to join this field, as well as people already in the field, is fairly narrow which also has pros and cons.

Throughout this article, we’ll cover two main ways to speed up the time it takes to get a data science job: becoming aware of the wealth of opportunities, and increasing the likelihood that you could be considered employable.

Becoming Aware of the Wealth of Opportunities

Data science is a growing, in-demand field. See for yourself in Camm, Bowers, and Davenport’s article, “The Recession’s Impact on Analytics and Data Science” and “Why data scientist is the most promising job of 2019” by Alison DeNisco Rayome. It’s no secret, however, that these reports often only consider formal data science job board postings. You may have heard or already know that there exists a hidden job market. It stands to reason that if this hidden job market exists, there may also be a number of companies that have not yet identified their need for a data scientist but likely need some portion of data science work. Here’s your action plan, assuming you already have the requisite skills to be a data scientist:

1. Find a company local to your region. This is easier if you know someone at that company, but if you don’t know anyone, just think through the industries that you’d like to build a career in. Search for several companies in those fields and consider a list of problems that might be faced by that organization, or even those industries at large.

2. Do some data work. Try to keep the scope of the project limited to something you could accomplish in one to two weekends. The idea here is not to create a thesis on some topic, but rather to add to the list of projects you can comfortably talk about in a future interview. This also does not have to be groundbreaking, bleeding-edge work. Planning, setting up, and executing a hypothesis test for a company that is considering two discount rates for an upcoming sale will give you far more fodder for interviews than a half-baked computer vision model with no clear deliverable or impact on a business.

3. You have now done data science work. If you didn’t charge money for your services on the first run, shame on you. Charge more next time.

4. Repeat this process. The nice thing about these mini projects is that you can queue up your next potential projects while you execute the work for your current project at the same time.
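The discount-rate hypothesis test from step 2 might look like this in practice (the numbers are invented; a two-sample t-test via scipy is one reasonable choice):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
# Invented order totals observed under two hypothetical discount rates.
revenue_10pct = rng.normal(loc=52.0, scale=8.0, size=200)
revenue_20pct = rng.normal(loc=48.0, scale=8.0, size=200)

# Two-sample t-test: is the difference in mean revenue real, or noise?
t_stat, p_value = ttest_ind(revenue_10pct, revenue_20pct)
print(t_stat, p_value)
```

Walking an interviewer through the setup, the test choice, and what the p-value means for the sale decision is exactly the kind of concrete story this article recommends building.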

Alternatively, you could consider jobs that are what I call the “yeah but there’s this thing…” type jobs. For example, let’s say you’re setting up a database for a non-profit and really that’s all they need. The thing is… it’s really your friend’s non-profit, all they need is their website to log some info into a database, and they can’t pay you. Of course you should not do things that compromise your morals or leave you feeling as though you’ve lowered your self worth in any way. Of course you’d help out your friend. Of course you would love some experience setting up a database, even if you don’t get to play with big data. Does that mean that you need to explain all of those in your next job interview? Of course not! Take the job and continue to interview for others. Do work as a data engineer. Almost everyone’s jobs have a “yeah but” element to them; it’s about whether the role will help increase your likelihood of being considered employable in the future.

Increasing the Likelihood That You Could Be Considered Employable

Thought experiment: a CTO comes to you with a vague list of Python libraries, deep learning frameworks, and several models which seem relevant to some problems your company is facing, and tasks you with finding someone who can help solve those issues. Who would you turn to if you had to pick a partner in this scenario? I’ll give you a hint — you picked the person who satisfied three, maybe four of the criteria on that list.

Recruiting in the real world is no different. Recruiters are mitigating their risk of hiring someone that won’t be able to perform the duties of the position. The way they execute is by figuring out the skills (usually indicated by demonstrated use of a particular library) necessary for the position, then finding the person who seems like they can execute on the highest number of the listed skills. In other words, a recruiter is looking to check a lot of boxes that limit the risk of you as a candidate. As a candidate, the mindset shift you need to come to terms with is that they want and need to hire someone. The recruiter is trying to find the lowest risk person, because the CTO likely has some sort of bearing on that recruiter’s position. You need to basically become the least risky hire, which makes you the best hire, amongst a pool of candidates.

There are several ways to check these boxes if you’re the recruiter. The first is obvious: find out where a group of people who successfully complete the functions of the job were trained, and then hire them. In data science, we see many candidates with training from a bootcamp, a master’s program, or PhDs. Does that mean that you need these degrees to successfully perform the function of the job? I’d argue no — it just means that people who are capable of attaining those relevant degrees are less risky to hire. Attending General Assembly is a fantastic way to show that you have acquired the relevant skills for the job.

Instead of having your resume alone speak to your skills, you can have someone in your network vouch for them. Building a community of people who recognize your value in the field is incredibly powerful. While joining other pre-built networks is great and opens doors to new opportunities, I’ve personally found that the communities I co-created are the strongest for me when it comes to finding a job as a data scientist. These have taken two forms: natural communities (making friends) and curated communities. Natural communities are your coworkers, friends, and fellow classmates. They become your community who can eventually speak up and advocate for you when you’re checking off those boxes. Curated communities might be a Meetup group that gathers once a month to talk about machine learning, an email newsletter of interesting papers on arXiv, or a Slack group you start with former classmates and data scientists you meet in the industry. In my opinion, the channel matters less, as long as your community is in a similar space as you.

Once you have the community, you can rely on them to pass things your way and you can do the same. Another benefit of General Assembly is its focus on turning thinkers into a community of creators. It’s almost guaranteed that someone in your cohort, or at a workshop or event, has a similar interest as you. I’ve made contacts that passed along gig opportunities, and I’ve met my cofounder inside the walls of General Assembly! It’s all there, just waiting for you to act.

Regardless of what your job hunt looks like, it’s important to remember that it’s your job hunt. You might be looking for a side gig to last while you live nomadically, a job that’s a stepping stone, or a new career as a data scientist. You might approach the job hunt with a six-pack of post-graduate degrees; you might be switching from a dead end role or industry, or you might be trying out a machine learning bootcamp after finishing your PhD. Regardless of your unique situation, you’ll get a job in data science fast as long as you acknowledge where you’re currently at, and work ridiculously hard to move forward.

Explore Data Workshops

What is Data Science?

By

It’s been anointed “the sexiest job of the 21st century”, companies are rushing to invest billions of dollars into it, and it’s going to change the world — but what do people mean when they mention “data science”? There’s been a lot of hype about data science and deservedly so, but the excitement has helped obfuscate the fundamental identity of the field. Anyone looking to involve themselves in data science needs to understand what it actually is and is not.

In this article, we’ll lay out a deep definition of the field, complete descriptions of the data science workflow, and data science tasks used in the real world. We hope that any would-be entrants into this line of work will come away reading this article with a nuanced understanding of data science that can help them decide to enter and navigate this exciting line of work.

So What Actually is Data Science?

A quick definition of data science might be articulated as: an interdisciplinary field that primarily uses statistics and computer programming to derive insights from, and base decisions on, a collection of information represented as numerical figures. The “science” part in data science is quite apt, because data science very much follows a scientific process that involves formulating a hypothesis and using a specific toolset to confirm or dispel that hypothesis. At the end of the day, data science is about turning a problem into a question and a question into an answer and/or solution.

Tackling the meaning of data science also means interrogating the meaning of data. Data can be easily described as “information encoded as numbers,” but that doesn’t tell us why it’s important. The value of data stems from the notion that data is a tangible manifestation of the intangible. Data provides solid support to aid our interpretations of the world. For example, a weather app can tell you it’s cold outside, but telling you that the temperature is 38 degrees Fahrenheit gives you a stronger, more specific understanding of the weather.

Data comes in two forms: qualitative and quantitative.

Qualitative data is categorical data that does not naturally come in the form of numbers, such as demographic labels that you can select on a census form to indicate gender, state, and ethnicity.

Quantitative data is numerical data that can be processed through mathematical functions; for example stock prices, sports stats, and biometric information.

Quantitative data can be subdivided into smaller categories such as ordinal, discrete, and continuous.

Ordinal: A sort of qualitative-quantitative hybrid variable in which the values have a hierarchical ranking. Any star rating system for reviews is a perfect example: we know that a four-star review is greater than a three-star review, but we can’t say for sure that a four-star review is twice as good as a two-star review.

Discrete: These are countable values that often appear in the form of integers. Examples include the number of franchises owned by a company and the number of votes cast in an election. Counts like these can never be negative, and discrete values move in whole steps rather than along a continuum.

Continuous: Unlike discrete variables, continuous can appear in decimal form and have an infinite range of possibilities. Things like company profit, temperature, and weight can all be described as continuous. 

What Does Data Science Look Like?

Now that we’ve established a base understanding of data science, it’s time to delve into what data science actually looks like. To answer this question, we need to go over the data science workflow, which encapsulates what a data science project looks like from start to finish. We’ll touch on typical questions at the heart of data science projects and then examine an example data science workflow to see how data science was used to achieve success.

The Data Science Checklist

A good data science project is one that satisfies the following criteria:

Specificity: Derive a hypothesis and/or question that’s specific and to the point. Having a vague approach can often lead to a waste of time with no end product.

Attainability: Can your questions be answered? Do you have access to the required data? It’s easy to come up with an interesting question but if it can’t be answered then it has no value. The same goes for data, which is only useful if you can get your hands on it.

Measurability: Can what you’re applying data science to be quantified? Can the problem you’re addressing be represented in numerical form? Are there quantifiable benchmarks for success? 

As previously mentioned, a core aspect of data science is the process of deriving a question, especially one that is specific and achievable. Typical data science questions ask things like, does X predict Y and what are the distinct groups in our data? To get a sense of data science questions, let’s take a look at some business-world-appropriate ones:

  • What is the likelihood that a customer will buy this product?
  • Did we observe an increase in sales after implementing a new policy?
  • Is this a good or bad review?
  • How much demand will there be for my service tomorrow?
  • Is this the cheapest way to deliver our goods?
  • Is there a better way to segment our marketing strategies?
  • What groups of products are customers purchasing together?
  • Can we automate this simple yes/no decision?

All eight of these questions are excellent examples of how businesses use data science to advance themselves. Each question addresses a problem or issue in a way that can be answered using data science.

The Data Science Workflow

Once we’ve established our hypothesis and questions, we can now move onto what I like to call the data science workflow, a step-by-step description of a typical data science project process.

After asking a question, the next steps are:

  1. Get and Understand the Data. We obviously need to acquire data for our project, but sometimes that can be more difficult than expected if you need to scrape for it or if privacy issues are involved. Make sure you understand how the data was sampled and the population it represents. This will be crucial in the interpretation of your results.
  2. Data Cleaning and Exploration. The dirty secret of data science is that data is often quite dirty, so you can expect to do significant cleaning, which often involves constructing your variables in a way that makes your project doable. Get to know your data through exploratory data analysis. Establish a base understanding of the patterns in your dataset through charts and graphs.
  3. Modeling. This represents the main course of the data science process; it’s where you get to use the fancy, powerful tools. In this part, you build a model that can help you answer a question, such as whether you can predict future sales of a product from your dataset.
  4. Presentation. Now it’s time to present the results of your findings. Did you confirm or dispel your hypothesis? What are the answers to the questions you started off with? How do your results advance our understanding of the issue at hand? Articulate your project in a clear and concise manner that makes it digestible for your audience, which could be another team in your company or your company’s executives.
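The steps above can be compressed into a short sketch, using a generated stand-in for a real sales dataset (the column names and relationship are invented for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 1. Get the data (here: generated in place of a real source).
rng = np.random.default_rng(0)
df = pd.DataFrame({"ad_spend": rng.uniform(0, 100, 300)})
df["sales"] = 3 * df["ad_spend"] + rng.normal(0, 10, 300)
df.loc[::50, "sales"] = np.nan  # simulate dirty records

# 2. Clean and explore.
df = df.dropna()
print(df.describe())

# 3. Model: can ad spend predict sales?
X_train, X_test, y_train, y_test = train_test_split(
    df[["ad_spend"]], df["sales"], random_state=0)
model = LinearRegression().fit(X_train, y_train)

# 4. Present: report a held-out score, not just the training fit.
r2 = model.score(X_test, y_test)
print(f"held-out R^2: {r2:.2f}")
```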

Data Science Workflow Example: Predicting Neonatal Infection

Now let’s parse out an example of how data science can affect meaningful real-world impact, taken from the book Big Data: A Revolution That Will Transform How We Live, Work, and Think.

We start with a problem: Children born prematurely are at high risk of developing infections, many of which are not detected until after a child is sick.

Then we turn that problem into a question: Can we detect patterns in the data that accurately predict infection before it occurs?

Next, we gather relevant data: variables such as heart rate, respiration rate, blood pressure, and more.

Then we decide on the appropriate tool: a machine learning model that uses past data to predict future outcomes.

Finally, what impact do our methods have? The model is able to predict the onset of infection before symptoms appear, thus allowing doctors to administer treatment earlier in the infection process and increasing the chances of survival for patients.

This is a fantastic example of data science in action because every step in the process has a clear and easily understandable function towards a beneficial outcome.

Data Science Tasks

Data scientists are basically Swiss Army knives, in that they possess a wide range of abilities — it’s why they’re so valuable. Let’s go over the specific tasks that data scientists typically perform on the job.

Data acquisition: For data scientists, this usually involves querying databases set up by their companies to provide easy access to reams of data. Data scientists frequently write SQL queries to retrieve data. Outside of querying databases, data scientists can use APIs or web scraping to acquire data.
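To make the querying concrete, here is a minimal sketch using Python's built-in sqlite3 module so it runs anywhere; the table and column names are hypothetical, and in practice you would run a similar query against your company's database:

```python
# Build a throwaway in-memory database, then run the kind of SQL query
# a data scientist might write to pull aggregated data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 250.0), (2, "US", 400.0), (3, "EU", 175.0)],
)

# Total sales by region
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 425.0), ('US', 400.0)]
```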

Data cleaning: We touched on this before, but it can’t be emphasized enough that data cleaning will take up the vast majority of your time. Cleaning often means dealing with null values, dropping irrelevant variables, and feature engineering, i.e., transforming data so that it can be processed by a model.
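A small cleaning sketch with pandas, filling null values and deriving a new feature; the column names and values are invented for illustration:

```python
# Handle missing values and engineer a simple feature with pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, np.nan, 8.0],
    "quantity": [2, 3, np.nan],
})

# Impute missing prices with the median; treat missing quantity as zero
df["price"] = df["price"].fillna(df["price"].median())
df["quantity"] = df["quantity"].fillna(0)

# Feature engineering: derive revenue from the cleaned columns
df["revenue"] = df["price"] * df["quantity"]
```

Which imputation strategy is right (median, mean, a constant, or dropping rows) depends entirely on the dataset and the question being asked.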

Data visualization: Crafting and presenting visually appealing and understandable charts is a hugely valuable skill. Visualization has an uncanny ability to communicate important bits of information from a mass of data. Good data scientists will use data visualization to help themselves and their audiences better understand what’s going on.

Statistical analysis: Statistical tests are used to confirm or dispel a data scientist’s hypothesis. A t-test or chi-square test can be used to evaluate whether certain relationships exist. A/B testing is a popular application of statistical analysis; if a team wants to know which of two website designs leads to more clicks, then an A/B test is the right solution.
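Here is a sketch of that A/B test as a chi-square test of click rates for two hypothetical page designs; the visitor counts are made up for illustration:

```python
# Compare click-through counts for two page designs with a chi-square test.
from scipy.stats import chi2_contingency

#            clicked  not clicked
table = [[120, 880],   # design A (1,000 visitors, 12% click rate)
         [160, 840]]   # design B (1,000 visitors, 16% click rate)

chi2, p_value, dof, expected = chi2_contingency(table)

# A small p-value (conventionally < 0.05) suggests the difference in
# click rates is unlikely to be due to chance alone.
print(p_value < 0.05)
```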

Machine learning: This is where data scientists use models that make predictions based on past observations. If a bank wants to know which customers are likely to pay back loans, then they can use a machine learning model trained on past loans to answer that question.
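The loan example above might look like the following toy sketch, where a logistic-regression classifier is trained on past loans; the features and labels here are invented, and a real lender would use far richer data:

```python
# Train a classifier on (hypothetical) past loans, then score a new applicant.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features: [annual income in $k, existing debt in $k]; label: 1 = repaid
X = np.array([[60, 5], [80, 10], [30, 40], [25, 35], [90, 2], [20, 50]])
y = np.array([1, 1, 0, 0, 1, 0])

clf = LogisticRegression().fit(X, y)

# Estimated probability that a new applicant (income $70k, debt $8k) repays
prob_repay = clf.predict_proba([[70, 8]])[0][1]
```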

Computer science: Data scientists need adequate computer programming skills because many of the tasks they undertake involve writing code. In addition, some data science roles require data scientists to function as software engineers, implementing their methodologies in their company’s backend systems.

Communication: You can be a math and computer whiz, but if you can’t explain your work to a novice audience, your talents might as well be useless. A great data scientist can distill digestible insights from complex analyses for a non-technical audience, translating how a p-value or correlation score is relevant to a part of the company’s business. If your company is going to make a potentially costly or lucrative decision based on your data science work, then it’s incumbent on you to make sure they understand your process and results as much as possible.

Conclusion

We hope this article helped to demystify this exciting and increasingly important line of work. It’s pertinent to anyone who’s curious about data science — whether it’s a college student or an executive thinking about hiring a data science team — that they understand what this field is about and what it can and cannot do.

Explore Data Workshops

Designing a Dashboard in Tableau for Business Intelligence


Tableau is a data visualization platform that focuses on business intelligence. It has become very popular in recent years because of its flexibility and beauty. Clients love the way Tableau presents data and how easy it makes performing analyses. It is one of my favorite analytical tools to work with.

A simple way to define a Tableau dashboard is as a glance view of a company’s key performance indicators, or KPIs. There are different kinds of dashboards available — it all depends on the business questions being asked and the end-user. Is this for an operational team (like one at a distribution center) that needs to see the number of orders by hour and if sales goals are achieved? Or, is this for a CEO who would like to measure the productivity of different departments and products against forecast? The first case will require the data to be updated every 10 minutes, almost in real-time. The second doesn’t require the same cadence, and once a day will be enough to track the company performance.

Over the past few years, I’ve built many dashboards for different types of users, including department heads, business analysts, and directors, and helped many mid-level managers with data analysis. If you are looking for Tableau dashboard examples, you have come to the right place. Here are some best practices for creating Tableau dashboards I’ve learned throughout my career.

First Things First: Why Use a Data Visualization?

A data visualization tool is one of the most effective ways to analyze data from any business process (sales, returns, purchase orders, warehouse operations, customer shopping behavior, etc.).

Below we have a grid report and bar chart that contain the same data source information. Which is easier to interpret?

Grid report

Bar Chart
Grid report vs. bar chart.

That’s right — it’s quicker to identify the category with the lowest sales, Tops, using the chart.

Many companies previously used grid reports to operate and make decisions, and many departments still do today, especially in retail. I once went to a trading meeting on a Monday morning where team members printed pages of Excel reports with rows and rows of sales and stock data by product and took them to a meeting room with a ruler and a highlighter to analyze sales trends. Some of these reports took at least two hours to prepare and required combining data from different data sources with VLOOKUPs — a function that allows users to search through columns in Excel. After the meeting, they threw the papers away (a waste of paper and ink), and then the following Monday it all started again.

Wouldn’t it be better to have an effective dashboard and reporting tool in which the company’s KPIs were updated daily and presented in an interactive dashboard that could be viewed on tablets and laptops and digitally sliced and diced? That’s where tools like Tableau Server come in. You can drill down into details and answer questions raised in the meeting in real time, something you couldn’t do with paper copies.

How to Design a Dashboard in Tableau

Step 1: Identify who will use the dashboard and with what frequency.

Tableau dashboards can be used for many different purposes, such as measuring different KPIs, and therefore will be designed differently for each circumstance. This means that, before you can begin designing a new dashboard, you need to know who is going to use it and how often.

Step 2: Define your topic.

The stakeholder (i.e., director, sales manager, CEO, business analyst, buyer) should be able to tell you what kind of business questions need to be answered and the decisions that will be made based on the dashboard.

Here, I am going to use the dataset for my Tableau dashboard example from a fictional retail company to report on monthly sales.

The commercial director would like to know 1) the countries to which the company’s products have been shipped, 2) which categories are performing well, and 3) sales by product. The option of browsing products is a plus, so the Tableau dashboard should include as much detail as possible.

Step 3: Initially, make sure you have all of the necessary data available to answer the questions specified in your new dashboard.

Clarify how often you will get the data, the format in which you will receive the data (inside a database or in loose files), the cleanliness of the data, and if there are any data quality issues. You need to evaluate all of this before you promise a delivery date.

Step 4: Create your dashboard.

When it comes to dashboard design, it’s best practice to present data from top to bottom when in presentation mode. The story should go from left to right, like a comic book, where you start at the top left and finish at the bottom right.

Let’s start by adding the data set to Tableau. For this demo, the data is contained in an Excel file generated by software I developed myself. It’s all dummy data.

To connect to an Excel file from Tableau, select “Excel” from the Connect menu. The tables are on separate Excel sheets, so we’re going to use Tableau to join them, as shown in the image below. Once the tables are joined, go to the bottom and select Sheet 1 to create your first visualization.

Excel Sheet in Tableau
Joining Excel sheet in Tableau.

We have two columns in the Order Details table: Quantity and Unit Price. The sales amount is Quantity x Unit Price, so we’re going to create the new metric, “Sales Amount.” Right-click on the measures and select Create > Calculated Field.
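The calculated field itself is simply the product of the two measures; in Tableau’s formula editor it might look like this, with field names matching the demo dataset described above:

```
// Calculated field: Sales Amount
[Quantity] * [Unit Price]
```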

Creating a Map in Tableau

We can use maps to visualize data with a geographical component and compare values across geographical regions. To answer our first question — “To which countries have the company’s products been shipped?” — we’ll create a map view of sales by country.

1. Add Ship Country to the rows and Sales Amount to the columns.

2. Change the view to a map.

Map
Visualizing data across geographical regions.

3. Add Sales Amount to the color pane. Darker colors mean higher sales amounts aggregated by country.

4. You can choose to make the size of the bubbles proportional to the Sales Amount. To do this, drag the Sales Amount measure to the Size area.

5. Finally, rename the sheet “Sales by Country.”

Creating a Bar Chart in Tableau

Now, let’s visualize the second request, “Which categories are performing well?” We’ll need to create a second sheet. The best way to analyze this data is with bar charts, as they are ideal for comparing data across categories. Pie charts work in a similar way, but in this case we have too many categories (more than four), so they wouldn’t be effective.

1. To create a bar chart, add Category Name to the rows and Sales Amount to the columns.

2. Change the visualization to a bar chart.

3. Switch columns and rows, sort it by descending order, and show the values so users can see the exact value that the size of the rectangle represents.

4. Drag the category name to “Color.”

5. Now, rename the sheet to “Sales by Category.”

Sales category bar chart
Our Sales by Category breakdown.

Assembling a Dashboard in Tableau

Finally, the commercial director would like to see the details of the products sold by each category.

Our last page will be the product detail page. Add Product Name and Image to the rows and Sales Amount to the columns. Rename the sheet as “Products.”

We are now ready to create our first dashboard! Rearrange the charts on the dashboard so that it appears similar to the example below. To display the images, drag the Web Page object next to the Products grid.

Dashboard Assembly
Assembling our dashboard.

Additional Actions in Tableau

Now, we’re going to add some actions on the dashboard such that when we click on a country, we’ll see both the categories of products and a list of individual products sold.

1. Go to Dashboard > Actions.

2. Add Action > Filter.

3. Our “Sales by Country” chart is going to filter Sales by Category and Products.

4. Add a second action: Sales by Category will filter Products.

5. Add a third action, this time selecting URL.

6. Select Products, <Image> on URL, and click on the Test Link to test the image’s URL.

What we have now is an interactive dashboard with a worldwide sales view. To analyze a specific country, we click on the corresponding bubble on the map and Sales by Category will be filtered to what was sold in that country.

When we select a category, we can see the list of products sold for that category. And, when we hover on a product, we can see an image of it.

In just a few steps, we have created a simple dashboard from which any department head would benefit.

Dashboard
The final product.

Dashboards in Tableau at General Assembly

In GA’s Data Analytics course, students get hands-on training with the versatile Tableau platform. Students learn the ins and outs of the data visualization tool and create dashboards to solve real-world problems in 1-week accelerated or 10-week part-time course formats — on campus and online. You can also get a taste in our interactive Tableau training with these classes and workshops.


Meet Our Expert

Samanta Dal Pont is a business intelligence and data analytics expert in retail, eCommerce, and online media. With an educational background in software engineering and statistics, her great passion is transforming businesses to make the most of their data. Responsible for analytics, reporting, and visualization in a global organization, Samanta has been an instructor for Data Analytics courses and SQL bootcamps at General Assembly London since 2016.

Samanta Dal Pont, Data Analytics Instructor, General Assembly London

5 High-Paying Careers That Require Data Analysis Skills



The term “big data” is everywhere these days, and with good reason. More products than ever before are connected to the Internet: phones, music players, DVRs, TVs, watches, video cameras…you name it. Almost every new electronic device created today is connected to the Internet in some way for some purpose.

The result of all those things connected to the Internet is data. Big, big data. What’s that mean for you? Simply put, it means if you can quickly, accurately, and intelligently sift through data and find trends, you are extremely valuable in today’s tech job market. More specifically, here are five job titles that require data analytics skills and expertise to get ahead. 


Computer Science vs. Data Science: What is the Difference?


Maybe you want to learn more about data science since you’ve heard it’s “the sexiest job of the 21st century.” Or maybe your software engineer friend is trying to talk you into learning computer science. Either way, both data science and computer science skills are in demand. In this article, we will cover the major differences between data science and computer science to clarify the distinction between these two fields.

Before we dive into the differences, let’s define these two sciences:

Data Science vs. Computer Science

Data science is an interdisciplinary field that uses data to extract insights and inform decisions. It’s often referred to as a combination of statistics, business acumen, and computer science. Data scientists clean, explore, analyze, and model with data using programming languages such as Python and R along with techniques such as statistical models, machine learning, and deep learning.

While it’s one part of data science, computer science is its own broader field of study involving a range of both theoretical and practical topics like data structures and algorithms, hardware and software, and information processing. It has many applications in fields like machine learning, software engineering, and mathematics.

History

While many of the topics used in data science have been around for a while, data science as a field is in its infancy. In 1974, Peter Naur defined the term “data science” in his work, Concise Survey of Computer Methods. However, even Naur couldn’t have predicted the vast amount of data that our modern world would generate on a daily basis only a few decades later. It wasn’t until the early 2000s that data science was recognized as its own field. It gained popularity in the early 2010s, leading to the field as we know it today — a blend of statistics and computer science used to drive insights and make data-driven business decisions.

“Data science,” “big data,” “artificial intelligence,” “machine learning,” and “deep learning” have all become buzzwords in today’s world. These are all components of data science, and while trendy, they can provide practical benefits to companies.

Historically, we did not have the storage capacity to hold the amount of data that we are able to collect and store today. This is one reason that data science has become a popular field only recently. The emergence of big data and advancements in technology have paved the way for individuals and businesses to harness the power of data. While many of the tools that data scientists use have been around for many years, we have not had the software or hardware to make use of these tools until recently.

Computer science, on the other hand, has been a field of study for centuries. This is one of the main differences between it and data science. Ada Lovelace is known for pioneering the field of computer science as the person who wrote the first computer algorithm in the 1840s. However, computing devices such as the abacus date back thousands of years. Computer science is a topic that has been formally researched for much longer than data science, and companies have been using computer science tools for decades. It’s an umbrella field that has numerous subdomains and applications. 

Applications

The applications of each of these fields in industry differ as well. Computer science skills are used in many different jobs, including that of a data scientist. However, common roles involving computer science skills include software engineers, computer engineers, software developers, and web developers. Two roles that use computer science, front end engineer and Java developer, ranked first and second respectively on Glassdoor’s 50 Best Jobs in America for 2020 list. While these roles do not formally require degrees, many people in these jobs hold a degree or come from a background in computer science. 

Common computer science job tasks include writing, testing, and debugging code; developing software; and designing applications. Individuals who use computer science in their roles often create new software and web applications. They need to have excellent problem-solving skills and be able to write code in programming languages such as Python, Ruby, JavaScript, Java, or C#. They also need a fundamental understanding of how these languages work and must be well-versed in object-oriented programming.

Data science is applied in job titles such as data scientist, data analyst, machine learning engineer, and data engineer. Data scientist and data engineer ranked third and sixth respectively on Glassdoor’s 50 Best Jobs in America for 2020. Individuals in these roles come from a variety of backgrounds including computer science, statistics, and mathematics. 

Common data science job tasks include cleaning and exploring data, extracting insights from data, and building and optimizing models. Data scientists analyze and reach conclusions based on data. They need to be well-versed in statistics and mathematics topics, including linear algebra and calculus, as well as programming languages such as Python, R, and SQL. They also need to have excellent communication skills, as they are often presenting insights, data visualizations, and recommendations to stakeholders.

Since computer science is one component of data science, there is often crossover in these roles and responsibilities. For example, computer science tasks like programming and debugging are used in both computer science jobs and data science jobs. Both of these fields are highly technical and require knowledge of data structures and algorithms. However, the depth of this knowledge required for computer science vs. data science varies. It’s often said that data scientists know more about statistics than a computer scientist but more about computer science than a statistician. This reinforces the interdisciplinary nature of data science.

The Use of Data

Data, or information such as numbers, text, and images, has applications in both computer science and data science. The study and use of data structures is a topic in computer science. Data structures are ways to organize, manage, and store data so that it can be used efficiently; as a subdomain of computer science, they allow us to store and access data in our computer’s memory. Data science benefits from data structures to access data, but the main goal of data science is to analyze and make decisions based on the data, often using statistics and machine learning.
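A tiny example of why data structures matter: looking up a value by key in a hash-based structure like a Python dict is a constant-time operation, while scanning a list of pairs takes time proportional to its length. The sales figures below are invented for illustration:

```python
# Two ways to store the same region -> sales mapping.
orders_list = [("EU", 425.0), ("US", 400.0), ("APAC", 310.0)]
orders_dict = dict(orders_list)

# List: scan each pair until the region matches (linear time)
us_total_list = next(total for region, total in orders_list if region == "US")

# Dict: direct lookup by key (constant time on average)
us_total_dict = orders_dict["US"]

assert us_total_list == us_total_dict == 400.0
```

The difference is invisible at three rows but decisive at millions, which is why computer science fundamentals underpin efficient data science work.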

The Future of Computer Science and Data Science

Today, all companies and industries can benefit from both of these fields. Computer scientists drive business value by developing software and tools while data scientists drive business value by answering questions and making decisions based on data. As software continues to integrate with our lives and daily routines, computer science skills will continue to be critical and in demand. As we continue to create and store vast amounts of data on a daily basis, data science skills will also continue to be critical and in demand. Both fields are constantly evolving as technology advances and both computer scientists and data scientists need to stay current with the latest tools, methods, and technologies.

The field of data science would not exist without computer science. Today, the two fields complement each other to further applications of artificial intelligence, machine learning, and personalized recommendations. Many of the luxuries that we have today — a favorite streaming service that recommends new movies, the ability to unlock our phones with facial recognition technology, or virtual home assistants that let us play our favorite music just by speaking — are made possible by computer science and made better by data science. As long as bright, motivated individuals continue to learn data science and computer science, these two fields will continue to advance technology and improve the quality of our lives.
