data science Tag Archives - General Assembly Blog

What It’s Really Like to Change Your Career Online


Going to work used to mean physically traveling to a workplace. Whether by foot, public transit, or car — a job was a specific location to which you commuted. But with the advent of the gig economy and advances in technology, telecommuting has become more and more prevalent. In fact, according to a 2018 study, approximately 70% of workers worldwide spend at least one day a week working from home.

So, why should education be any different? Learning from the comfort of home saves you the time and money you would’ve spent commuting, allows you to spend more time with loved ones, and encourages a much more comfortable, casual work environment.

That’s why we’re now offering all of our career-changing Immersives online. We’ve transformed more than 11,000 careers, so whether you’re interested in becoming a software engineer, data scientist, or UX designer, you can trust our proven curriculum, elite instructors, and dedicated career coaches to set you up for professional success.

We sat down with three experts on GA’s Immersive Remote programs to better understand how they work — and more importantly — how they compare to the on-campus experience.

Breaking Barriers

GA Education Product Manager Lee Almegard explained the reasoning behind the move: “At GA, the ability to pay tuition, commute to class, or coordinate childcare shouldn’t be a barrier to launching a new career,” she said. “Our new 100% remote Immersive programs are designed to ease these barriers.”

Obviously, saving yourself a trip to campus is appealing on many levels, but some interested students expressed concern that they wouldn’t receive enough personalized attention studying online as opposed to IRL. Instructor Matt Huntington reassures them, saying “Our lectures are highly interactive, and there is ample time to ask questions — not only of the teacher but also of other students.” 

Staying Focused

It’s not always easy to stay focused in a traditional classroom, but when your fellow students have been replaced by a curious toddler or Netflix is only a click away, distraction is a real concern. 

GA graduate Alex Merced shared these worries when he began his Software Engineering Immersive Remote program, but they quickly disappeared. “The clever use of Slack and Zoom really made the class engaging. It leverages the best features of both platforms, such as polls, private channels, and breakout rooms,” he said. “This kept the class kinetic, social, and engaging, versus traditional online training that usually consists of fairly non-interactive lectures over PowerPoint.”

If you’re concerned about staying focused, you can use these simple, impactful tips to stay motivated and on track to meet your goals:

  • Plan ahead. Conquer homework by blocking off time on your calendar each week during the hours in which you focus best.
  • Limit distractions. Find a quiet place to study, put your device on “Do Not Disturb” mode, or find a productivity app like Freedom to block time-consuming sites when studying or working independently.
  • Listen to music. You might find that music helps you concentrate on homework. Some of our favorite Spotify playlists to listen to are Deep Focus, Cinematic Chillout, and Dreamy Vibes.
  • Take breaks. Go for a short walk at lunch and change up the scenery, or grab a latte to power through an assignment.
  • Ask for help. We’re here for you! Our instructional team is available for guidance, feedback, technical assistance, and more during frequent one-on-one check-ins and office hours.

Most importantly, listen to yourself. Everyone learns differently, so take stock of what works best for you. Find the strategies that fit your learning style, and you’ll be well on your way to new skills and new heights. 

Getting Connected and Getting Hired

Another key component of learning is the camaraderie that comes from meeting and studying with like-minded students. How does that translate to a virtual classroom?

GA Career Coach Ruby Sycamore-Smith explains that both students and faculty can have meaningful, productive relationships without ever meeting in person. “We’re a lot more intentional online,” she says. “You’re not able to just bump into each other in the corridor as you would on campus, but that means you’re able to be a lot more purposeful with your time when you do connect — way beyond a simple smile and a wave.”

Merced agrees. “Breakout sessions allowed me to assist and be assisted by my classmates, with whom I’ve forged valuable relationships. Now I have friends all over the world.” And as Huntington pointed out, “There is no back of the classroom when you’re online.” When you learn remotely, every seat is right next to all of your peers.

“When we piloted the Software Engineering Remote bootcamp, we took extra care to make sure that our virtual classrooms felt exactly like the on-campus ones, with group labs and even special projects to ensure students are constantly working with each other,” Huntington explained. “A lot of our students form after-hours homework groups, and nighttime TAs create study hall video conferences so everyone can see and talk to each other.”

And with students from all over the country, you’re going to connect with people you never would’ve met within the confines of a classroom. These peers could even be the very contacts who help you get hired.

Because GA recruits industry professionals who are also gifted instructors to lead its courses, students learn how to translate their knowledge into the in-demand skill sets employers need. Sycamore-Smith explains that the involvement of GA’s career coaches doesn’t end after graduation; they’re invested in their students’ long-term success.

She says, “Career preparation sessions are very discussion-based and collaborative, as all of our students have varied backgrounds. Some are recent college graduates, others may have had successful careers and experienced a number of job hunts previously. Everyone has unique ideas and insights to share, so we use these sessions to really connect and learn from one another.” 

Merced is enthusiastic about his GA experience and quickly landed a great job as a developer. “Finding work was probably the area I was most insecure about going into the class,” he confessed. “But the prep sessions really made the execution and expectations of a job search much clearer, and I was able to land firmly on my feet.”

Conclusion? Make Yourself at Home

After years of teaching in front of a brick-and-mortar classroom, Huntington was a little wary about moving to online instruction, but his misgivings quickly gave way.

“I was surprised to feel just as close to my virtual students as I did to my on-campus students,” he said. “Closing down our virtual classrooms and saying goodbye on the last day of class is so much more heart-wrenching online than it ever was for me when I taught on campus.”

Huntington’s advice to a student wondering if online learning is right for them: “Go for it! It’s just like in person, but there’s no commute and it’s socially acceptable to wear pajamas!”


The Study of Data Science Lags in Gender and Racial Representation



In the past few years, much attention has been drawn to the dearth of women and people of color in tech-related fields. A recent article in Forbes noted, “Women hold only about 26% of data jobs in the United States. There are a few reasons for the gender gap: a lack of STEM education for women early on in life, lack of mentorship for women in data science, and human resources rules and regulations not catching up to gender balance policies, to name a few.” Federal civil rights data further demonstrate that “black and Latino high school students are being shortchanged in their access to high-level math and science courses that could prepare them for college” and for careers in fields like data science.

As an education company offering tech-oriented courses at 20 campuses across the world, General Assembly is in a unique position to analyze the current crop of students looking to change the dynamics of the workplace.

Looking at GA data for our part-time programs (which typically reach students who already have jobs and are looking to expand their skill set as they pursue a promotion or a career shift), here’s what we found: While great strides have been made in fields like web development and user experience (UX) design, data science — a relatively newer concentration — still has a ways to go in terms of gender and racial equality.


How Ridgeline Plots Visualize Data and Present Actionable Decisions


Organizations conduct survey research for any number of reasons: to decide which products to devote resources to, determine customer satisfaction, figure out who our next president will be, or determine which Game of Thrones characters are most attractive. But almost all surveys are conducted with samples of the target population and therefore are subject to sampling error.

Decision-makers need to understand this error to make the most of survey results, so it’s important for data scientists and analysts to communicate confidence intervals when visualizing estimated results. Confidence intervals are the range of values you could reasonably expect to see in your target population based on the results measured in your sample.

But traditional visuals (error bars) can lead to misperceptions, too. When confidence intervals overlap by only a small amount, there is only a small chance that the two underlying values are equal, yet overlapping error bars on a chart still signal danger. Ridgeline plots, which are essentially a series of density plots (or smoothed-out histograms), can help communicate risk without overemphasizing error when error bars only slightly overlap. An error bar gives every value in the interval the same visual weight; a ridgeline plot instead gets fatter to represent more likely values and thinner to represent less likely values. This way, a small amount of overlap doesn’t signal a lack of statistical significance quite as loudly.

Calculating Confidence Intervals: Planning a Class

Consider, for example, an education startup that conducted a survey of 500 people on its email list to determine which of three classes respondents might want to enroll in. (For demonstration purposes, we’re assuming this is a random sample that’s representative of the target audience.) The options are Hackysack Maintenance, Underwater Basketweaving, and Finger Painting. Results are reported below:

Class                       Result (%)
Hackysack Maintenance       24
Underwater Basketweaving    44
Finger Painting             32

We could produce a bar plot of this result that makes Underwater Basketweaving appear to be the clear-cut winner.

Basketweaving Graph

But because these data come from a sample rather than the whole target population, there is some margin of error around each of these point estimates. This post won’t go into calculating these confidence intervals except to say that we used the normal approximation method to calculate binomial confidence intervals for each of the three survey results at a 99.7% confidence level (roughly three standard errors on either side of each estimate). Now our results look more like this:

Class                       Result (%)   Lower CI bound (%)   Upper CI bound (%)
Hackysack Maintenance       24           18                   30
Underwater Basketweaving    44           37                   51
Finger Painting             32           26                   38
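The post skips the arithmetic, but the normal-approximation (Wald) interval it mentions is p ± z·√(p(1 − p)/n). Below is a minimal Python sketch, assuming n = 500 and z = 3 (the multiplier behind the roughly 99.7% level). Rounded to whole percentages, it reproduces the bounds in the table above.

```python
import math

N = 500   # survey respondents
Z = 3.0   # about 3 standard errors, i.e. roughly 99.7% coverage
RESULTS = {
    "Hackysack Maintenance": 0.24,
    "Underwater Basketweaving": 0.44,
    "Finger Painting": 0.32,
}


def wald_interval(p, n, z=Z):
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p - z * se, p + z * se


intervals = {}
for name, p in RESULTS.items():
    low, high = wald_interval(p, N)
    intervals[name] = (round(low * 100), round(high * 100))  # whole percentages
    print(f"{name}: {p:.0%} ({low:.1%} to {high:.1%})")
```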

One common way to present these confidence intervals is by adding error bars to the plot. When we add these error bars, our plot looks like this:

Basketweaving Error Bars

Unfortunately, our error bars are now overlapping between Finger Painting and Underwater Basketweaving. This means there is some chance that the two courses are equally desirable — or that Finger Painting is actually the most desirable course of all! Decision-makers no longer have a clear-cut investment since the top two responses could be tied.

However, those error bars barely overlap, so there’s a strong probability that Underwater Basketweaving really is the winner. The problem with plotting error bars this way is that the visual treats every value in the confidence interval as equally likely, when values near the point estimate are more plausible than values near the edges, following a bell curve.
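One rough way to check the claim that the bars barely overlap is to look at the gap between the two leading options directly, instead of comparing two separate intervals. This is our own illustration, not the post's method: it assumes each respondent picked exactly one class (the shares sum to 100%), so the two estimates are negatively correlated, and it uses the normal approximation for the difference between 44% and 32% out of 500 responses.

```python
import math

N = 500
p_basket = 0.44  # Underwater Basketweaving share
p_finger = 0.32  # Finger Painting share

# Each respondent chose exactly one class, so the two shares are negatively
# correlated: Cov = -p1 * p2 / n. Subtracting that covariance twice adds
# 2 * p1 * p2 / n to the variance of the difference.
var_diff = (p_basket * (1 - p_basket)
            + p_finger * (1 - p_finger)
            + 2 * p_basket * p_finger) / N
se_diff = math.sqrt(var_diff)
z_diff = (p_basket - p_finger) / se_diff  # gap measured in standard errors

print(f"gap: {p_basket - p_finger:.2f}, standard error: {se_diff:.4f}, z: {z_diff:.2f}")
```

With these inputs the 12-point gap sits at roughly 3.1 standard errors from zero, which fits the post's point that the overlap is small.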

Enter the ridgeline plot.

What Is a Ridgeline Plot?

Ridgeline plots essentially stack density plots for multiple categories on top of one another. Ridgeline plots, originally named joy plots, were popularized by Claus Wilke in the summer of 2017, and the visual quickly caught on among users of the R programming language. They’ve been used to show the changing polarization of political parties, salary distributions, and patterns of breaking news.

By using a ridgeline plot rather than a bar plot, we can present our confidence intervals as the bell curves they are, rather than as flat lines. Instead of a bar that implies a clear winner and error bars that contradict that narrative, the ridgeline plot shows that the bulk of plausible values for each class do not overlap with one another. In the process, the ridgeline plot downplays the small amount of overlap between Finger Painting and Underwater Basketweaving.

Basketweaving Ridgeline Plot

By plotting only the confidence intervals in the form of individual density plots, the ridgeline plot demonstrates the small risk that students really prefer a class on finger painting, without overemphasizing the magnitude of that risk. Our education startup can invest in curriculum development and promotion of the Underwater Basketweaving class with a strong degree of confidence that most of its potential students would be most interested in such a class.
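The post shows the ridgeline chart but not how to build it. Here is a minimal Python sketch of one way to do it, assuming each class's uncertainty is drawn as a normal curve centered on its survey share with the same standard error used for the intervals, clipped to ±3 standard errors so that only the confidence interval is plotted. The curve data uses only the standard library; `plot_ridgeline` is a hypothetical helper that assumes matplotlib is installed.

```python
import math

N = 500   # survey respondents
Z = 3.0   # about 3 standard errors, matching the ~99.7% intervals
RESULTS = {
    "Hackysack Maintenance": 0.24,
    "Underwater Basketweaving": 0.44,
    "Finger Painting": 0.32,
}


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def ridge_curves(results, n, z=Z, points=61):
    """Return {class: (xs, densities)}, evaluated only inside each interval."""
    curves = {}
    for name, p in results.items():
        sd = math.sqrt(p * (1 - p) / n)       # standard error of the share
        low, high = p - z * sd, p + z * sd     # clip to the confidence interval
        step = (high - low) / (points - 1)
        xs = [low + i * step for i in range(points)]
        curves[name] = (xs, [normal_pdf(x, p, sd) for x in xs])
    return curves


def plot_ridgeline(curves, spacing=0.8):
    """Hypothetical helper: stack each curve on its own baseline (needs matplotlib)."""
    import matplotlib.pyplot as plt

    peak = max(max(ys) for _, ys in curves.values())
    fig, ax = plt.subplots()
    for row, (name, (xs, ys)) in enumerate(curves.items()):
        base = row * spacing
        pct = [x * 100 for x in xs]
        # Scale every curve by the same peak so relative heights are preserved.
        ax.fill_between(pct, base, [base + y / peak for y in ys], alpha=0.6)
        ax.text(pct[0], base + 0.05, name)
    ax.set_xlabel("Share of respondents (%)")
    ax.set_yticks([])
    plt.show()


curves = ridge_curves(RESULTS, N)
```

Calling `plot_ridgeline(curves)` draws the stacked ridges. Treat it as an approximation of the idea rather than a reproduction of the figure above.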

Ridgeline Plots at General Assembly

In General Assembly’s full-time, career-changing Data Science Immersive program and part-time Data Science course, students learn about sampling, calculating confidence intervals, and using data visualizations to help make actionable decisions with data. Students can also learn about the programming language R and other key data skills through expert-led workshops and exclusive industry events across GA’s campuses.


Meet Our Expert

Josh Yazman is a General Assembly Data Analytics alum and a data analyst with expertise in media analytics, survey research, and civic engagement. He now teaches GA’s part-time Data Analytics course in Washington, D.C. Josh spent five years working in Virginia for political candidates at all levels of government, from Blacksburg town council to president. Today, he is a data analyst with a national current-affairs magazine in Washington, D.C., a student at Northwestern University pursuing a master’s degree in predictive analytics, and the advocacy chair for the National Capital Area chapter of the Pancreatic Cancer Action Network. He occasionally writes about political and sports data on Medium and tweets at @jyazman2012.

“Data science as a field is in demand today — but the decision-making and problem-solving skills you’ll learn from studying it are broadly applicable and valuable in any field or industry.”

– Josh Yazman, Data Analytics Instructor, General Assembly Washington, D.C.

How Predictive Modeling Forecasts the Future and Influences Change


You know the scenario: You get to work in the morning and quickly check your personal email. Over on the side, you notice that your spam folder has a couple of items in it, so you look inside. You’re amazed — although some of them look like genuine emails, they’re not; these cleverly disguised ads are all correctly labeled as spam. What you’re seeing is natural language processing (NLP) in action. In this instance, the email service provider is using what’s known as predictive analytics to assess language data and determine which combinations of words are likely spam, filtering your email accordingly.
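The post doesn't say which model any email provider actually uses. As a hedged illustration, here is a tiny naive Bayes classifier, a classic textbook approach to spam filtering, trained on a handful of made-up messages. It scores each label by combining how often each word appeared in that label's training emails.

```python
import math
from collections import Counter

# Hypothetical toy training data: (text, label) pairs.
TRAINING = [
    ("win a free prize now", "spam"),
    ("limited offer claim your free gift", "spam"),
    ("cheap meds buy now", "spam"),
    ("meeting moved to friday", "ham"),
    ("lunch with the team tomorrow", "ham"),
    ("please review the attached report", "ham"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab


def classify(text, word_counts, doc_counts, vocab):
    """Pick the label with the highest log prior plus log word likelihoods."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)  # add-one (Laplace) smoothing
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("claim your free prize", *model))   # spam
print(classify("team meeting on friday", *model))  # ham
```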

With the volume of data being created, collected, and stored increasing by the hour, the days of making decisions based solely on intuition are numbered. Companies collect data on their customers, nonprofits collect data on their donors, apps collect data on their users, all with the goal of finding opportunities to improve their products and services. More and more, decision-making is becoming data driven. People use information to understand what’s happening in the world around them and try to predict what will happen in the future. For this, we turn to predictive analytics.

Predictive analytics is the concept of using current information to forecast what will happen next. This area of study covers a broad range of concepts and skills, oftentimes involving modeling techniques, that help turn data into insights and insights into action. These ideas are already in practice in industries like eCommerce, direct marketing, cybersecurity, financial services, and more. It’s likely that you’ve come across implementations of predictive analytics and modeling in your daily life and not even realized it.

Predictive Modeling in the Real World

Returning to our example, say that an email in your inbox reminds you that you wanted to buy a new whisk to make scrambled eggs this weekend. When you head to Amazon.com to make a purchase, you see some recommendations for items you might like on the home page. This component is what’s known in the data science world as a recommender system.

What Amazon’s recommender system thinks your kitchen is missing.

To develop this, Amazon uses its vast data sets that detail what people are buying. Then, a machine learning engineer may use Python or R to pass this data through a k-means clustering algorithm, which organizes items into groups that tend to be purchased together and allows Amazon to compare those groups with what you’ve already bought to come up with recommendations. With this implementation, Amazon is looking at a combination of what you and others have purchased and/or viewed (current information) and using predictive modeling to anticipate what else you might like. This is a tremendously powerful tool! It helps users find what they want faster and discover new ideas, while also boosting Amazon’s sales by shortening the path to purchase.
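The paragraph above mentions k-means; the sketch below deliberately swaps in something simpler to show the "purchased together" idea: counting how often pairs of items share an order and recommending the items that most often co-occur with what you've bought. The order history is hypothetical, and this is not a description of Amazon's actual system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each set is one customer's basket.
ORDERS = [
    {"whisk", "mixing bowl", "spatula"},
    {"whisk", "mixing bowl"},
    {"whisk", "egg separator"},
    {"frying pan", "spatula"},
    {"mixing bowl", "measuring cups"},
]


def co_purchase_counts(orders):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(purchased, orders, top_n=3):
    """Score unpurchased items by how often they co-occur with purchased ones."""
    counts = co_purchase_counts(orders)
    scores = Counter()
    for (a, b), n in counts.items():
        if a in purchased and b not in purchased:
            scores[b] += n
    return [item for item, _ in scores.most_common(top_n)]


print(recommend({"whisk"}, ORDERS))
```

A whisk buyer is pointed first to the mixing bowl, which shares two orders with the whisk; real systems add normalization so that very popular items don't dominate every list.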

Say that, around lunchtime, you decide to order pizza delivery, and 20 minutes later, there it is. Wow! How did it get there so fast? Using clustering again, this time on delivery locations, the restaurant has analyzed where its orders are coming from and grouped them accordingly. For this project, a data analyst might have run a SQL query to find out which deliveries would take the longest. The analyst might then use a clustering algorithm such as k-means in Python to find the optimal groupings and recommend placing new restaurant locations at cross streets near the center of each group to minimize the distance to the orders.
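Here is a minimal k-means sketch in Python for the pizza example, assuming hypothetical delivery coordinates on a flat city grid. Each final centroid is a candidate restaurant location. Note that k-means minimizes straight-line (squared Euclidean) distance, not real driving time, and the farthest-point initialization is a simple deterministic choice made for this example.

```python
import math

# Hypothetical delivery coordinates (x, y) in km on a flat city grid.
DELIVERIES = [
    (1.0, 1.2), (1.5, 0.8), (0.9, 1.9), (1.2, 1.4),   # orders from one neighborhood
    (6.8, 7.1), (7.2, 6.5), (6.5, 7.4), (7.0, 7.0),   # orders from another
]


def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each order joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its orders.
        updated = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                updated.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                updated.append(centroids[i])  # leave an empty cluster's centroid in place
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids, clusters


centroids, clusters = kmeans(DELIVERIES, k=2)
for cx, cy in centroids:
    print(f"candidate location: ({cx:.2f}, {cy:.2f})")
```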

Clustering for optimal pizza delivery.

Here, predictive modeling not only saves the company money on driving time and gas, it also cuts down the time between the customer and a hot pizza.

Predictive Modeling at General Assembly

Regardless of the industry, there’s growing opportunity to leverage predictive modeling to solve problems of all sizes. This is rapidly becoming a must-have skill, which is why we teach these techniques and more in our part-time and full-time data science courses at General Assembly. Starting with simple analyses like linear regression and classification, students use tools like Python and SQL to work with real-world data, building the necessary skills to move on to more involved analyses like time series, clustering, and recommender systems. This gives them the toolbox they need to make data-driven decisions that influence change in the business, government, and nonprofit sectors, and beyond.


Meet Our Expert

Amer Tadmori is a senior statistician at Wiland, where he uses data science to provide business intelligence and data-driven marketing solutions to clients. His passion for turning complex topics into easy-to-understand concepts is what led him to begin teaching. At GA’s Denver campus, Amer leads courses in SQL, data analytics, data visualization, and storytelling with data. He holds a bachelor’s degree in economics from Colgate University and a master’s degree in applied statistics from Colorado State University. In his free time, Amer loves hiking his way through the national parks and snowboarding down Colorado’s local hills.

“Now’s a great time to learn data analysis techniques. There’s an abundance of resources available to learn these skills, and an even greater abundance of places to use them.”

– Amer Tadmori, Data Analytics Instructor, General Assembly Denver

Machine Learning for Data-Driven Predictions and Problem Solving


Ever wonder how apps, websites, and machines seem to be able to predict the future? Like how Amazon knows what your next purchase may be, or how self-driving cars can safely navigate a complex road situation?

The answer lies in machine learning.

Machine learning is a branch of artificial intelligence (AI) that concentrates on building systems that can learn from and make decisions based on data. Instead of explicitly programming the machine to solve the problem, we show it examples of how the problem was solved in the past, and the machine learns from those examples how to do the same task on its own.

Think about how Netflix makes movie recommendations. The recommendation engine peeks at the movies you’ve viewed/rated in the past. It then starts to learn the factors that influence your movie preferences and stores them in a database. It could be as simple as noting that you prefer to watch “comedy movies released after 2005 featuring Adam Sandler.” It then starts recommending similar movies that you haven’t watched — all without writing any explicit rules!

This is the power of machine learning.

Machine learning is revolutionizing every industry by bringing greater value to companies’ years of saved data. Leveraging machine learning enables organizations to make more precise decisions instead of following intuition. Companies have begun to embrace the power of machine learning and revise their strategies in order to remain more competitive.

Data Scientists: The Forces Behind Machine Learning

Machine learning is typically practiced by data scientists, who help organizations discover hidden value in their data, thereby enabling them to make smarter business decisions. For instance, insurers use machine learning to predict which claims are likely to be fraudulent, rather than relying solely on traditional analysis or human judgement. This can have a significant impact, resulting in lower costs and higher revenue for businesses. Data scientists work with various stakeholders in a company, like business users or product owners, to discover problems and gather data that will be used to solve them.

Data scientists collect, process, clean up, and verify the integrity of data. They apply their engineering, modeling, and statistical skills to build end-to-end machine learning systems. They constantly monitor the performance of those systems and make improvements wherever possible. Often, they need to communicate with non-technical audiences, including stakeholders across the company, in a compelling way that highlights the business impact and opportunity. At the end of the day, those stakeholders have to act on, and possibly make far-reaching decisions based on, the data scientist’s findings.

Above all, data scientists need to be creative and avid problem-solvers. Possessing this combination of skills makes them a rare breed — so it’s no wonder they’re highly sought after by companies across many industries, such as health care, retail, manufacturing, and technology.

Supervised Learning

Most machine learning algorithms fall into one of two broad categories: supervised and unsupervised learning. Supervised learning learns from labeled past data to predict a value for new, unseen cases. For instance, Netflix’s movie-recommendation engine is most likely supervised: it uses your past movie ratings as training data for the model and then predicts your ratings for movies you haven’t seen. Supervised learning enjoys more commercial success than unsupervised learning. Some popular use cases include fraud detection, image recognition, credit scoring, product recommendation, and malfunction prediction.
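Netflix's real model isn't described here, so the sketch below swaps in one of the simplest supervised techniques, k-nearest-neighbors regression. It assumes hypothetical 0/1 movie features that echo the Adam Sandler example and predicts a rating for an unseen movie by averaging the ratings of the most similar movies the user has already rated.

```python
# Hypothetical training data: movie features and the rating this user gave.
# Features: (is_comedy, released_after_2005, features_adam_sandler), each 0 or 1.
RATED = [
    ((1, 1, 1), 5.0),
    ((1, 1, 0), 4.0),
    ((1, 0, 1), 4.0),
    ((0, 1, 0), 2.0),
    ((0, 0, 0), 1.0),
]


def hamming(a, b):
    """Number of features on which two movies differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_rating(features, rated, k=2):
    """k-NN regression: average the ratings of the k most similar rated movies.

    sorted() is stable, so ties keep their original order in `rated`.
    """
    neighbors = sorted(rated, key=lambda item: hamming(features, item[0]))[:k]
    return sum(rating for _, rating in neighbors) / len(neighbors)


# An unseen non-comedy, post-2005 Adam Sandler movie.
print(predict_rating((0, 1, 1), RATED))  # 3.5
```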

Unsupervised Learning

Unsupervised learning is not about prediction but rather about uncovering hidden structures in the data. It’s helpful in identifying segments or groups, especially when there is no prior information available about those segments. These algorithms are commonly used in market segmentation, where they help marketers identify target segments in order to maximize revenue. They are also used to build anomaly detection systems that flag suspicious user behavior, and more.

For instance, Netflix may know how many customers it has but want to understand what kinds of groupings they fall into in order to offer services targeted to them. The streaming service may have 50 or more different customer types, aka segments, but its data scientists don’t yet know what they are.

If the company knows that most of its customers are in the “families with children” segment, it can invest in building specific programs to meet customer needs. But without that information, Netflix’s data scientists can’t build a supervised machine learning system. So, they build an unsupervised machine learning algorithm instead, which identifies and extracts various customer segments within the data and allows them to identify groups such as “families with children” or “working professionals.”
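To make the segmentation idea concrete, here is a small unsupervised sketch using centroid-linkage agglomerative clustering, one of several clustering methods that could fill this role. The viewing profiles are hypothetical and no labels are supplied: the algorithm only groups similar customers, and a person still has to look at each group and name it (for example, "families with children"). Unlike the scenario above, this toy version assumes you choose the number of segments up front.

```python
import math

# Hypothetical viewing profiles: (hours of kids' shows, hours of documentaries) per week.
CUSTOMERS = {
    "c1": (9.0, 0.5), "c2": (8.5, 1.0), "c3": (10.0, 0.0),
    "c4": (0.5, 6.0), "c5": (1.0, 7.5), "c6": (0.0, 6.5),
}


def centroid(points):
    """Mean position of a group of customers."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def agglomerate(data, n_segments):
    """Start with one cluster per customer; repeatedly merge the two closest centroids."""
    clusters = [[name] for name in data]
    while len(clusters) > n_segments:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([data[n] for n in clusters[i]])
                cj = centroid([data[n] for n in clusters[j]])
                d = math.dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # j > i, so deleting j leaves i in place
        del clusters[j]
    return clusters


segments = agglomerate(CUSTOMERS, 2)
print(segments)
```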

Machine Learning at General Assembly

At General Assembly, our Data Science Immersive program trains students in machine learning, programming, data visualization, and other skills needed to become a job-ready data scientist. Students learn the hands-on languages and techniques, like SQL, Python, and UNIX, that are needed to gather and organize data, build predictive models, create data visualizations, and tackle real-world projects. In class, students work on data science labs, compete on the data science platform Kaggle, and complete a capstone project to showcase their data science skills. They also gain access to career coaching, job-readiness training, and networking opportunities.

If you’re looking to learn during evenings and weekends, you can explore our part-time Data Science course, or visit one of GA’s worldwide campuses for a short-form event or workshop led by local professionals in the field.


Meet Our Expert

Kirubakumaresh Rajendran is an experienced data scientist who’s passionate about applying machine learning and statistical modeling techniques to business problems. He has worked with IBM and Morgan Stanley to build data-driven products that leverage machine learning techniques. He is a co-instructor for the Data Science Immersive course at GA’s Sydney campus, and enjoys teaching, mentoring, and guiding aspiring data scientists.

“Machines are helping humans build self-driving cars, cancer detection, and more, making it the right time to roll up your sleeves, get into the world of machine learning, and teach machines to make the world a better place.”

– Kirubakumaresh Rajendran, Data Science Immersive Instructor, GA Sydney

Python: The Programming Language Everyone Needs to Learn


What’s one thing that Bill Gates, Mark Zuckerberg, Sheryl Sandberg, will.i.am, Chris Bosh, Karlie Kloss, and I, a data science instructor at General Assembly, all have in common? We all think you should learn how to code.

There are countless reasons to learn how to code, even if you don’t want to become a full-time programmer:

  • Programming teaches you amazing problem-solving skills.
  • You’ll be better able to collaborate with engineers and developers if you can “speak their language.”
  • It enables you to help build the technologies of the future, including web applications, machine learning models, chatbots, and anything else you can imagine.

To most people, learning to program — or even choosing what language to learn — seems daunting. I’ll make it simple: Python is an excellent place to start.

Python is an immensely popular programming language commonly used by data analysts, data scientists, and software engineers. Companies like Google, SpaceX, and Instagram use it for a huge variety of tasks, including cleaning data, building AI models, and building web apps. Python also stands out for being very simple to read and write, while offering extreme flexibility and an active community.

Here’s a cool example of just how simple Python is. This code tells the computer to print the words “Hello World”:

In Python:

print("Hello World")

Yup, that’s really all it takes! For context, let’s compare that to another popular programming language, Java, which has a steeper learning curve (though it is still a highly desirable skill set in the job market).

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

Clearly, Python requires much less code.

Experiencing Python in Everyday Life

Let’s talk about some of the ways in which Python is used today, including automating a process, building the functionality of an application, or delving into machine learning.

Here are some fascinating examples of how Python is shaping the world we live in:

  • Hollywood special effects: Remember that summer blockbuster with the huge explosions? A lot of companies, including Lucasfilm’s Industrial Light & Magic (ILM), use Python to help program those awesome special effects. By using Python, companies like ILM have been able to develop standard toolkits that they can reuse across productions, while still retaining the flexibility to build custom effects in less time than ever before.
  • File-sharing applications: When Dropbox was created in 2007, it used Python to build the desktop applications and server infrastructure responsible for actually sharing the files. More than a decade later, Python still powers the company’s desktop applications. Because Python runs on both macOS and Windows, Dropbox was able to write a single application for Macs and PCs, and it still works more than a decade later!
  • Web applications: Python is used to run various parts of some of today’s most popular websites, including Pinterest, Instagram, Spotify, and YouTube. In fact, Pinterest has used Python in some form since it was founded (e.g., to power its web app, build and maintain data pipelines, and perform analyses).
  • Artificial intelligence: Python is especially popular in the artificial intelligence community, again for its ease of use and flexibility. For example, in just a few hours, a business could build a basic chatbot that answers some of the most common questions from its customers. To do this, programmers could use Python to scrape the contents of all of the email exchanges with the company’s customers, identify common themes in these exchanges with visualizations, and then build a predictive model that can be used by the chatbot application to give appropriate responses.
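The chatbot described above would rely on a predictive model trained on real email data. As a much simpler stand-in, here is a sketch that matches a customer's question to the closest entry in a hypothetical FAQ by word overlap (Jaccard similarity) and hands off to a person when nothing matches well. The FAQ text and the 0.2 threshold are made up for illustration.

```python
import re

# Hypothetical FAQ, imagined as distilled from common themes in past customer emails.
FAQ = {
    "what are your opening hours": "We're open 9am to 5pm, Monday through Friday.",
    "how do i reset my password": "Use the 'Forgot password' link on the sign-in page.",
    "where is my order": "You can track your order from the link in your confirmation email.",
}


def tokens(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of words two sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0


def answer(question, faq=FAQ, threshold=0.2):
    q = tokens(question)
    best = max(faq, key=lambda known: jaccard(q, tokens(known)))
    if jaccard(q, tokens(best)) < threshold:
        return "Sorry, let me connect you with a person."
    return faq[best]


print(answer("How can I reset my password?"))
```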

Python at General Assembly

General Assembly focuses on building practical experience when learning new technical skills. We want students to walk away from our data science courses and bootcamps equipped to tackle the challenges they’re facing in their own lives and careers.

Python at General Assembly section, change the second graf to:

Many of our courses are designed to teach folks with limited exposure to Python to use it to answer real business questions. Dive into fundamental concepts and techniques, and build your own custom web or data application in our part-time Python Programming course. Or learn to leverage the language as part of our full-time Data Science Immersive program, part-time Data Science course, or a one-day Python bootcamp. Projects students have tackled include visualizing SAT scores from across the country, scraping data from public websites, identifying causes of airplane delays, and predicting Netflix ratings based on viewer sentiment and information from IMDB.


Meet Our Expert

Michael Larner is a passionate leader in the analytics space who specializes in using techniques like predictive modeling and machine learning to deliver data-driven impact. A Los Angeles native, he has spent the last decade consulting with hundreds of clients, including 50-plus Fortune 500 companies, to answer some of their most challenging business questions. Additionally, Michael empowers others to become successful analysts by leading trainings and workshops for corporate clients and universities, including General Assembly’s part-time Data Analytics course and SQL/Excel workshops in Los Angeles.

“GA provides an amazing community of colleagues, peers, and fellow learners that serve as a wonderful resource as you continue to build your career. GA exposes students to real-world analyses to gain practical experience.”

Michael Larner, Data Analytics Instructor, General Assembly Los Angeles

SQL: Using Data Science to Boost Business and Increase Efficiency


In today’s digital age, we’re constantly bombarded with information about new apps, transformative technologies, and the latest and greatest artificial intelligence systems. While these technologies may serve very different purposes in our lives, they all have one thing in common: They rely on data. More specifically, they all use databases to capture, store, retrieve, and aggregate data. This raises the question: How do we actually interact with databases to accomplish all of this? The answer: We use Structured Query Language, or SQL (pronounced “sequel” or “ess-cue-el”).

Put simply, SQL is the language of data: it’s a programming language that enables us to efficiently create, alter, request, and aggregate data from those mysterious things called databases. It lets us make connections between different pieces of information, even in huge data sets. Modern applications use SQL to surface valuable information that would otherwise be difficult for people to keep track of on their own. In fact, pretty much every app that stores any sort of information uses a database. This ubiquity means that developers use SQL to log, record, alter, and present data within an application, while analysts use SQL to interrogate that same data set in order to find deeper insights.
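To make the create, alter, request, and aggregate operations concrete, here is a minimal sketch using Python’s built-in `sqlite3` module and an in-memory database. The `orders` table and its rows are hypothetical. SQL syntax varies slightly between database systems; this uses SQLite’s dialect.

```python
import sqlite3

# An in-memory SQLite database stands in for an application's real database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define a table of hypothetical orders and insert a few rows.
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ana", 20.0), ("Ben", 35.5), ("Ana", 14.5), ("Cy", 50.0)],
)

# Alter: add a column after the fact, then update some rows.
cur.execute("ALTER TABLE orders ADD COLUMN channel TEXT DEFAULT 'web'")
cur.execute("UPDATE orders SET channel = 'mobile' WHERE customer = 'Ana'")

# Request: fetch specific rows that match a condition.
big_orders = cur.execute(
    "SELECT customer, amount FROM orders WHERE amount > 30 ORDER BY amount DESC"
).fetchall()

# Aggregate: total spend per customer, and order counts per channel.
totals = cur.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
channel_counts = cur.execute(
    "SELECT channel, COUNT(*) FROM orders GROUP BY channel ORDER BY channel"
).fetchall()

print(big_orders)      # [('Cy', 50.0), ('Ben', 35.5)]
print(totals)          # [('Ana', 34.5), ('Ben', 35.5), ('Cy', 50.0)]
print(channel_counts)  # [('mobile', 2), ('web', 2)]
```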

Finding SQL in Everyday Life

Think about the last time you looked up the name of a movie on IMDB. I’ll bet you quickly noticed an actress on the cast list and thought something like, “I didn’t realize she was in that,” then clicked a link to read her bio. As you were navigating through that app, SQL was responsible for returning the information you “requested” each time you clicked a link. This sort of capability is something we’ve come to take for granted these days.
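That browsing session can be sketched as a pair of queries over a many-to-many schema. The schema (`movies`, `people`, and a `cast_members` link table), the helper names `cast_list` and `person_page`, and the data are hypothetical illustrations, not IMDB’s actual database design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, bio TEXT);
-- Link table: one row per (movie, person) appearance.
CREATE TABLE cast_members (movie_id INTEGER REFERENCES movies(id),
                           person_id INTEGER REFERENCES people(id));
INSERT INTO movies VALUES (1, 'Movie A'), (2, 'Movie B');
INSERT INTO people VALUES (10, 'Actress X', 'Bio of Actress X'),
                          (11, 'Actor Y', 'Bio of Actor Y');
INSERT INTO cast_members VALUES (1, 10), (1, 11), (2, 10);
""")


def cast_list(movie_id):
    """First click: the cast of one movie, joined through the link table."""
    return conn.execute(
        """SELECT p.id, p.name
           FROM people AS p
           JOIN cast_members AS c ON c.person_id = p.id
           WHERE c.movie_id = ?
           ORDER BY p.name""",
        (movie_id,),
    ).fetchall()


def person_page(person_id):
    """Second click: one person's bio plus every movie they appear in."""
    name, bio = conn.execute(
        "SELECT name, bio FROM people WHERE id = ?", (person_id,)
    ).fetchone()
    titles = [row[0] for row in conn.execute(
        """SELECT m.title
           FROM movies AS m
           JOIN cast_members AS c ON c.movie_id = m.id
           WHERE c.person_id = ?
           ORDER BY m.title""",
        (person_id,),
    )]
    return name, bio, titles


print(cast_list(1))     # [(11, 'Actor Y'), (10, 'Actress X')]
print(person_page(10))  # ('Actress X', 'Bio of Actress X', ['Movie A', 'Movie B'])
```

The link table is what lets one actress appear in many movies and one movie list many cast members without duplicating either record.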

Let’s look at another example that truly is cutting-edge, this time at the intersection of local government and small business. Many major cities now support open data initiatives that make public data easily accessible by opening up the databases that store it. As an example, let’s look at Los Angeles building permit data, business listings, and census data.

Imagine you work at a real estate investment firm and are trying to find the next up-and-coming neighborhood. You could use SQL to combine the permit, business, and census data in order to identify areas that are undergoing a lot of construction, have high populations, and contain a relatively low number of businesses. This might be a great opportunity to purchase property in a soon-to-be thriving neighborhood! For the first time in history, it’s easy for a small business to leverage quantitative data from the government in order to make a highly informed business decision.
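Here is a sketch of that analysis, again using SQLite from Python. The neighborhood names, counts, table layouts, and thresholds are all hypothetical. Real Los Angeles open data sets use their own column names and geographic keys, which would need to be mapped to a shared key (such as a neighborhood) before joining. Each source is aggregated before the join so that permit rows and business rows don’t multiply each other’s counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE census (neighborhood TEXT PRIMARY KEY, population INTEGER);
CREATE TABLE permits (neighborhood TEXT);     -- one row per building permit
CREATE TABLE businesses (neighborhood TEXT);  -- one row per business listing
""")

# Hypothetical data: (neighborhood, population, permit count, business count).
SAMPLE = [
    ("Alder Park", 40000, 5, 3),
    ("Birch Flats", 35000, 2, 40),
    ("Cedar Hills", 8000, 6, 1),
    ("Dune Vista", 25000, 4, 50),
]
for name, pop, n_permits, n_biz in SAMPLE:
    conn.execute("INSERT INTO census VALUES (?, ?)", (name, pop))
    conn.executemany("INSERT INTO permits VALUES (?)", [(name,)] * n_permits)
    conn.executemany("INSERT INTO businesses VALUES (?)", [(name,)] * n_biz)

# Lots of construction + many residents + few businesses per resident.
QUERY = """
WITH p AS (SELECT neighborhood, COUNT(*) AS permits
           FROM permits GROUP BY neighborhood),
     b AS (SELECT neighborhood, COUNT(*) AS businesses
           FROM businesses GROUP BY neighborhood)
SELECT c.neighborhood,
       COALESCE(p.permits, 0) AS permits,
       c.population,
       1000.0 * COALESCE(b.businesses, 0) / c.population AS businesses_per_1k
FROM census AS c
LEFT JOIN p ON p.neighborhood = c.neighborhood
LEFT JOIN b ON b.neighborhood = c.neighborhood
WHERE COALESCE(p.permits, 0) >= :min_permits
  AND c.population >= :min_population
ORDER BY businesses_per_1k ASC
"""

candidates = conn.execute(QUERY, {"min_permits": 4, "min_population": 20000}).fetchall()
for row in candidates:
    print(row)
```

With this sample data, Alder Park comes out on top: plenty of permits and residents, but very few businesses per 1,000 people.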

Leveraging SQL to Boost Your Business and Career

There are many ways to harness SQL’s power to supercharge your business and career, in marketing and sales roles, and beyond. Here are just a few:

  • Increase sales: A sales manager could use SQL to compare the performance of various lead-generation programs and double down on those that are working.
  • Track ads: A marketing manager responsible for understanding the efficacy of an ad campaign could use SQL to compare the increase in sales before and after running the ad.
  • Streamline processes: A business manager could use SQL to compare the resources used by various departments in order to determine which are operating efficiently.
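The first two scenarios might look like this in SQL, run here through SQLite. The table names, columns, launch date, and numbers are hypothetical. A simple before/after average also ignores other factors, such as seasonality, that could move sales independently of the ad.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (program TEXT, converted INTEGER);  -- converted: 1 = became a customer
CREATE TABLE daily_sales (sale_date TEXT, amount REAL);  -- ISO dates sort as text
""")

# Hypothetical leads per lead-generation program.
conn.executemany(
    "INSERT INTO leads VALUES (?, ?)",
    [("webinar", 1)] * 4 + [("webinar", 0)] * 6
    + [("trade_show", 1)] + [("trade_show", 0)] * 4
    + [("referral", 1)] * 3 + [("referral", 0)],
)
# Hypothetical daily sales around an ad launch on 2024-03-01.
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2024-02-27", 100.0), ("2024-02-28", 120.0),
     ("2024-03-01", 150.0), ("2024-03-02", 170.0)],
)

# Increase sales: conversion rate per program, best first.
by_program = conn.execute("""
    SELECT program,
           COUNT(*) AS leads,
           SUM(converted) AS customers,
           ROUND(1.0 * SUM(converted) / COUNT(*), 2) AS conversion_rate
    FROM leads
    GROUP BY program
    ORDER BY conversion_rate DESC
""").fetchall()

# Track ads: average daily sales before vs. after the launch date.
ad_effect = conn.execute("""
    SELECT CASE WHEN sale_date < :launch THEN 'before' ELSE 'after' END AS period,
           AVG(amount) AS avg_daily_sales
    FROM daily_sales
    GROUP BY period
    ORDER BY period DESC
""", {"launch": "2024-03-01"}).fetchall()

print(by_program)  # referral (0.75), then webinar (0.4), then trade_show (0.2)
print(ad_effect)   # [('before', 110.0), ('after', 160.0)]
```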

SQL at General Assembly

At General Assembly, we know businesses are striving to transform their data from raw facts into actionable insights. The primary goal of our data analytics curriculum, from workshops to full-time courses, is to empower people to access this data in order to answer their own business questions in ways that were never possible before.

To accomplish this, we give students the opportunity to use SQL to explore real-world data such as Firefox usage statistics, Iowa liquor sales, or Zillow’s real estate prices. Our full-time Data Science Immersive and part-time Data Analytics courses help students build the analytical skills needed to turn the results of those queries into clear and effective business recommendations. On a more introductory level, after just a couple of hours in one of our SQL workshops, students are able to query multiple data sets with millions of rows.


Meet Our Expert

Michael Larner is a passionate leader in the analytics space who specializes in using techniques like predictive modeling and machine learning to deliver data-driven impact. A Los Angeles native, he has spent the last decade consulting with hundreds of clients, including 50-plus Fortune 500 companies, to answer some of their most challenging business questions. Additionally, Michael empowers others to become successful analysts by leading trainings and workshops for corporate clients and universities, including General Assembly’s part-time Data Analytics course and SQL/Excel workshops in Los Angeles.

“In today’s fast-paced, technology-driven world, data has never been more accessible. That makes it the perfect time — and incredibly important — to be a great data analyst.”

– Michael Larner, Data Analytics Instructor, General Assembly Los Angeles

Using Apache Spark for High-Speed, Large-Scale Data Processing


Apache Spark is an open-source framework used for large-scale data processing. The framework is made up of many components, including four programming APIs and four major libraries. Since Spark’s release in 2014, it has become one of Apache’s fastest-growing and most widely used projects of all time.

Spark uses an in-memory processing paradigm to speed up computation and run programs 10 to 100 times faster than other big data technologies like Hadoop MapReduce. According to the 2016 Apache Spark Survey, more than 900 companies, including IBM, Google, Netflix, Amazon, Microsoft, Intel, and Yahoo, use Spark in production for data processing and querying.

Apache Spark is important to the big data field because it represents the next generation of big data processing engines and is a natural successor to MapReduce. One of Spark’s advantages is that its four programming APIs (Scala, Python, R, and Java 8) let users work in the language of their choice. This makes the tool much more accessible to a wide range of programmers with different capabilities. Spark is also flexible about where its data comes from: it can read many types of data from sources such as the Hadoop Distributed File System (HDFS), Amazon’s web-based Simple Storage Service (S3), or even the local filesystem.

Production-Ready and Scalable

Spark’s greatest advantage is that it maximizes the capabilities of data science’s most expensive resource: the data scientist. Computers and programs have become so fast that we are no longer limited by what they can do as much as we are limited by human productivity. Because Spark offers a flexible choice of languages and a concise syntax, data scientists can write more programs, iterate on them faster, and have them run much more quickly. The code is production-ready and scalable, so there’s no need to hand off code requirements to a development team for changes.

It takes only a few minutes to write a word-count program in Spark, but it would take much longer to write the same program in Java. Because the Spark code is so much shorter, there’s far less code to debug and to manage under version control.

Spark’s concise syntax is best illustrated by comparing the two word-count programs: the Spark code is only four lines, compared with roughly 58 for Java.

Java vs. Spark
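Spark’s word count is commonly written in PySpark as a chain of `textFile`, `flatMap`, `map`, and `reduceByKey`. The sketch below mimics those steps in plain Python so it runs without a Spark installation. Unlike Spark, it runs on a single machine and does not distribute the work. The helper names `flat_map`, `to_pairs`, and `reduce_by_key` are illustrative, not part of any library.

```python
from collections import defaultdict

# Roughly equivalent PySpark, assuming an existing SparkContext `sc` and a file `path`:
#   counts = (sc.textFile(path)
#               .flatMap(lambda line: line.split())
#               .map(lambda word: (word, 1))
#               .reduceByKey(lambda a, b: a + b))


def flat_map(lines):
    """Split every line into words (mirrors Spark's flatMap step)."""
    for line in lines:
        yield from line.split()


def to_pairs(words):
    """Emit a (word, 1) pair for each word (mirrors Spark's map step)."""
    for word in words:
        yield (word, 1)


def reduce_by_key(pairs):
    """Sum the values for each key (mirrors Spark's reduceByKey step)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


lines = ["to be or not to be", "that is the question"]
counts = reduce_by_key(to_pairs(flat_map(lines)))
print(counts["to"], counts["be"], counts["question"])  # 2 2 1
```

In Spark, the same three steps run in parallel across partitions of the input, which is where the framework earns its keep on large files.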

Faster Processing

Spark uses in-memory processing to speed up applications. Older big data frameworks, such as Hadoop MapReduce, rely on many intermediate disk reads and writes to accomplish the same task. For small jobs on several gigabytes of data, this difference is less pronounced, but for machine learning applications and more complex tasks such as natural language processing, the difference can be tremendous. Logistic regression, a technique taught in all of General Assembly’s full- and part-time data science courses, can be sped up by more than 100x.
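Why does in-memory processing help logistic regression so much? Fitting the model with batch gradient descent is iterative: every step scans the entire data set again. The plain-Python sketch below (hypothetical toy data, no Spark involved) makes that repeated loop explicit. With MapReduce, each such pass typically means rereading the data from disk, whereas Spark can keep it in memory across iterations, for example by calling `.cache()` on an RDD or DataFrame.

```python
import math

# Hypothetical toy data set of (feature, label) pairs, loaded into memory once.
DATA = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0),
        (0.5, 1), (1.0, 1), (1.5, 1), (2.0, 1)]


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(data, iterations=200, lr=0.5):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(iterations):
        grad_w = grad_b = 0.0
        for x, y in data:  # one full pass over the in-memory data per iteration
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


w, b = train(DATA)
predictions = [int(sigmoid(w * x + b) >= 0.5) for x, _ in DATA]
print(predictions)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Two hundred iterations means two hundred scans of the data; when each scan comes from memory rather than disk, the savings add up quickly.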

Spark also has four key libraries that make it more accessible and provide a wider set of tools. Spark SQL is ideal for leveraging SQL skills or working with data frames; Spark Streaming is useful if you need to process streaming data in near real time; and GraphX has pre-written algorithms that are useful if you have graph data or need to do graph processing. The library most useful to students in our Data Science Immersive, though, is MLlib, Spark’s machine learning library, which provides prewritten distributed machine learning algorithms for use on data frames.

Spark at General Assembly

At GA, we teach both the concepts and the tools of data science. Because hiring managers from marketing, technology, and biotech companies, as well as guest speakers like company founders and entrepreneurs, regularly talk about using Spark, we’ve incorporated it into the curriculum to ensure students are fluent in the field’s most relevant skills. I teach Spark as part of our Data Science Immersive (DSI) course in Boston, and I previously taught two Spark courses for Cloudera and IBM. Spark is a great tool to teach because the general curriculum focuses mostly on Python, and Spark has a Python API/library called PySpark.

When we teach Spark in DSI, we cover resilient distributed datasets (RDDs), directed acyclic graphs (DAGs), closures, lazy execution, and reading JavaScript Object Notation (JSON), a common big data file format.
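Lazy execution is easiest to see with a toy model. The `LazyDataset` class below is a hypothetical, single-machine stand-in for an RDD, not PySpark’s API. Calling `map` or `filter` only records a step in the lineage (a simple linear chain here, where Spark builds a full DAG), and nothing runs until the `collect` action. The lambdas passed in are closures, which Spark would serialize and ship to worker nodes. The input is a few JSON Lines records; in PySpark you might load such data with `spark.read.json(...)` instead.

```python
import json


class LazyDataset:
    """Toy stand-in for an RDD: transformations are recorded, actions execute them."""

    def __init__(self, source, steps=()):
        self._source = source        # callable returning an iterable of records
        self._steps = tuple(steps)   # recorded transformations (linear lineage)

    def map(self, fn):
        # Transformation: returns a new dataset; nothing is computed yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also just recorded.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is the source read and each recorded step applied.
        records = iter(self._source())
        for kind, fn in self._steps:
            records = map(fn, records) if kind == "map" else filter(fn, records)
        return list(records)


JSON_LINES = ['{"user": "a", "clicks": 3}',
              '{"user": "b", "clicks": 0}',
              '{"user": "c", "clicks": 7}']
calls = {"parse": 0}  # counts how many records have actually been parsed


def parse(line):
    calls["parse"] += 1
    return json.loads(line)


ds = (LazyDataset(lambda: JSON_LINES)
      .map(parse)
      .filter(lambda r: r["clicks"] > 0)
      .map(lambda r: r["user"]))
print(calls["parse"])  # 0: defining the pipeline did no work
print(ds.collect())    # ['a', 'c']
print(calls["parse"])  # 3: the action triggered parsing
```

Deferring work this way is what lets Spark look at the whole chain of transformations before running anything and plan the job accordingly.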


Meet Our Expert

Joseph Kambourakis has over 10 years of teaching experience and over five years of experience teaching data science and analytics. He has taught in more than a dozen countries and has been featured in Japanese and Saudi Arabian press. He holds a bachelor’s degree in electrical and computer engineering from Worcester Polytechnic Institute and an MBA with a focus in analytics from Bentley University. He is a passionate Arsenal FC supporter and competitive Magic: The Gathering player. He currently lives with his wife and daughter in Needham, Massachusetts.

“GA students come to class motivated to learn. Throughout the Data Science Immersive course, I keep them on their path by being patient and setting up ideas in a simple way, then letting them learn from hands-on lab work.”

Joseph Kambourakis, Data Science Instructor, General Assembly Boston

Why the Most Expensive Player in Football Doesn’t Matter


Twenty-four percent of all NFL games are decided by three points or fewer. If that happens this weekend at the 51st Super Bowl, all the glory (or the blame) will fall on Matt Bryant (placekicker, Atlanta Falcons) or Stephen Gostkowski (placekicker, New England Patriots). It seems reasonable to give them the credit, but in this case reason has it wrong. Giving Bryant or Gostkowski the MVP for making a crucial kick is like giving a gambler credit for the roulette wheel landing on red.

In American football the team generally functions as a single unit, but the kicker is a unique position. Quarterbacks are the de facto leaders of the team, but a quarterback is only as good as his offensive line, receivers, and running backs. Unlike in baseball or even basketball, measuring the performance of an individual player in football is notoriously difficult. Unless that player is the kicker. In that case, it’s easy.

Data at Work: 3 Real-World Problems Solved by Data Science


At first glance, data science seems to be just another business buzzword: something abstract and ill-defined. While data can, in fact, be both of these things, data science is anything but a buzzword. Data science and its applications have been steadily changing the way we do business and live our day-to-day lives, and considering that 90% of all of the world’s data has been created in the past few years, there’s a lot of growth ahead for this exciting field.

While traditional statistics and data analysis have always focused on using data to explain and predict, data science takes this further and uses data to learn, constructing algorithms and programs that collect data from various sources and apply hybrids of mathematical and computer science methods to derive deeper insights. Whereas traditional analysis uses structured data sets, data science dares to ask further questions, looking at unstructured “big data” derived from millions of sources as well as nontraditional mediums such as text, video, and images.

So how is this all manifesting in the market? Here, we take a look at three real-world examples of how data science is driving business innovation across a wide range of industries.
