Data Category Archives - General Assembly Blog | Page 4

How Long It Takes to Learn Python

By

Python, an essential programming language, has taken the programming world by storm. Much of this attention has followed from the interest in machine learning and AI. Python has become the default programming language of Data Scientists and Machine Learning Engineers all over the world. Python’s versatility has also gained a loyal following not just in computer science but also amongst diverse fields like Bioinformatics, Astronomy, Gaming, and of course, Data Science.

But utility alone doesn’t explain why so many developers love using Python. From its humble beginnings in 1991, Python was designed by Guido van Rossum to be a programming language that emphasized code readability. Or in Guido’s words, “Computer Programming for Everybody.” This ease of human interpretability pairs with an open source ethos that makes it available to developers everywhere for free! So with a few short lines of code you can import packages and libraries that professional developers from companies like Facebook, Google, or AirBnB have spent thousands of hours building _(for free)_.

It’s low-entry cost and ease of reading programs has rightfully garnered Python an immense and passionate following.

1. Where there is talent, there is an opportunity, especially, in Python programming.

Python has rapidly become a deep learning skill that is in high demand within the job market. Jobs sites like Dice and Glassdoor have seen near-exponential growth in postings looking for candidates with Python skills over the last few years because making pivot tables and wrangling data in spreadsheets is no longer enough to get you noticed for data analyst positions. As the variety, velocity, and volume of data has exploded, developers have had to scale their analysis pipelines to match — this means that the people pouring over those numbers must develop a deeper skill set to deal with the enormous amounts of data piling up in their databases.

2. Speed and flexibility are the names of the game!

Python is ideal for handling the heavy-lifting required for today’s computationally intense data analyses used by most businesses today.

OK, so now that you’re sold on its value, how long does it take to learn Python? Like any language, practice and muscle memory are the name of the programming language game. The more time you can immerse yourself, the quicker you will see gains.

It also depends on how much you intend to learn during this process. You can figure out elementary Python and have a simple “Hello World” program running in a matter of minutes, i.e., _Seriously; it is only one line of code!_, etc. To get an understanding of deep learning, a subset of machine learning, or data scientist techniques may take months of focused study, pushing past basic concepts. But, to get your foot in the door as a Data Analyst, it takes about 40–50 hours of studying and practicing — in my computer programming experience.

Some of the rudimentary skills from loading required packages, the underlying data structures, and some simple data manipulation take some effort to put into practice. Remember that learning anything takes motivation and attention, especially when learning a new programming language. With our focus being pulled in many directions at once, sometimes having some guided learning  as a programmer can be a huge help — especially with data analysis and data analysts.

How often have you had a problem you spent hours trying to solve by Googling every corner of the internet, only to have the solution explained to you in three seconds by an expert? You can have industry professionals help guide you through this exciting learning adventure to help make sure you are spending your effort in the right places rather than sift through all the YouTube videos, blogs, or StackOverflow posts.

3. General Assembly Python programming FTW!

Often you get back what you put in. So if you are thinking about getting started on your programming language journey of learning Python, General Assembly has several great ways to get you started.

Free Introduction to Python workshops are held regularly. The aim here is to get you set up to start learning and developing in a couple of hours.

There is a 10-week part-time Python course that give you all the programming language skills you need to start a new career as a Data Analyst or Python Developer for those that are ready for more structured and in-depth learning. These classes are held for two hours, twice a week, over 10 weeks.

For those who like to jump in and learn as much as possible in concentrated, full-time sessions every day, General Assembly offers a 13-week Data Science Immersive as well, which covers all the essentials of putting Python programming into good use for Machine Learning and Data Science.

4. Dive into Python programming + a Python course.

If you are on the fence about learning the programming language Python, I strongly suggest you dive in and don’t look back! I have found the transition from being a Data Analyst in a cancer research lab to becoming a Data Scientist at an InsureTech company, one of the best experiences of my life. All the nerdy things I loved, i.e.,  _(computers, stats, data visualization)_, all banded together in an amazing career path

How long does it take to learn the Python programming language? The answer is your learning path up to YOU. 

Are you ready to start your next chapter and boost your coding skills as a python programmer?

Explore Our Upcoming Coding Programs

SQL: Using Data to Boost Business and Increase Efficiency

By

In today’s digital age, we’re constantly bombarded with information about new apps, transformative technologies, and the latest and greatest artificial intelligence system. While these technologies may serve very different purposes in our life, all of them share one thing in common: They rely on data. More specifically, they all use databases to capture, store, retrieve, and aggregate data. This begs the question: How do we actually interact with databases to accomplish all of this? The answer: We use Structured Query Language, or SQL (pronounced “sequel” or “ess-que-el”).

Put simply, SQL is the language of data — it’s a programming language that enables us to efficiently create, alter, request, and aggregate data from those mysterious things called databases. It gives us the ability to make connections between different pieces of information, even when we’re dealing with huge data sets. Modern applications are able to use SQL to deliver really valuable pieces of information that would otherwise be difficult for humans to keep track of independently. In fact, pretty much every app that stores any sort of information uses a database. This ubiquity means that developers use SQL to log, record, alter, and present data within the application, while analysts use SQL to interrogate that same data set in order to find deeper insights.

Finding SQL in Everyday Life

Think about the last time you looked up the name of a movie on IMDB. I’ll bet you quickly noticed an actress on the cast list and thought something like, “I didn’t realize she was in that,” then clicked a link to read her bio. As you were navigating through that app, SQL was responsible for returning the information you “requested” each time you clicked a link. This sort of capability is something we’ve come to take for granted these days.

Let’s look at another example that truly is cutting-edge, this time at the intersection of local government and small business. Many metropolitan cities are supporting open data initiatives in which public data is made easily accessible through access to the databases that store this information. As an example, let’s look at Los Angeles building permit data, business listings, and census data.

Imagine you work at a real estate investment firm and are trying to find the next up-and-coming neighborhood. You could use SQL to combine the permit, business, and census data in order to identify areas that are undergoing a lot of construction, have high populations, and contain a relatively low number of businesses. This might be a great opportunity to purchase property in a soon-to-be thriving neighborhood! For the first time in history, it’s easy for a small business to leverage quantitative data from the government in order to make a highly informed business decision.

Leveraging SQL to Boost Your Business and Career

There are many ways to harness SQL’s power to supercharge your business and career, in marketing and sales roles, and beyond. Here are just a few:

  • Increase sales: A sales manager could use SQL to compare the performance of various lead-generation programs and double down on those that are working.
  • Track ads: A marketing manager responsible for understanding the efficacy of an ad campaign could use SQL to compare the increase in sales before and after running the ad.
  • Streamline processes: A business manager could use SQL to compare the resources used by various departments in order to determine which are operating efficiently.

SQL at General Assembly

At General Assembly, we know businesses are striving to transform their data from raw facts into actionable insights. The primary goal of our data analytics curriculum, from workshops to full-time courses, is to empower people to access this data in order to answer their own business questions in ways that were never possible before.

To accomplish this, we give students the opportunity to use SQL to explore real-world data such as Firefox usage statistics, Iowa liquor sales, or Zillow’s real estate prices. Our full-time Data Science Immersive and part-time Data Analytics courses help students build the analytical skills needed to turn the results of those queries into clear and effective business recommendations. On a more introductory level, after just a couple of hours of in one of our SQL workshops, students are able to query multiple data sets with millions of rows.

Ask a Question About Our Data Programs

Meet Our Expert

Michael Larner is a passionate leader in the analytics space who specializes in using techniques like predictive modeling and machine learning to deliver data-driven impact. A Los Angeles native, he has spent the last decade consulting with hundreds of clients, including 50-plus Fortune 500 companies, to answer some of their most challenging business questions. Additionally, Michael empowers others to become successful analysts by leading trainings and workshops for corporate clients and universities, including General Assembly’s part-time Data Analytics course and SQL/Excel workshops in Los Angeles.

“In today’s fast-paced, technology-driven world, data has never been more accessible. That makes it the perfect time — and incredibly important — to be a great data analyst.”

– Michael Larner, Data Analytics Instructor, General Assembly Los Angeles

Harnessing the Power of Data for Disaster Relief

By

2455_header

Data is the engine driving today’s digital world. From major companies to government agencies to nonprofits, business leaders are hunting for talent that can help them collect, sort, and analyze vast amounts of data — including geodata — to tackle the world’s biggest challenges.

In the case of emergency management, disaster preparedness, response, and recovery, this means using data to expertly identify, manage, and mitigate the risks of destructive hurricanes, intense droughts, raging wildfires, and other severe weather and climate events. And the pressure to make smarter data-driven investments in disaster response planning and education isn’t going away anytime soon — since 1980, the U.S. has suffered 246 weather and climate disasters that topped over $1 billion in losses according to the National Centers for Environmental Information.

Employing creative approaches for tackling these pressing issues is a big reason why New Light Technologies (NLT), a leading company in the geospatial data science space, joined forces with General Assembly’s (GA) Data Science Immersive (DSI) course, a hands-on intensive program that fosters job-ready data scientists. Global Lead Data Science Instructor at GA, Matt Brems, and Chief Scientist and Senior Consultant at NLT, Ran Goldblatt, recognized a unique opportunity to test drive collaboration between DSI students and NLT’s consulting work for the Federal Emergency Management Agency (FEMA) and the World Bank.

The goal for DSI students: build data solutions that address real-world emergency preparedness and disaster response problems using leading data science tools and programming languages that drive visual, statistical, and data analyses. The partnership has so far produced three successful cohorts with nearly 60 groups of students across campuses in Atlanta, Austin, Boston, Chicago, Denver, New York City, San Francisco, Los Angeles, Seattle, and Washington, D.C., who learn and work together through GA’s Connected Classroom experience.

Taking on Big Problems With Smart Data

nlt-ga-2

DSI students present at NLT’s Washington, D.C. office.

“GA is a pioneering institution for data science, so many of its goals coincide with ours. It’s what also made this partnership a unique fit. When real-world problems are brought to an educational setting with students who are energized and eager to solve concrete problems, smart ideas emerge,” says Goldblatt.

Over the past decade, NLT has supported the ongoing operation, management, and modernization of information systems infrastructure for FEMA, providing the agency with support for disaster response planning and decision-making. The World Bank, another NLT client, faces similar obstacles in its efforts to provide funding for emergency prevention and preparedness.

These large-scale issues served as the basis for the problem statements NLT presented to DSI students, who were challenged to use their newfound skills — from developing data algorithms and analytical workflows to employing visualization and reporting tools — to deliver meaningful, real-time insights that FEMA, the World Bank, and similar organizations could deploy to help communities impacted by disasters. Working in groups, students dived into problems that focused on a wide range of scenarios, including:

  • Using tools such as Google Street View to retrieve pre-disaster photos of structures, allowing emergency responders to easily compare pre- and post-disaster aerial views of damaged properties.
  • Optimizing evacuation routes for search and rescue missions using real-time traffic information.
  • Creating damage estimates by pulling property values from real estate websites like Zillow.
  • Extracting drone data to estimate the quality of building rooftops in Saint Lucia.

“It’s clear these students are really dedicated and eager to leverage what they learned to create solutions that can help people. With DSI, they don’t just walk away with an academic paper or fancy presentation. They’re able to demonstrate they’ve developed an application that, with additional development, could possibly become operational,” says Goldblatt.

Students who participated in the engagements received the opportunity to present their work — using their knowledge in artificial intelligence and machine learning to solve important, tangible problems — to an audience that included high-ranking officials from FEMA, the World Bank, and the United States Agency for International Development (USAID). The students’ projects, which are open source, are also publicly available to organizations looking to adapt, scale, and implement these applications for geospatial and disaster response operations.

“In the span of nine weeks, our students grew from learning basic Python to being able to address specific problems in the realm of emergency preparedness and disaster response,” says Brems. “Their ability to apply what they learned so quickly speaks to how well-qualified GA students and graduates are.”

Here’s a closer look at some of those projects, the lessons learned, and students’ reflections on how GA’s collaboration with NLT impacted their DSI experience.

Leveraging Social Media to Map Disasters

2455_sec1_socialmediamap_560x344

The NLT engagements feature student work that uses social media to identify “hot spots” for disaster relief.

During disasters, one of the biggest challenges for disaster relief organizations is not only mapping and alerting users about the severity of disasters but also pinpointing hot spots where people require assistance. While responders employ satellite and aerial imagery, ground surveys, and other hazard data to assess and identify affected areas, communities on the ground often turn to social media platforms to broadcast distress calls and share status updates.

Cameron Bronstein, a former botany and ecology major from New York, worked with group members to build a model that analyzes and classifies social media posts to determine where people need assistance during and after natural disasters. The group collected tweets related to Hurricane Harvey of 2017 and Hurricane Michael of 2018, which inflicted billions of dollars of damage in the Caribbean and Southern U.S., as test cases for their proof-of-concept model.

“Since our group lacked premium access to social media APIs, we sourced previously collected and labeled text-based data,” says Bronstein. “This involved analyzing and classifying several years of text language — including data sets that contained tweets, and transcribed phone calls and voice messages from disaster relief organizations.”

Contemplating on what he enjoyed most while working on the NLT engagement, Bronstein states, “Though this project was ambitious and open to interpretation, overall, it was a good experience and introduction to the type of consulting work I could end up doing in the future.”

Quantifying the Economic Impact of Natural Disasters

2455_sec2_economicimpact_560x344

Students use interactive data visualization tools to compile and display their findings.

Prior to enrolling in General Assembly’s DSI course in Washington D.C., Ashley White learned early in her career as a management consultant how to use data to analyze and assess difficult client problems. “What was central to all of my experiences was utilizing the power of data to make informed strategic decisions,” states White.

It was White’s interest in using data for social impact that led her to enroll in DSI where she could be exposed to real-world applications of data science principles and best practices. Her DSI group’s task: developing a model for quantifying the economic impact of natural disasters on the labor market. The group selected Houston, Texas as its test case for defining and identifying reliable data sources to measure the economic impact of natural disasters such as Hurricane Harvey.

As they tackled their problem statement, the group focused on NLT’s intended goal, while effectively breaking their workflow into smaller, more manageable pieces. “As we worked through the data, we discovered it was hard to identify meaningful long-term trends. As scholarly research shows, most cities are pretty resilient post-disaster, and the labor market bounces back quickly as the city recovers,” says White.

The team compiled their results using the analytics and data visualization tool Tableau, incorporating compelling visuals and story taglines into a streamlined, dynamic interface. For version control, White and her group used GitHub to manage and store their findings, and share recommendations on how NLT could use the group’s methodology to scale their analysis for other geographic locations. In addition to the group’s key findings on employment fluctuations post-disaster, the team concluded that while natural disasters are growing in severity, aggregate trends around unemployment and similar data are becoming less predictable.

Cultivating Data Science Talent in Future Engagements

Due to the success of the partnership’s three engagements, GA and NLT have taken steps to formalize future iterations of their collaboration with each new DSI cohort. Additionally, mutually beneficial partnerships with leading organizations such as NLT present a unique opportunity to uncover innovative approaches for managing and understanding the numerous ways data science can support technological systems and platforms. It’s also granted aspiring data scientists real-world experience and visibility with key decision-makers who are at the forefront of emergency and disaster management.

“This is only the beginning of a more comprehensive collaboration with General Assembly,” states Goldblatt. “By leveraging GA’s innovative data science curriculum and developing training programs for capacity building that can be adopted by NLT clients, we hope to provide students with essential skills that prepare them for the emerging, yet competitive, geospatial data job market. Moreover, students get the opportunity to better understand how theory, data, and algorithms translate to actual tools, as well as create solutions that can potentially save lives.”

***

New Light Technologies, Inc. (NLT) provides comprehensive information technology solutions for clients in government, commercial, and non-profit sectors. NLT specializes in DevOps enterprise-scale systems integration, development, management, and staffing and offers a unique range of capabilities from Infrastructure Modernization and Cloud Computing to Big Data Analytics, Geospatial Information Systems, and the Development of Software and Web-based Visualization Platforms.

In today’s rapidly evolving technological world, successfully developing and deploying digital geospatial software technologies and integrating disparate data across large complex enterprises with diverse user requirements is a challenge. Our innovative solutions for real-time integrated analytics lead the way in developing highly scalable virtualized geospatial microservices solutions. Visit our website to find out more and contact us at https://NewLightTechnologies.com.

A Machine Learning Guide for Beginners

By

Ever wonder how apps, websites, and machines seem to be able to predict the future? Like how Amazon knows what your next purchase may be, or how self-driving cars can safely navigate a complex traffic situation?

The answer lies in machine learning.

Machine learning is a branch of artificial intelligence (AI) that often leverages Python to build systems that can learn from and make decisions based on data. Instead of explicitly programming the machine to solve the problem, we show it how it was solved in the past and the machine learns the key steps that are required to do the same task on its own.

Machine learning is revolutionizing every industry by bringing greater value to companies’ years of saved data. Leveraging machine learning enables organizations to make more precise decisions instead of following intuition.

There’s an explosive amount of innovation around machine learning that’s being used within organizations, especially given that the technology is still in its early days. Many companies have invested heavily in building recommendation and personalization engines for their customers. But, machine learning is also being applied in a huge variety of back-office use cases as well, like to forecast sales, identify production bottlenecks, build efficient traffic routing systems, and more.

Machine learning algorithms fall into two categories: supervised and unsupervised learning.

Supervised Learning

Supervised learning tries to predict a future value by relying on training from past data. For instance, Netflix’s movie-recommendation engine is most likely supervised. It uses a user’s past movie ratings to train the model, then predicts what their rating would likely be for movies they haven’t seen and recommends the ones that score highly.

Supervised learning enjoys more commercial success than unsupervised learning. Some common use cases include fraud detection, image recognition, credit scoring, product recommendation, and malfunction prediction.

Unsupervised Learning

Unsupervised learning is about uncovering hidden structures within data sets. It’s helpful in identifying segments or groups, especially when there is no prior information available about them. These algorithms are commonly used in market segmentation. They enable marketers to identify target segments in order to maximize revenue, create anomaly detection systems to identify suspicious user behavior, and more.

For instance, Netflix may know how many customers it has, but wants to understand what kind of groupings they fall into in order to offer services targeted to them. The streaming service may have 50 or more different customer types, aka, segments, but its data team doesn’t know this yet. If the company knows that most of its customers are in the “families with children” segment, it can invest in building specific programs to meet those customer needs. But, without that information, Netflix’s data experts can’t create a supervised machine learning system.

So, they build an unsupervised machine learning algorithm instead, which identifies and extracts various customer segments within the data and allows them to identify groups such as “families with children” or “working professionals.”

How Python, SQL, and Machine Learning Work Together

To understand how SQLPython, and machine learning relate to one another, let’s think of them as a factory. As a concept, a factory can produce anything if it has the right tools. More often than not, the tools used in factories are pretty similar (e.g., hammers and screwdrivers).

What’s amazing is that there can be factories that use those same tools but produce completely different products (e.g., tables versus chairs). The difference between these factories is not the tools, but rather how the factory workers use their expertise to leverage these tools and produce a different result.

In this case, our goal would be to produce a machine learning model, and our tools would be SQL and Python. We can use SQL to extract data from a database and Python to shape the data and perform the analyses that ultimately produce a machine learning model. Your knowledge of machine learning will ultimately enable you to achieve your goal.

To round out the analogy, an app developer, with no understanding of machine learning, might choose to use SQL and Python to build a web app. Again, the tools are the same, but the practitioner uses their expertise to apply them in a different way.

Machine Learning at Work

A wide variety of roles can benefit from machine learning know-how. Here are just a few:

  • Data scientist or analyst: Data scientists or analysts use machine learning to answer specific business questions for key stakeholders. They might help their company’s user experience (UX) team determine which website features most heavily drive sales.
  • Machine learning engineer: A machine learning engineer is a software engineer specifically responsible for writing code that leverages machine learning models. For example, they might build a recommendation engine that suggests products to customers.
  • Research scientist: A machine learning research scientist develops new technologies like computer vision for self-driving cars or advancements in neural networks. Their findings enable data professionals to deliver new insights and capabilities.

Machine Learning in Everyday Life: Real-World Examples

While machine learning-powered innovations like voice-activated robots seem ultra-futuristic, the technology behind them is actually widely used today. Here are some great examples of how machine learning impacts your daily life:

  • Recommendation engines: Think about how Spotify makes music recommendations. The recommendation engine peeks at the songs and albums you’ve listened to in the past, as well as tracks listened to by users with similar tastes. It then starts to learn the factors that influence your music preferences and stores them in a database, recommending similar music that you haven’t listened to — all without writing any explicit rules!
  • Voice-recognition technology: We’ve seen the emergence of voice assistants like Amazon’s Alexa and Google’s Assistant. These interactive systems are based entirely on voice-recognition technology powered by machine learning models.
  • Risk mitigation and fraud prevention: Insurers and creditors use machine learning to make accurate predictions on fraudulent claims based on previous consumer behavior, rather than relying on traditional analysis or human judgement. They also can use these analyses to identify high-risk customers. Both of these analyses help companies process requests and claims more quickly and at a lower cost.
  • Photo identification via computer vision: Machine learning is common among photo-heavy services like Facebook and the home-improvement site Houzz. Each of these services use computer vision — an aspect of machine learning — to automatically tag objects in photos without human intervention. For Facebook, these tend to be faces, whereas Houzz seeks to identify individual objects and link to a place where users can purchase them.

Why You and Your Business Need to Understand Data Science

As the world becomes increasingly data-driven, learning to leverage key technologies like machine learning — along with the programming languages Python (which helps power machine learning algorithms) and SQL — will create endless possibilities for your career and your organization. There are many pathways into this growing field, as detailed by our Data Science Standards Board, and now’s a great time to dive in.

In our paper A Beginner’s Guide to SQL, Python, and Machine Learning, we break down these three data sectors. These skills go beyond data to bring delight, efficiency, and innovation to countless industries. They empower people to drive businesses forward with a speed and precision previously unknown.

Individuals can use data know-how to improve their problem-solving skills, become more cross-functional, build innovative technology, and more. For companies, leveraging these technologies means smarter use of data. This can lead to greater efficiency, employees who are empowered to use data in innovative ways, and business decisions that drive revenue and success.

Download the paper to learn more.

Boost your business and career acumen with data.
Find out why machine learning, Python, and SQL are the top technologies to know.

Download the eBook

The Study of Data Science Lags in Gender and Racial Representation

By

data science gender race disparity

In the past few years, much attention has been drawn to the dearth of women and people of color in tech-related fields. A recent article in Forbes noted, “Women hold only about 26% of data jobs in the United States. There are a few reasons for the gender gap: a lack of STEM education for women early on in life, lack of mentorship for women in data science, and human resources rules and regulations not catching up to gender balance policies, to name a few.” Federal civil rights data further demonstrate that “black and Latino high school students are being shortchanged in their access to high-level math and science courses that could prepare them for college” and for careers in fields like data science.

As an education company offering tech-oriented courses at 20 campuses across the world, General Assembly is in a unique position to analyze the current crop of students looking to change the dynamics of the workplace.

Looking at GA data for our part-time programs (which typically reach students who already have jobs and are looking to expand their skill set as they pursue a promotion or a career shift), here’s what we found: While great strides have been made in fields like web development and user experience (UX) design, data science — a relatively newer concentration — still has a ways to go in terms of gender and racial equality.

Continue reading

Using Apache Spark For High Speed, Large Scale Data Processing

By

Apache Spark is an open-source framework used for large-scale data processing. The framework is made up of many components, including four programming APIs and four major libraries. Since Spark’s release in 2014, it has become one of Apache’s fastest growing and most widely used projects of all time.

Spark uses an in-memory processing paradigm to speed up computation and run programs 10 to 100 times faster than other big data technologies like Hadoop MapReduce. According to the 2016 Apache Spark Survey, more than 900 companies, including IBM, Google, Netflix, Amazon, Microsoft, Intel, and Yahoo, use Spark in production for data processing and querying.

Apache Spark is important to the big data field because it represents the next generation of big data processing engines and is a natural successor to MapReduce. One of Spark’s advantages is that its use of four programming APIs — Scala, Python, R, and Java 8 — allows the user flexibility to work in the language of their choice. This makes the tool much more accessible to a wide range of programmers with different capabilities. Spark also has great flexibility in its ability to read all types of data from various locations such as Hadoop Distributed File Storage (HDFS), Amazon’s web-based Simple Storage Service (S3), or even the local filesystem.

Production-Ready and Scalable

Spark’s greatest advantage is that it maximizes the capabilities of data science’s most expensive resource: the data scientist. Computers and programs have become so fast, that we are no longer limited by what they can do as much as we are limited by human productivity. By providing a flexible language platform and having concise syntax, the data scientist can write more programs, iterate through their programs, and have them run much quicker. The code is production-ready and scalable, so there’s no need to hand off code requirements to a development team for changes.

It takes only a few minutes to write a word-count program in Spark, but would take much longer to write the same program in Java. Because the Spark code is so much shorter, there’s less of a need to debug or use version control tools.

Spark’s concise syntax can best be illustrated with the following examples. The Spark code is only four lines compared with almost 58 for Java.

Java vs. Spark

Faster Processing

Spark utilizes in-memory processing to speed up applications. The older big data frameworks, such as Hadoop, use many intermediate disc reads and writes to accomplish the same task. For small jobs on several gigabytes of data, this difference is not as pronounced, but for machine learning applications and more complex tasks such as natural language processing, the difference can be tremendous. Logistic regression, a technique taught in all of General Assembly’s full- and part-time data science courses, can be sped up over 100x.

Spark has four key libraries that also make it much more accessible and provide a wider set of tools for people to use. Spark SQL is ideal for leveraging SQL skills or work with data frames; Spark Streaming has functions for data processing, useful if you need to process data in near real time; and GraphX has pre-written algorithms that are useful if you have graph data or need to do graph processing. The library most useful to students in our Data Science Immersive, though, is the Spark MLlib machine learning library, which has prewritten distributed machine learning algorithms for use on data frames.

Spark at General Assembly

At GA, we teach both the concepts and the tools of data science. Because hiring managers from marketing, technology, and biotech companies, as well as guest speakers like company founders and entrepreneurs, regularly talk about using Spark, we’ve incorporated it into the curriculum to ensure students are fluent in the field’s most relevant skills. I teach Spark as part of our Data Science Immersive (DSI) course in Boston, and I previously taught two Spark courses for Cloudera and IBM. Spark is a great tool to teach because the general curriculum focuses mostly on Python, and Spark has a Python API/library called PySpark.

When we teach Spark in DSI, we cover resilient distributed data sets, directed acyclic graphs, closures, lazy execution, and reading JavaScript Object Notation (JSON), a common big data file format.

Ask a Question About Our Data Programs

Meet Our Expert

Joseph Kambourakis has over 10 years of teaching experience and over five years of experience teaching data science and analytics. He has taught in more than a dozen countries and has been featured in Japanese and Saudi Arabian press. He holds a bachelor’s degree in electrical and computer engineering from Worcester Polytechnic Institute and an MBA with a focus in analytics from Bentley University. He is a passionate Arsenal FC supporter and competitive Magic: The Gathering player. He currently lives with his wife and daughter in Needham, Massachusetts.

“GA students come to class motivated to learn. Throughout the Data Science Immersive course, I keep them on their path by being patient and setting up ideas in a simple way, then letting them learn from hands-on lab work.”

Joseph Kambourakis, Data Science Instructor, General Assembly Boston

How Data Maps Reveal Inequality and Equity in Atlanta

By

Housing Map of Atlanta provided by Neighborhood Nexus.

Map of Atlanta provided by Neighborhood Nexus.

Mapping the communities of tomorrow requires a hard look at the topographies of today. Mike Carnathan, project director at Neighborhood Nexus, synthesizes big data into visual stories that chart the social, political, and economic conditions across the city of Atlanta. Part data miner, part cultural cartographer, Carnathan creates demographic maps that local leaders, advocates, and everyday citizens use to help understand and change their lives.

Continue reading

Measuring What Matters: General Assembly’s First Student Outcomes Report

By

ga_outcomes-email-blog

Since founding General Assembly in 2011, I’ve heard some incredible stories from our students and graduates. One of my favorites is about Jerome Hardaway. Jerome came to GA after five years in the United States Air Force. He dreamed of tackling persistent diversity gaps in the technology sector by breaking down barriers for other veterans and people of color.

In 2014, with the help of General Assembly’s Opportunity Fund scholarship, Jerome began one of our full-time Web Development Immersive courses. After graduation, he had the opportunity to pitch President Obama at the first-ever White House Demo Day and has launched a nonprofit in Nashville, Vets Who Code, which helps veterans navigate the transition to civilian life through technology skills training.

Exceptional stories like Jerome’s embody GA’s mission of “empowering people to pursue the work they love.” It’s a mission that motivates our instructional designers, faculty, mentors, and career coaches. It also inspired the development of an open source reporting framework which defined GA’s approach to measuring student outcomes and now, our first report with verified student outcomes metrics.

Continue reading

The Skills and Tools Every Data Scientist Must Master

By

women of color in tech

Photo by WOC in Tech.

“Data scientist” is one of today’s hottest jobs.

In fact, Glassdoor calls it the best job of 2017, with a median base salary of $110,000. This fact shouldn’t be big news. In 2011, McKinsey predicted there would be a shortage of 1.5 million managers and analysts “with the know-how to use the analysis of big data to make effective decisions.” Today, there are more than 38,000 data scientist positions listed on Glassdoor.com.

It makes perfect sense that this job is both new and popular, since every move you make online is actively creating data somewhere for something. Someone has to make sense of that data and discover trends in the data to see if the data is useful. That is the job of the data scientist. But how does the data scientist go about the job? Here are the three skills and three tools that every data scientist should master.

Continue reading

Announcing General Assembly’s New Data Science Immersive

By

DataImmersive_EmailArt_560x350_v1

Data science is “one of the hottest and best-paid professions in the U.S. More than ever, companies need analytical minds who can compile data, analyze it, and drive everything from marketing forecasts to product launches with compelling predictions. Their work drives the core strategies of modern business — so much so that, by 2018, data-related job openings will total 1.5 million. That’s why we’ve worked hard to develop classes, workshops, and courses to confront the data science skills gap. The latest addition to our proud family of data education is the new Data Science Immersive program.

Launching for the first time in San Francisco and Washington, D.C. on April 11, this full-time Immersive program will equip you with the tools and techniques you need to become a data pro in just 12 weeks.

Continue reading