A Beginner’s Guide to Predictive Modeling


By Amer Tadmori

You know the scenario: You get to work in the morning and quickly check your personal email. Over on the side, you notice that your spam folder has a couple of items in it, so you look inside. You’re amazed — although some of them look like genuine emails, they’re not; these cleverly disguised ads are all correctly labeled as spam. What you’re seeing is natural language processing (NLP) in action. In this instance, the email service provider is using what’s known as predictive analytics to assess language data and determine which combinations of words are likely spam, filtering your email accordingly.
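To make the idea of scoring "combinations of words" concrete, here is a minimal sketch of a word-count spam filter using a naive Bayes classifier. This is only an illustration of the general approach, not a description of how any particular email provider works. The training messages, the labels "spam" and "ham," and the function names are all made up for this example.

```python
import math
from collections import Counter

# Hypothetical, hand-made training messages (illustration only).
TRAINING = [
    ("win a free prize now", "spam"),
    ("limited offer claim your free gift", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on friday with the team", "ham"),
    ("please review the project agenda", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the higher naive Bayes log-score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        # Start from the prior: how common this label is overall.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words never give log(0).
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAINING)
print(classify("claim your free prize", word_counts, label_counts))        # spam
print(classify("agenda for the team meeting", word_counts, label_counts))  # ham
```

A real filter would learn from millions of messages and use far richer features, but the core idea is the same: past labeled data is used to predict the label of a new message.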

With the volume of data being created, collected, and stored increasing by the hour, the days of making decisions based solely on intuition are numbered. Companies collect data on their customers, nonprofits collect data on their donors, and apps collect data on their users, all with the goal of finding opportunities to improve their products and services. More and more, decision-making is becoming data-driven. People use information to understand what’s happening in the world around them and try to predict what will happen in the future. For this, we turn to predictive analytics.

Predictive analytics is the practice of using current information to forecast what will happen next. This area of study covers a broad range of concepts and skills, often involving modeling techniques, that help turn data into insights and insights into action. These ideas are already in practice in industries like eCommerce, direct marketing, cybersecurity, financial services, and more. It’s likely that you’ve come across implementations of predictive analytics and modeling in your daily life and not even realized it.

Predictive Modeling in the Real World

Returning to our example, say that an email in your inbox reminds you that you wanted to buy a new whisk to make scrambled eggs this weekend. When you head to Amazon.com to make a purchase, you see some recommendations for items you might like on the home page. This component is what’s known in the data science world as a recommender system.

What Amazon's recommender system thinks your kitchen is missing.

To develop this, Amazon uses its vast data sets that detail what people are buying. A machine learning engineer might then use Python or R to pass this data through a k-means clustering algorithm, which organizes items into groups that tend to be purchased together. Amazon can then compare those groups with what you’ve already bought to come up with recommendations. In other words, Amazon looks at a combination of what you and others have purchased or viewed (current information) and uses predictive modeling to anticipate what else you might like. This is a tremendously powerful tool! It helps users find what they want faster and discover new ideas, and it boosts Amazon's sales by shortening the path to purchase.
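As a rough sketch of the approach described above, the snippet below runs a small, hand-written k-means on item purchase data and recommends other items from the same cluster. It is a toy under stated assumptions, not Amazon's actual system: the baskets are invented, each item is represented simply by which baskets it appears in, and the function names (`kmeans`, `recommend`) are hypothetical.

```python
# Hypothetical shopping baskets (illustration only).
BASKETS = [
    {"whisk", "mixing bowl"},
    {"whisk", "spatula", "mixing bowl"},
    {"spatula", "mixing bowl"},
    {"tent", "sleeping bag"},
    {"tent", "lantern", "sleeping bag"},
    {"lantern", "sleeping bag"},
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns one cluster index per point."""
    # Deterministic start: first point, then the point farthest from
    # the centroids chosen so far (an assumption made for repeatability).
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def recommend(purchased, items, labels):
    """Suggest unpurchased items that share a cluster with a purchased item."""
    clusters = {labels[items.index(item)] for item in purchased}
    return sorted(
        item
        for item, label in zip(items, labels)
        if label in clusters and item not in purchased
    )


# One vector per item: 1 if the item is in a basket, else 0.
items = sorted(set().union(*BASKETS))
vectors = [[1 if item in basket else 0 for basket in BASKETS] for item in items]
labels = kmeans(vectors, k=2)

print(recommend({"whisk"}, items, labels))  # ['mixing bowl', 'spatula']
```

With this toy data, the kitchen items and the camping items land in separate clusters, so a shopper who bought a whisk is shown the mixing bowl and spatula rather than a tent.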

Say that, around lunchtime, you decide to order pizza delivery, and 20 minutes later, there it is. Wow! How did it get there so fast? Using clustering again, this time on locations rather than purchases, the restaurant has analyzed where its orders are coming from and grouped them accordingly. For this project, a data analyst might have run a SQL query to find out which deliveries take the longest. The analyst might then use a nearest-neighbors algorithm in Python to find the optimal groupings and recommend cross streets for new restaurant locations that minimize the distance to the orders.

Clustering for optimal pizza delivery.

Here, predictive modeling not only saves the company money on driving time and gas, it also cuts down the time between the customer and a hot pizza.
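The delivery example can be sketched in a few lines of Python. The version below is a simplified stand-in for what an analyst might do: it assigns every order to its nearest candidate location, then tries each pair of candidate cross streets and keeps the pair with the smallest total distance. The order coordinates, the cross-street names, and the use of city-block distance are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical order locations on a street grid, as (x, y) block coordinates.
ORDERS = [(1, 1), (2, 1), (1, 3), (8, 8), (9, 7), (8, 9)]

# Hypothetical candidate cross streets for new restaurants.
CANDIDATES = {
    "1st & Main": (1, 2),
    "5th & Oak": (5, 5),
    "8th & Pine": (8, 8),
}


def blocks(a, b):
    """City-block (Manhattan) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def assign(orders, sites):
    """Nearest-neighbor step: map each order to its closest site name."""
    return {
        order: min(sites, key=lambda name: blocks(order, sites[name]))
        for order in orders
    }


def total_distance(orders, sites):
    """Sum of each order's distance to its nearest site."""
    return sum(min(blocks(order, loc) for loc in sites.values()) for order in orders)


def best_sites(orders, candidates, k):
    """Try every set of k candidate sites and keep the cheapest one."""
    best = min(
        combinations(sorted(candidates), k),
        key=lambda names: total_distance(orders, {n: candidates[n] for n in names}),
    )
    return list(best)


chosen = best_sites(ORDERS, CANDIDATES, k=2)
print(chosen)  # ['1st & Main', '8th & Pine']
print(total_distance(ORDERS, {n: CANDIDATES[n] for n in chosen}))  # 7
print(assign(ORDERS, {n: CANDIDATES[n] for n in chosen}))
```

Trying every combination only works for a handful of candidates. With many possible locations, an analyst would more likely reach for a clustering library, but the goal stays the same: keep each order close to the restaurant that serves it.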

Predictive Modeling at General Assembly

Regardless of the industry, there’s a growing opportunity to leverage predictive modeling to solve problems of all sizes. This is rapidly becoming a must-have skill, which is why we teach these techniques and more in our part-time and full-time data science courses at General Assembly. Starting with simple analyses like linear regression and classification, students use tools like Python and SQL to work with real-world data, building the necessary skills to move on to more involved analyses like time series, clustering, and recommender systems. This gives them the toolbox they need to make data-driven decisions that influence change in the business, government, and nonprofit sectors, and beyond.

Meet Our Expert

Amer Tadmori is a senior statistician at Wiland, where he uses data science to provide business intelligence and data-driven marketing solutions to clients. His passion for turning complex topics into easy-to-understand concepts is what led him to begin teaching. At GA’s Denver campus, Amer leads courses in SQL, data analytics, data visualization, and storytelling with data. He holds a bachelor’s degree in economics from Colgate University and a master's degree in applied statistics from Colorado State University. In his free time, Amer loves hiking his way through the national parks and snowboarding down Colorado’s local hills.

“Now’s a great time to learn data analysis techniques. There’s an abundance of resources available to learn these skills, and an even greater abundance of places to use them.”

Amer Tadmori, Data Analytics Instructor, General Assembly Denver