A Beginner’s Guide to Ridgeline Plots

By Joshua Yazman

Organizations conduct survey research for any number of reasons: to decide which products to devote resources to, determine customer satisfaction, figure out who our next president will be, or determine which Game of Thrones characters are most attractive. But almost all surveys are conducted with samples of the target population and therefore are subject to sampling error.

Decision-makers need to understand this error to make the most of survey results, so it's important for data scientists and analysts to communicate confidence intervals when visualizing estimated results. A confidence interval is the range of values in which you could reasonably expect the true population value to fall, based on the results measured in your sample.

But traditional visuals (error bars) can lead to misperceptions, too. When confidence intervals overlap by only a small amount, we know there is only a small chance that the two underlying values are actually equal, yet overlapping error bars on a chart still signal danger. Ridgeline plots, which are essentially a series of density plots (or smoothed-out histograms), can help communicate risk without overemphasizing error when error bars only slightly overlap. An error bar has the same width from top to bottom. A ridgeline plot, by contrast, gets fatter where values are more likely and thinner where they are less likely. This way, a small amount of overlap doesn’t signal a lack of statistical significance quite as loudly.

Calculating Confidence Intervals: Planning a Class

Consider, for example, an education startup that conducted a survey of 500 people on its email list to determine which of three classes respondents might want to enroll in. (For demonstration purposes, we’re assuming this is a random sample that’s representative of the target audience.) The options are Hackysack Maintenance, Underwater Basketweaving, and Finger Painting. Results are reported below:

Class                       Result (%)
Hackysack Maintenance       24
Underwater Basketweaving    44
Finger Painting             32


We could produce a bar plot of this result that makes Underwater Basketweaving appear to be the clear-cut winner.

[Figure: bar plot of the survey results, one bar per class]

But because this data comes from a sample rather than from the entire population, each of these point estimates has some margin of error. This post won't go into the details of calculating these confidence intervals. In short, we used the normal approximation method to calculate a binomial confidence interval for each of the three survey results at a 99.7% confidence level. Now our results look more like this:

Class                       Result (%)   CI lower bound (%)   CI upper bound (%)
Hackysack Maintenance       24           18                   30
Underwater Basketweaving    44           37                   51
Finger Painting             32           26                   38
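
The intervals in the table can be reproduced in a few lines. Below is a minimal Python sketch. It assumes the standard normal-approximation (Wald) formula p ± z·sqrt(p(1 - p)/n), with n = 500 and z = 3 as the critical value for roughly 99.7% confidence. The post does not state the exact critical value it used, so z = 3 is an assumption. The names `wald_interval`, `RESULTS`, `N`, and `Z` are illustrative.

```python
import math

N = 500   # respondents in the survey
Z = 3.0   # assumed critical value: about 99.7% of a normal distribution lies within 3 SDs

# Share of respondents choosing each class (from the results table)
RESULTS = {
    "Hackysack Maintenance": 0.24,
    "Underwater Basketweaving": 0.44,
    "Finger Painting": 0.32,
}

def wald_interval(p, n=N, z=Z):
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p - z * se, p + z * se

for name, p in RESULTS.items():
    lower, upper = wald_interval(p)
    # Columns: class, point estimate, lower bound, upper bound
    print(f"{name:<26}{p:>5.0%}{lower:>7.0%}{upper:>7.0%}")
```

Rounded to whole percentages, this reproduces the 18-30, 37-51, and 26-38 ranges in the table. The Wald interval is the simplest option. For small samples or proportions near 0 or 1, a Wilson interval is usually the safer choice.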

One common way to present these confidence intervals is by adding error bars to the plot. When we add these error bars, our plot looks like this:

[Figure: bar plot of the survey results with confidence-interval error bars]

Unfortunately, the error bars for Finger Painting and Underwater Basketweaving now overlap. This means there is some chance that the two courses are equally desirable, or even that Finger Painting is actually the most desirable course of all. Decision-makers no longer have a clear-cut investment choice, since the top two responses could be tied.

However, those error bars barely overlap (only between 37% and 38%), so there's a strong probability that Underwater Basketweaving really is the winner. The problem with error bars is that they draw every value in the confidence interval as if it were equally likely. In reality, the sampling distribution behind the interval is a bell curve: values near the point estimate are much more plausible than values near the ends of the interval.
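
The bell-curve point can be made concrete by comparing heights on the normal curve behind each interval. This sketch assumes the same normal approximation (n = 500, z = 3) used for the table. `relative_density` is an illustrative helper: it returns the curve's height at a value divided by its height at the point estimate. The value 37.5% is a hypothetical point picked from inside the 37-38% overlap.

```python
import math

N = 500
Z = 3.0  # assumed critical value for ~99.7% confidence, matching the table

def standard_error(p, n=N):
    """Standard error of a sample proportion under the normal approximation."""
    return math.sqrt(p * (1 - p) / n)

def relative_density(x, p, n=N):
    """Normal density at x divided by the density at the point estimate p (1.0 = peak)."""
    return math.exp(-0.5 * ((x - p) / standard_error(p, n)) ** 2)

# A flat error bar draws the edge of the interval just like its center.
# On the bell curve, the edge sits Z standard errors out, so its height is exp(-Z**2 / 2).
edge = relative_density(0.44 + Z * standard_error(0.44), 0.44)
print(f"Height at the interval edge relative to the center: {edge:.3f}")

# Hypothetical value inside the overlap between the two top classes
overlap = 0.375
for name, p in [("Underwater Basketweaving", 0.44), ("Finger Painting", 0.32)]:
    print(f"{name}: density at {overlap:.1%} is {relative_density(overlap, p):.1%} of its peak")
```

At the edge of a 99.7% interval, the curve is only exp(-4.5), or about 1.1%, as tall as at its center. That is exactly the information a flat error bar leaves out.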

Enter the ridgeline plot.

What Is a Ridgeline Plot?

Ridgeline plots essentially stack density plots for several categories on top of one another, one row per category. Claus Wilke popularized ridgeline plots, originally called joy plots, in the summer of 2017 with his R package ggjoy (since renamed ggridges), and the visual quickly gained popularity among users of the R programming language. They’ve been used to show the changing polarization of political parties, salary distributions, and patterns of breaking news.

By using a ridgeline plot rather than a bar plot, we can present our confidence intervals as the bell curves they are, rather than as flat lines. A bar implies a clear winner, and error bars then contradict that narrative. The ridgeline plot instead shows that the bulk of the plausible values for each class do not overlap. In the process, it downplays the small amount of overlap between Finger Painting and Underwater Basketweaving.

[Figure: ridgeline plot of the confidence interval for each class]

By plotting only the confidence intervals, in the form of individual density plots, the ridgeline plot shows the small risk that students really prefer a class on finger painting, without overemphasizing the magnitude of that risk. Our education startup can invest in curriculum development and promotion of the Underwater Basketweaving class with a strong degree of confidence that it is the most popular of the three options among its potential students.
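
The construction behind a plot like this can be sketched without any plotting library. The Python below is a minimal sketch and assumes each class's sampling distribution is the normal approximation used for the intervals. It works in three steps:

1. Evaluate each class's density only across its 99.7% interval.
2. Scale all the curves by one shared factor.
3. Stack them on separate baselines, which is what turns individual density plots into a ridgeline.

`build_ridges`, its `overlap` parameter, and the row order are illustrative choices, not necessarily how the figure above was made.

```python
import math

N = 500   # respondents
Z = 3.0   # assumed critical value for ~99.7% confidence
# (class, sample proportion); the first entry becomes the bottom ridge
RESULTS = [
    ("Hackysack Maintenance", 0.24),
    ("Underwater Basketweaving", 0.44),
    ("Finger Painting", 0.32),
]

def normal_pdf(x, mean, sd):
    """Height of a normal density with the given mean and standard deviation at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def build_ridges(results, n=N, z=Z, points=101, overlap=0.9):
    """Return one ridge per class: x values spanning only its confidence interval,
    y values (density shifted up to the ridge's baseline), and the baseline itself."""
    curves = []
    for name, p in results:
        se = math.sqrt(p * (1 - p) / n)
        lower, upper = p - z * se, p + z * se
        xs = [lower + (upper - lower) * i / (points - 1) for i in range(points)]
        curves.append((name, xs, [normal_pdf(x, p, se) for x in xs]))

    # One shared scale factor: the tallest peak reaches `overlap` rows high, and
    # narrower (more certain) intervals still draw taller ridges than wider ones.
    tallest = max(max(ys) for _, _, ys in curves)
    scale = overlap / tallest

    ridges = []
    for row, (name, xs, ys) in enumerate(curves):
        ridges.append({
            "name": name,
            "baseline": row,                     # each class gets its own row
            "x": xs,
            "y": [row + scale * y for y in ys],  # density stacked on that row
        })
    return ridges

for ridge in build_ridges(RESULTS):
    peak = max(ridge["y"]) - ridge["baseline"]
    print(f"{ridge['name']:<26} row {ridge['baseline']}: "
          f"{ridge['x'][0]:.0%} to {ridge['x'][-1]:.0%}, peak height {peak:.2f}")
```

To draw the plot, fill each ridge between its baseline and its curve. With matplotlib, for example, pass the ridge's x values, baseline, and y values to `Axes.fill_between(x, y1, y2)`.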

Ridgeline Plots at General Assembly

In General Assembly’s full-time, career-changing Data Science Immersive program and part-time Data Science course, students learn about sampling, calculating confidence intervals, and using data visualizations to help make actionable decisions with data. Students can also learn about the programming language R and other key data skills through expert-led workshops and exclusive industry events across GA’s campuses.

Meet Our Expert

Josh Yazman is a General Assembly Data Analytics alum and a data analyst with expertise in media analytics, survey research, and civic engagement. He now teaches GA’s part-time Data Analytics course in Washington, D.C. Josh spent five years working in Virginia for political candidates at all levels of government, from Blacksburg town council to president. Today, he is a data analyst with a national current-affairs magazine in Washington, D.C., a student at Northwestern University pursuing a master’s degree in predictive analytics, and the advocacy chair for the National Capital Area chapter of the Pancreatic Cancer Action Network. He occasionally writes about political and sports data on Medium and tweets at @jyazman2012.

“Data science as a field is in demand today — but the decision-making and problem-solving skills you’ll learn from studying it are broadly applicable and valuable in any field or industry.”

Josh Yazman, Data Analytics Instructor, General Assembly Washington, D.C.