What Is Data Mining?


Data visualization (CC image courtesy of Luc Legay on Flickr)

If you’re using technology to shop, communicate, or find information, the results of data mining are all around you. Just think of the steady stream of recommendations for items to purchase, shows to watch, and friends to add on social media. Data mining allows companies to take the giant pile of consumer information generated daily and analyze it for relationships and patterns. When it comes to big data, the possibilities for exploration are nearly endless.

How Does Data Mining Work?

First comes data collection — with modern technology, amassing click, purchase and other data is relatively easy. Data mining is the more challenging task of finding patterns, making predictions, and drawing conclusions.

But while the volume of data might be new, the methods for analyzing it rely on old-fashioned statistics: data miners look for correlations and relationships in the data, and build models and algorithms that act predictively. Both free and paid programs are available for data mining.
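As a rough illustration of what "looking for correlations" and "building a model that acts predictively" can mean in practice, here is a minimal Python sketch. The function names (pearson, fit_line) and the sample numbers are invented for this example; real data mining tools offer far more than this.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (a very simple model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Hypothetical data: weekly site visits and purchases for five shoppers.
visits = [1, 2, 3, 4, 5]
purchases = [2, 4, 6, 8, 10]

r = pearson(visits, purchases)
slope, intercept = fit_line(visits, purchases)
predicted = slope * 6 + intercept  # predicted purchases for 6 visits

print(f"correlation: {r:.2f}")
print(f"predicted purchases at 6 visits: {predicted:.1f}")
```

A strong correlation like this one only says the two numbers move together. It does not say that one causes the other.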

Data Mining in the World

In an oft-told story, Target had an enraged father come to their store after his teenaged daughter received coupons for baby-related items from the retailer. But as it turned out, the teen was pregnant: Target was able to tell based on her previous purchases.

For a big box store, the benefits of knowing that a purchase of cocoa butter can indicate pregnancy are clear. But it’s also a bit creepy. We unknowingly reveal so much in our online searches and shopping habits, and easily accessible data about where we live and work reveals even more.

Some worry that data mining could lead companies to act unethically, or cause unintended negative side effects. What happens when health insurance companies apply data mining? An article in The Wall Street Journal speculates about difficulties that could result, including false correlations that could lead to higher insurance rates (subscribing to Hang Gliders Monthly doesn’t mean a person is engaged in dangerous behavior; donating to a cancer charity does not necessarily mean a family history of cancer).

Providing targeted advertisements to consumers is not new — supermarket coupons and direct mailers have relied on targeting deals for years — but the increased pool of information available makes the results more accurate, in a way that can be concerning.

But the benefits of data mining are also clear: When you’re browsing Netflix, it’s helpful to see targeted recommendations for TV shows, which are developed using information about what other people with similar viewing habits watch next. And from a health and safety perspective, data mining can reveal valuable insights, as when the FDNY (the New York City Fire Department) crafted an algorithm that assigned a risk level to all high rises in the city, based on factors such as poverty, the building’s age, occupancy, and other indicators that can help predict the likelihood of a fire.
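To make the "people with similar viewing habits" idea concrete, here is a small Python sketch of one common approach: score each unseen show by how similar its viewers are to you. This is not Netflix's actual method. The viewer names, show titles, and the jaccard and recommend functions are all made up for illustration.

```python
def jaccard(a, b):
    """Similarity of two sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, top_n=3):
    """Suggest shows the target has not seen, weighted by viewer similarity."""
    watched = histories[target]
    scores = {}
    for viewer, shows in histories.items():
        if viewer == target:
            continue
        similarity = jaccard(watched, shows)
        if similarity == 0:
            continue  # nothing in common, so this viewer tells us nothing
        for show in shows - watched:
            scores[show] = scores.get(show, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda show: (-scores[show], show))
    return ranked[:top_n]


# Hypothetical viewing histories.
histories = {
    "ana": {"Show A", "Show B", "Show C"},
    "ben": {"Show A", "Show B", "Show D"},
    "cam": {"Show A", "Show E"},
    "dee": {"Show F"},
}

print(recommend("ana", histories))
```

Here "ben" shares two shows with "ana" and "cam" shares one, so the show only "ben" has watched ranks ahead of the show only "cam" has watched. "dee" has nothing in common with "ana" and is ignored.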

A New Hot Job?

Hal Varian, chief economist at Google, has said that “the sexy job in the next 10 years will be statisticians.” And it’s no wonder: Now that we have this information, figuring out how to use it to provide actionable information to governments, companies, and individuals — without causing ethical dilemmas — is the next big challenge.

Want to get in on this hot new field? Check out the data science course, which will show you how to develop math and programming skills and apply them to big data.
