About this workshop

This workshop introduces students to data exploration and machine learning techniques. Students will learn about the data science workflow and practice exploring and visualizing data using R and its libraries. Students will also explore the differences between supervised and unsupervised learning techniques and practice building predictive regression models.

A background in computer science, programming, and/or statistics is preferred for this workshop. It is not required, but you should be somewhat familiar with command-line tools and with writing simple programs.


PART I: Data Exploration

- Understand the course contents and structure.
- Describe the data mining workflow and the key traits of a successful data scientist.
- Extract, format, and preprocess data using UNIX command-line tools.
- Explore and visualize data using R and ggplot2.

PART II: Intro to Machine Learning

- Explain the concepts and applications of supervised and unsupervised learning techniques.
- Describe categorical and continuous feature spaces, including examples and techniques for each.
- Discuss the purpose of machine learning and how to interpret the results of predictive models.

Prereqs & Preparation

See pre-work document:

