# Data Science

### Part-Time Data Course

Talk to Admissions +1 (202) 517-1777

### Skills & Tools

Use Python to mine datasets and predict patterns.

### Production Standard

Build statistical models — regression and classification — that generate usable information from raw data.

### The Big Picture

Master the basics of machine learning and harness the power of data to forecast what’s next.

Our educational excellence is a community effort. When you learn at GA, you can always rely on an in-house team of experts to provide guidance and support, whenever you need it.

• ### Instructors

Learn industry-grade frameworks, tools, vocabulary, and best practices from a teacher whose daily work involves using them expertly.

• ### Teaching Assistants

Taking on new material isn’t always easy. Through office hours and other channels, our TAs are here to provide you with answers, tips, and more.

• ### Course Producers

Our alumni love their Course Producers, who kept them motivated throughout the course. You can reach out to yours for support anytime.

# See What You’ll Learn

### Unit 1: Research Design and Exploratory Data Analysis

#### What is Data Science

• Describe course syllabus and establish the classroom environment
• Answer the questions: "What is Data Science? What roles exist in Data Science?"
• Define the workflow, tools and approaches data scientists use to analyze data

#### Research Design and Pandas

• Define a problem and identify appropriate data sets using the data science workflow
• Walk through the data science workflow using a case study in the Pandas library
• Import, format, and clean data using the Pandas library

#### Statistics Fundamentals I

• Use the NumPy and Pandas libraries to analyze datasets using basic summary statistics: mean, median, mode, max, min, quartiles, interquartile range, variance, standard deviation, and correlation
• Create data visualizations (scatter plots, scatter matrices, line graphs, box plots, and histograms) to discern characteristics and trends in a dataset
• Identify a normal distribution within a dataset using summary statistics and visualization
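
To give a flavor of the first objective above, here is a minimal sketch of those summary statistics. It is an illustration, not course material: it assumes Python's built-in `statistics` module rather than NumPy or Pandas so that it runs without extra installs, and the `values` list is made-up sample data.

```python
import statistics

# Hypothetical sample data, made up for illustration.
values = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(values)        # arithmetic average
median = statistics.median(values)    # middle value of the sorted data
mode = statistics.mode(values)        # most frequent value
value_range = max(values) - min(values)

# quantiles(n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                         # interquartile range

variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)      # sample standard deviation

print(mean, median, mode, value_range, iqr, variance, round(std_dev, 3))
```

Quartile conventions differ between tools: `statistics.quantiles` defaults to its "exclusive" method, so NumPy or Pandas may report slightly different quartiles for the same data.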

#### Statistics Fundamentals II

• Explain the difference between causation and correlation
• Test a hypothesis within a sample case study
• Validate your findings using statistical analysis (p-values, confidence intervals)

#### Instructor Choice

• Focus on a topic selected by the instructor/class in order to provide deeper insight into exploratory data analysis

### Unit 2: Foundations of Data Modeling

#### Introduction to Regression

• Define data modeling and linear regression
• Differentiate between categorical and continuous variables
• Build a linear regression model using a dataset that meets the linearity assumption using the scikit-learn library

#### Evaluating Model Fit

• Define regularization, bias, and error metrics
• Evaluate model fit by using loss functions including mean absolute error, mean squared error, root mean squared error
• Select regression methods based on fit and complexity
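
The three loss functions named above can be sketched in a few lines. This is an illustrative example under stated assumptions: the helper `regression_errors` and the sample values are hypothetical, and it uses plain Python rather than scikit-learn.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired true and predicted values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    return mae, mse, rmse


# Hypothetical observations and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]

mae, mse, rmse = regression_errors(y_true, y_pred)
print(mae, mse, round(rmse, 4))
```

Because MSE and RMSE square each residual, the single large miss (9 vs. 12) dominates them far more than it dominates MAE.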

#### Introduction to Classification

• Define a classification model
• Build a K-Nearest Neighbors model using the scikit-learn library
• Evaluate and tune the model using metrics such as classification accuracy/error
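
As a rough illustration of the idea behind K-Nearest Neighbors, the sketch below classifies a point by majority vote among its closest training points. It is a from-scratch toy, not the scikit-learn implementation used in class; the function `knn_predict` and the data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-cluster training data.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

Tuning the model largely means choosing `k`: small values follow the training data closely, while larger values smooth the decision boundary.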

#### Introduction to Logistic Regression

• Build a logistic regression classification model using the scikit-learn library
• Describe the sigmoid function, odds, and odds ratios and how they relate to logistic regression
• Evaluate a model using metrics such as classification accuracy/error, confusion matrix, ROC/AUC curves, and loss functions
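
The relationship between the sigmoid function, odds, and odds ratios can be shown with a short sketch. The `intercept` and `coef` values here are hypothetical, chosen only for illustration rather than fitted to any data.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical logistic regression parameters for a single feature x.
intercept, coef = -1.0, 0.5


def predict_proba(x):
    return sigmoid(intercept + coef * x)


p_low = predict_proba(2.0)    # linear term is 0, so the probability is 0.5
p_high = predict_proba(3.0)   # one unit higher in x

# Raising x by one unit multiplies the odds by exp(coef).
odds_ratio = odds(p_high) / odds(p_low)
print(p_low, round(p_high, 4), round(odds_ratio, 4))
```

This is why logistic regression coefficients are usually read as log odds ratios: exponentiating a coefficient gives the multiplicative change in odds per unit of that feature.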

#### Communicate Results from Logistic Regression

• Explain the tradeoff between the precision and recall of a model and articulate the cost of false positives vs. false negatives
• Identify the components of a concise, convincing report and how they relate to specific audiences/stakeholders
• Describe the difference between visualization for presentations vs. exploratory data analysis
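
To make the precision and recall tradeoff concrete, the following sketch counts the four confusion matrix cells and derives both metrics from them. The labels are made up for illustration, and the helper `confusion_counts` is hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1   # true positive
        elif truth == 0 and pred == 1:
            fp += 1   # false positive
        elif truth == 1 and pred == 0:
            fn += 1   # false negative
        else:
            tn += 1   # true negative
    return tp, fp, fn, tn


# Hypothetical true labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found
accuracy = (tp + tn) / len(y_true)
print(precision, recall, accuracy)
```

Predicting the positive class more liberally tends to raise recall while lowering precision, so the right balance depends on whether a false positive or a false negative costs more.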

#### Flexible Class Session

• Focus on a topic selected by the instructor/class in order to provide deeper insight into data modeling

### Unit 3: Data Science in the Real World

#### Decision Trees and Random Forest

• Describe the difference between classification and regression trees and how to interpret these models
• Explain and communicate the tradeoffs of decision trees vs regression models
• Build decision trees and random forests using the scikit-learn library

#### Natural Language Processing

• Demonstrate how to tokenize natural language text using NLTK
• Categorize and tag unstructured text data
• Explain how to build a text classification model using NLTK

#### Dimensionality Reduction

• Explain how to perform dimensionality reduction using topic models
• Demonstrate how to refine data using Latent Dirichlet Allocation (LDA)
• Extract information from a sample text dataset

#### Working with Time Series Data

• Explain why time series data is different from other data and how to account for it
• Create rolling means and plot time series data using the Pandas library
• Perform autocorrelation on time series data
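
Rolling means and autocorrelation can be sketched without any libraries. This is an illustration in plain Python, not the Pandas workflow taught in class; the helper names and the `series` values are hypothetical, and the autocorrelation uses one common sample estimator, so other tools may give slightly different numbers.

```python
def rolling_mean(series, window):
    """Mean of each consecutive `window`-length slice of the series."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def autocorrelation(series, lag):
    """Sample autocorrelation: lagged covariance divided by total variance."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return numer / denom


# Hypothetical evenly spaced observations with a steady upward trend.
series = [1, 2, 3, 4, 5, 6]

smoothed = rolling_mean(series, window=3)
acf_lag1 = autocorrelation(series, lag=1)
print(smoothed, acf_lag1)
```

A high autocorrelation at small lags is one sign that consecutive observations are not independent, which is what sets time series apart from the datasets in earlier units.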

#### Creating Models with Time Series Data

• Decompose time series data into trend and residual components
• Validate and cross-validate data from different data sets
• Use the ARIMA model to forecast and detect trends in time series data

#### The Value of Databases

• Describe the use cases for different types of databases
• Explain differences between relational databases and document-based databases
• Write simple select queries to pull data from a database and use within Pandas
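
A simple select query against a relational database might look like the sketch below. It assumes an in-memory SQLite database through Python's built-in `sqlite3` module; the `orders` table and its rows are made up for illustration.

```python
import sqlite3

# Build a small throwaway database in memory (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 12.5)],
)

# Select only the orders above a threshold, smallest first.
query = "SELECT customer, amount FROM orders WHERE amount > ? ORDER BY amount"
rows = conn.execute(query, (15,)).fetchall()
conn.close()

print(rows)
```

With Pandas installed, the same query could be loaded into a DataFrame by calling `pandas.read_sql_query` on an open connection instead of `fetchall()`.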

#### Moving Forward with your Data Science Career

• Specify common models used within different industries
• Identify the use cases for common models
• Discuss next steps and additional resources for data science learning

#### Flexible Class Session

• Focus on a topic selected by the instructor/class in order to provide deeper insight into data science in the real world

#### Final Presentations

• Deliver a final presentation to peers, your instructor, and guest panelists, who will identify strengths and areas for improvement


My team at Amazon couldn't have built its recommendation system without the foundational data mining and machine learning skills taught in this course. When contributing to the curriculum, I was careful to balance the theory with the real-world challenges of applying it to big data.

Frank Kane / Former Senior Manager, Amazon.com

Learn from skilled instructors with professional experience in the field.

• San Francisco: Self-Employed
• New York City: Data Scientist, Mount Sinai
• Chicago: Data Scientist, Trunk Club
• Los Angeles: ZestFinance
• San Francisco: Founder, Kylie.ai
• Hong Kong: Mart van de Ven, Data Architect, Technologist, Droste

## Learn In

WASHINGTON DC

Except: Jul 3

#### \$3,950 USD

*Courses located at the Virginia campus are only open to individuals sponsored by their employers.

# Visit Campus

See if this program is a fit for you. Meet the GA team, get an overview of the program curriculum, and chat with other students thinking about the course.

#### Data Science Info Session

GA D.C., 1776, 1133 15th Street NW, 8th Floor, Washington, D.C. 20005, United States

Wednesday, 24 May 6:30pm

# Financing Options

Need payment assistance? Our financing options allow you to focus on your goals instead of the barriers that keep you from reaching them.

Let us figure out the best option for you.

¹ Must be a US citizen or Permanent Resident; approval pending state of residency.
² Must be a US citizen; approval pending state of residency.

Financing options differ in each market and are only available to students accepted into our programs. Contact a local admissions officer for more info.

Be on your way toward an online master’s degree. By completing this GA program, you become eligible for benefits toward online graduate programs at distinguished universities.

Upon completion of this course, you may become eligible to receive a tuition benefit to the following online graduate programs:

• Master of Information and Data Science from the University of California, Berkeley
• Master of Science in Data Science from Southern Methodist University

We love questions, almost as much as we love providing answers. Here is a sampling of what we’re typically asked, along with our responses:

• ##### Why is this course relevant today?

Users, products, and online content generate a vast amount of data, and businesses could make far better-informed decisions if that data were analyzed more deeply with data science. This course provides the tools, methods, and practical experience you need to make accurate predictions from data, which ultimately leads to better business decision-making and smarter technology (think recommendation systems or targeted ads).

• ##### What practical skill sets can I expect to have upon completion of the course?

This course will provide you with technical skills in machine learning, algorithms, and data modeling, which will allow you to make accurate predictions about your data. You will create your models in Python, so you will gain a good grasp of the language. You will also learn how to parse and clean your data, a task that can take up to 70% of your time as a data scientist.

• ##### Whom will I be sitting next to in this course?

Individuals who have a strong interest in manipulating large data sets, finding patterns in data, and making predictions.

• ##### Are there any prerequisites?

A basic understanding of statistics

A basic understanding of variables, functions, and lists in Python