Do you want to be a data scientist? Data science and machine learning are rapidly becoming vital disciplines for all types of businesses. The ability to extract insight and meaning from a large pile of data is a skill set worth its weight in gold. Thanks to its versatility and ease of use, Python has become the language of choice for data scientists.
In this Python crash course, we will walk you through a couple of examples using two of the most-used data types: the list and the Pandas DataFrame. A list is an ordered, one-dimensional collection of values. A Pandas DataFrame is like a spreadsheet: its data is laid out in rows and columns.
Let’s take a look at a few neat things we can do with lists and DataFrames in Python!
Lists
Creating Lists
Let’s start this Python tutorial by creating lists. Create an empty list and use a for loop to append new values. What you need to do is:
# add two to each value
my_list = []
for x in range(1, 11):
    my_list.append(x + 2)
We can also do this in one step using list comprehension:
my_list = [x + 2 for x in range(1,11)]
Creating Lists with Conditionals
As above, we will create a list, but now we will only add 2 to the value if it is even.
# add two, but only if x is even
my_list = []
for x in range(1, 11):
    if x % 2 == 0:
        my_list.append(x + 2)
    else:
        my_list.append(x)
Using a list comprehension:
my_list = [x + 2 if x % 2 == 0 else x
           for x in range(1, 11)]
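As a quick sanity check, the sketch below builds the list both ways and confirms that the results match. It also contrasts the conditional expression used above with a trailing `if`, which filters values out instead of choosing between two results. The names `loop_list`, `comp_list`, and `evens_only` are made up for this illustration.

```python
# Build the list with an explicit loop
loop_list = []
for x in range(1, 11):
    if x % 2 == 0:
        loop_list.append(x + 2)
    else:
        loop_list.append(x)

# Same result with a conditional expression inside a comprehension
comp_list = [x + 2 if x % 2 == 0 else x for x in range(1, 11)]

# A trailing "if" filters instead: odd values are dropped entirely
evens_only = [x + 2 for x in range(1, 11) if x % 2 == 0]

print(loop_list == comp_list)  # True
print(comp_list)               # [1, 4, 3, 6, 5, 8, 7, 10, 9, 12]
print(evens_only)              # [4, 6, 8, 10, 12]
```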
Selecting Elements and Basic Stats
Select elements by index.
#get the first/last element
first_ele = my_list[0]
last_ele = my_list[-1]
Some basic stats on lists:
#get max/min/mean value
biggest_val = max(my_list)
smallest_val = min(my_list)
avg_val = sum(my_list) / len(my_list)
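For a slightly fuller picture, here is a minimal sketch that selects several elements at once with slices and computes the mean and median with Python's standard `statistics` module. The list is rebuilt here so the snippet runs on its own; `first_three`, `last_two`, and `median_val` are illustrative names, not part of the original examples.

```python
import statistics

my_list = [x + 2 for x in range(1, 11)]  # the values 3 through 12

first_three = my_list[:3]  # slice: the first three elements
last_two = my_list[-2:]    # slice: the last two elements

avg_val = statistics.mean(my_list)       # same value as sum(my_list) / len(my_list)
median_val = statistics.median(my_list)  # middle value of the sorted list

print(first_three, last_two)  # [3, 4, 5] [11, 12]
print(avg_val, median_val)    # 7.5 7.5
```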
DataFrames
Reading in Data to a DataFrame
We first need to import the pandas module.
import pandas as pd
Then we can read in data from csv or xlsx files:
df_from_csv = pd.read_csv('path/to/my_file.csv',
                          sep=',',
                          nrows=10)
xlsx = pd.ExcelFile('path/to/excel_file.xlsx')
df_from_xlsx = pd.read_excel(xlsx, 'Sheet1')
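If you do not have a file handy, the sketch below feeds `pd.read_csv` an in-memory string instead, which makes the effect of `nrows` easy to see. The CSV content and the name `df_preview` are hypothetical stand-ins for a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "country,population\nA,100\nB,200\nC,300\n"

# nrows=2 reads only the first two data rows (the header is not counted)
df_preview = pd.read_csv(io.StringIO(csv_text), sep=",", nrows=2)

print(df_preview.shape)          # (2, 2)
print(list(df_preview.columns))  # ['country', 'population']
```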
Slicing DataFrames
We can slice our DataFrame using conditionals.
df_filter = df[df['population'] > 1000000]
df_france = df[df['country'] == 'France']
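To see what these filters do, here is a small self-contained sketch. The rows are made up for illustration and are not real population figures; `is_big`, `is_france`, and `df_big_france` are hypothetical names. It also shows one way to combine two conditions.

```python
import pandas as pd

# Made-up rows for illustration only; not real population figures
df = pd.DataFrame({
    "country": ["France", "France", "Spain", "Spain"],
    "city": ["city_a", "city_b", "city_c", "city_d"],
    "population": [2_000_000, 500_000, 3_000_000, 800_000],
})

# Each comparison yields a boolean Series with one True/False per row
is_big = df["population"] > 1_000_000
is_france = df["country"] == "France"

# Combine masks with & (and) or | (or)
df_big_france = df[is_big & is_france]

print(df_big_france["city"].tolist())  # ['city_a']
```

When the comparisons are written inline, each one needs its own parentheses, as in `df[(df['population'] > 1000000) & (df['country'] == 'France')]`, because `&` binds more tightly than `>` and `==`.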
Sorting values by a column:
# sort_values returns a new, sorted DataFrame; df itself is unchanged
df_sorted = df.sort_values(by='population',
                           ascending=False)
Filling Missing Values
Let’s fill in any missing values with that column’s average value.
df['population'] = df['population'].fillna(
    value=df['population'].mean()
)
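A tiny worked example, using made-up numbers, shows the effect. The name `col_mean` is introduced here only for readability.

```python
import pandas as pd

# Made-up values; None becomes NaN in a numeric column
df = pd.DataFrame({"population": [10.0, None, 30.0]})

# mean() skips NaN by default, so the average here is (10 + 30) / 2
col_mean = df["population"].mean()
df["population"] = df["population"].fillna(value=col_mean)

print(df["population"].tolist())  # [10.0, 20.0, 30.0]
```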
Applying Functions to Columns
Apply a custom function to every value in one of the DataFrame’s columns.
def fix_zipcode(x):
    '''
    make sure that zipcodes all have leading zeros
    '''
    return str(x).zfill(5)

df['clean_zip'] = df['zip code'].apply(fix_zipcode)
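Here is the same idea run end to end on a few made-up zip codes stored as integers, which is how leading zeros typically get lost in the first place. The sample values are illustrative only.

```python
import pandas as pd

def fix_zipcode(x):
    """Return x as a string, left-padded with zeros to five characters."""
    return str(x).zfill(5)

# Made-up zip codes stored as integers, so leading zeros are already gone
df = pd.DataFrame({"zip code": [2134, 90210, 501]})
df["clean_zip"] = df["zip code"].apply(fix_zipcode)

print(df["clean_zip"].tolist())  # ['02134', '90210', '00501']
```

Note that this assumes the column holds integers or strings. If it was read in as floats, `str(x)` would produce values like `'2134.0'`, so the column would need converting first.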
Ready to take on the world of machine learning and data science? Now that you know what you can do with lists and DataFrames in Python, check out our other Python beginner tutorials to learn about other important concepts of the language.