MDST Tutorials: commit bec55d3e, authored 2 years ago by jackmic (parent 81ebc872)

1 changed file: checkpoint0.ipynb (new file, mode 100644), 549 additions, 0 deletions
%% Cell type:markdown id:44bff40d tags:
# Checkpoint 0
%% Cell type:markdown id:02215935 tags:
These exercises are a mix of Python and Pandas practice. Most should be no more than a few lines of code!
%% Cell type:code id:a0f62714 tags:
```python
# here is a Python list:

a = [1, 2, 3, 4, 5, 6]
```
%% Cell type:code id:779d96b1 tags:
```python
# get a list containing the last 3 elements of a
# Yes, you can just type out [4, 5, 6], but we really want to see you demonstrate that you know how to do that in Python
b = a[-3:]
print(b)
```
%% Output
[4, 5, 6]
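%% Cell type:markdown tags:
Negative indices count from the end of the list, so `a[-3:]` reads as "from the third-to-last element through the end" (the trailing `:` in `a[-3::]` is optional). The sketch below wraps this in a hypothetical helper, `last_n`, to show two properties of slicing: it does not raise on a list shorter than `n`, and it returns a new list rather than a view of `a`.

```python
def last_n(items, n):
    """Return a new list with the last n elements of items (hypothetical helper)."""
    if n <= 0:
        return []  # items[-0:] is items[0:], i.e. the whole list, so guard against it
    return items[-n:]

a = [1, 2, 3, 4, 5, 6]
b = last_n(a, 3)
print(b)               # [4, 5, 6]
print(last_n([1], 3))  # [1]: slicing past the start is clamped, no IndexError
b.append(7)
print(a)               # [1, 2, 3, 4, 5, 6]: a is unchanged because the slice is a copy
```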
%% Cell type:code id:b6a54def tags:
```python
# create a list of numbers from 1 to 100
c = list(range(1, 101))
print(c)
```
%% Output
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
%% Cell type:code id:487873ac tags:
```python
# now get a list with only the even numbers between 1 and 100
# you may or may not make use of the list you made in the last cell
d = list(range(2, 101, 2))
print(d)
```
%% Output
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100]
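%% Cell type:markdown tags:
`range(start, stop, step)` stops before `stop`, which is why `range(1, 101)` is needed to include 100. Since the exercise allows reusing `c`, here is a small comparison of three equivalent ways to get the even numbers; the variable names are illustrative only.

```python
c = list(range(1, 101))  # range stops *before* 101, so this is 1..100

# Three equivalent ways to get the even numbers from 1 to 100:
evens_range = list(range(2, 101, 2))          # start at 2, step by 2
evens_slice = c[1::2]                         # every 2nd element of c, starting at index 1 (the value 2)
evens_filter = [n for n in c if n % 2 == 0]   # keep only the values divisible by 2

print(evens_range == evens_slice == evens_filter)  # True
print(len(evens_range))                            # 50
```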
%% Cell type:code id:3d4bb5dd tags:
```python
# write a function that takes two numbers as arguments
# and returns the first number divided by the second
def divide(num1, num2):
    return num1 / num2
```
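%% Cell type:markdown tags:
In Python 3, `/` is true division and always returns a float, and dividing by zero raises `ZeroDivisionError`. The sketch below keeps the notebook's `divide` and adds a hypothetical `safe_divide` variant that returns `None` instead of raising; whether `None` is the right fallback depends on how the result is used.

```python
def divide(num1, num2):
    # "/" is true division in Python 3: it always returns a float
    return num1 / num2

def safe_divide(num1, num2):
    """Hypothetical variant: return None instead of raising when num2 is 0."""
    try:
        return num1 / num2
    except ZeroDivisionError:
        return None

print(divide(7, 2))       # 3.5
print(divide(6, 3))       # 2.0 (a float, even though the result is whole)
print(7 // 2)             # 3: "//" is floor division, if an integer result is wanted
print(safe_divide(1, 0))  # None
```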
%% Cell type:code id:b93669fa tags:
```python
# write a function that takes a string as input
# and returns that string in all caps
def capitalize(string):
    return string.upper()
```
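%% Cell type:markdown tags:
Note that Python's built-in `str.capitalize()` does something different from this exercise's `capitalize`: it upper-cases only the first character. A short comparison, using an arbitrary example string:

```python
def capitalize(string):
    # str.upper() returns a new string with every cased character upper-cased
    return string.upper()

s = "caramel macchiato"
print(capitalize(s))   # CARAMEL MACCHIATO
print(s.capitalize())  # Caramel macchiato  (only the first character)
print(s.title())       # Caramel Macchiato  (first letter of each word)
print(s)               # caramel macchiato  (strings are immutable; s is unchanged)
```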
%% Cell type:code id:f55df04e tags:
```python
# optional challenge - fizzbuzz
# you will need to use both iteration and control flow
# go through all numbers from 1 to 100 in order
# if the number is a multiple of 3, print fizz
# if the number is a multiple of 5, print buzz
# if the number is a multiple of 3 and 5, print fizzbuzz and NOTHING ELSE
# if the number is neither a multiple of 3 nor a multiple of 5, print the number

for num in range(1, 101):
    if num % 3 == 0 and num % 5 == 0:
        print("fizzbuzz")
    elif num % 3 == 0:
        print("fizz")
    elif num % 5 == 0:
        print("buzz")
    else:
        print(num)
```
%% Output
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
16
17
fizz
19
buzz
fizz
22
23
fizz
buzz
26
fizz
28
29
fizzbuzz
31
32
fizz
34
buzz
fizz
37
38
fizz
buzz
41
fizz
43
44
fizzbuzz
46
47
fizz
49
buzz
fizz
52
53
fizz
buzz
56
fizz
58
59
fizzbuzz
61
62
fizz
64
buzz
fizz
67
68
fizz
buzz
71
fizz
73
74
fizzbuzz
76
77
fizz
79
buzz
fizz
82
83
fizz
buzz
86
fizz
88
89
fizzbuzz
91
92
fizz
94
buzz
fizz
97
98
fizz
buzz
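%% Cell type:markdown tags:
Returning the lines as a list instead of printing them makes the result easy to check. In this sketch (the function name `fizzbuzz` is illustrative), testing `num % 15 == 0` first is equivalent to the notebook's `num % 3 == 0 and num % 5 == 0`, because 3 and 5 share no common factor. The combined case must come first; otherwise a multiple of 15 would be caught by the "fizz" branch.

```python
def fizzbuzz(n=100):
    """Return the FizzBuzz sequence for 1..n as a list of strings."""
    result = []
    for num in range(1, n + 1):
        if num % 15 == 0:  # divisible by both 3 and 5
            result.append("fizzbuzz")
        elif num % 3 == 0:
            result.append("fizz")
        elif num % 5 == 0:
            result.append("buzz")
        else:
            result.append(str(num))
    return result

lines = fizzbuzz()
print("\n".join(lines[:15]))  # same first 15 lines as the loop above
```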
%% Cell type:code id:78aace0b tags:
```python
# create a dictionary that reflects the following menu pricing (taken from Ahmo's)
# Gyro: $9
# Burger: $9
# Greek Salad: $8
# Philly Steak: $10

menu = {"Gyro": 9, "Burger": 9, "Greek Salad": 8, "Philly Steak": 10}
```
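%% Cell type:markdown tags:
A dictionary maps each key (item name) to a value (price), so lookups are by name rather than by position. The sketch below shows a lookup, a safe lookup with `.get()`, and the total of an order; the order contents and the "Falafel" item are made up for illustration.

```python
menu = {"Gyro": 9, "Burger": 9, "Greek Salad": 8, "Philly Steak": 10}

# Look up a price by item name
print(menu["Greek Salad"])  # 8

# .get() returns a default instead of raising KeyError ("Falafel" is a made-up item)
print(menu.get("Falafel", "not on the menu"))

# Hypothetical order: the total is the sum of each item's price
order = ["Gyro", "Burger", "Gyro"]
total = sum(menu[item] for item in order)
print(total)  # 27

# Cheapest item: compare the keys by their prices
print(min(menu, key=menu.get))  # Greek Salad
```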
%% Cell type:code id:a2a78a4b tags:
```python
# load in the "starbucks.csv" dataset
# refer to how we read the cereal.csv dataset in the tutorial
import pandas as pd

df = pd.read_csv("starbucks.csv")
```
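%% Cell type:markdown tags:
`pd.read_csv` accepts a file path or any file-like object. To keep this sketch runnable without `starbucks.csv`, it reads a tiny made-up CSV from an in-memory `io.StringIO`; the rows and values are illustrative only and are not taken from the real dataset. Checking `shape`, `columns`, and `dtypes` right after loading is a quick way to confirm the file was parsed as expected.

```python
import io

import pandas as pd

# A tiny stand-in for starbucks.csv; these rows and values are made up for illustration
csv_text = """beverage_category,calories,total fat,caffeine
coffee,3,0.1,175
smoothies,300,2.0,0
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (2, 4): rows, columns
print(list(df.columns))  # column names, including ones that contain spaces
print(df.dtypes)         # pandas infers a dtype per column
print(df.head())
```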
%% Cell type:code id:68210b5f tags:
```python
# output only the calories, sugars, and protein columns of every 40th row
print(df.iloc[::40][["calories", "sugars", "protein"]])
```
%% Output
     calories  sugars  protein
0           3       0      0.3
40          5       0      0.4
80        350      58     15.0
120       140      20      6.0
160       110      24      2.0
200       200      41      3.0
240       180      35      3.0
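%% Cell type:markdown tags:
`.iloc` selects rows by position, so `df.iloc[::40]` takes every 40th row regardless of the index labels. The sketch below uses a small made-up DataFrame whose index deliberately does not start at 0 (and a step of 4 instead of 40) to show that the chained form matches a single `.loc` call that first picks the labels of every step-th row.

```python
import pandas as pd

# Made-up data; the index deliberately does not start at 0
df = pd.DataFrame(
    {"calories": range(0, 100, 10), "sugars": range(10), "protein": [0.5] * 10},
    index=range(100, 110),
)
cols = ["calories", "sugars", "protein"]
step = 4  # the notebook uses 40; 4 keeps this toy example small

# Chained: take every step-th row by position, then pick columns by name
chained = df.iloc[::step][cols]

# Single indexing call: take the labels of every step-th row, then select by label with .loc
single = df.loc[df.index[::step], cols]

print(chained.equals(single))  # True
print(chained.index.tolist())  # [100, 104, 108]
```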
%% Cell type:code id:ac0f0c12 tags:
```python
# select all rows with 400 or more calories
hi_cal_rows = df[df["calories"] >= 400]
```
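%% Cell type:markdown tags:
`df["calories"] >= 400` produces a boolean Series (one True/False per row), and indexing `df` with it keeps the True rows. The sketch uses made-up values to show counting matches with `.sum()` and combining conditions. In pandas, conditions are combined with `&` and `|` (not Python's `and`/`or`), and each condition needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({"calories": [120, 400, 510, 399, 450],
                   "sugars": [10, 60, 70, 50, 30]})  # made-up values

mask = df["calories"] >= 400  # a boolean Series, one True/False per row
hi_cal_rows = df[mask]        # keep only the rows where mask is True

print(mask.sum())  # 3: True counts as 1, so this counts the matching rows

# Combine conditions with & / | (not `and` / `or`) and wrap each one in parentheses
hi_cal_low_sugar = df[(df["calories"] >= 400) & (df["sugars"] < 65)]
print(hi_cal_low_sugar.index.tolist())  # [1, 4]
```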
%% Cell type:code id:ee8f8241 tags:
```python
# select all rows whose vitamin c content is higher than the iron content
vitc_greaterthan_iron_rows = df[df["vitamin c"] > df["iron"]]
```
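%% Cell type:markdown tags:
Comparing two columns with `>` works row by row and again yields a boolean mask. `DataFrame.query` expresses the same filter as a string; column names that contain spaces, such as `vitamin c`, must be wrapped in backticks (supported in recent pandas versions). The values below are made up.

```python
import pandas as pd

df = pd.DataFrame({"vitamin c": [0, 10, 20, 5],
                   "iron": [2, 10, 4, 0]})  # made-up values

# Row-by-row comparison of two columns gives a boolean Series
vitc_greaterthan_iron_rows = df[df["vitamin c"] > df["iron"]]

# The same filter with query(); backticks quote the column name that contains a space
via_query = df.query("`vitamin c` > iron")

print(vitc_greaterthan_iron_rows.equals(via_query))  # True
print(via_query.index.tolist())                      # [2, 3]
```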
%% Cell type:code id:d4de48bb tags:
```python
# create a new column containing the caffeine per calorie of each drink
df["caffeine per calories"] = df["caffeine"] / df["calories"]
```
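%% Cell type:markdown tags:
Column division in pandas does not raise on division by zero: `x / 0` becomes `inf` and `0 / 0` becomes `NaN`. If the dataset contains any 0-calorie drinks, the new column would contain such values. The sketch below uses made-up numbers and shows one option, treating the ratio as undefined (`NaN`) when calories is 0; whether that is the right choice depends on the analysis.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"caffeine": [75, 150, 0, 20],
                   "calories": [5, 300, 0, 0]})  # made-up values, including 0-calorie rows

ratio = df["caffeine"] / df["calories"]
print(ratio.tolist())  # [15.0, 0.5, nan, inf]: 0/0 gives NaN, 20/0 gives inf, no error

# One option: treat "per calorie" as undefined when calories is 0
df["caffeine per calories"] = ratio.replace([np.inf, -np.inf], np.nan)
print(df["caffeine per calories"].isna().sum())  # 2
```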
%% Cell type:code id:3a72465a tags:
```python
# what is the average number of calories across all items?
df["calories"].mean()
```
%% Output
193.87190082644628
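%% Cell type:markdown tags:
`Series.mean()` skips missing values by default (`skipna=True`), so a few blank cells do not turn the average into `NaN`. A short sketch with made-up values; `describe()` reports the same mean together with the count, spread, and range.

```python
import numpy as np
import pandas as pd

calories = pd.Series([100, 200, np.nan, 300])  # made-up values with one missing entry

print(calories.mean())              # 200.0: NaN is skipped by default
print(calories.mean(skipna=False))  # nan
print(calories.describe())          # count, mean, std, min, quartiles, max
```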
%% Cell type:code id:7714895a tags:
```python
# how many different categories of beverages are there?
df["beverage_category"].nunique()
```
%% Output
9
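%% Cell type:markdown tags:
`nunique()` counts distinct values and, by default, ignores missing ones. The related methods `unique()` and `value_counts()` return the distinct values themselves and how many rows fall into each. The category values below are made up.

```python
import pandas as pd

cats = pd.Series(["coffee", "smoothies", "coffee", None, "tazo tea drinks"])  # made-up values

print(cats.nunique())              # 3: missing values are not counted by default
print(cats.nunique(dropna=False))  # 4: count the missing value as its own category
print(cats.unique())               # the distinct values, in order of appearance
print(cats.value_counts())         # number of rows in each category
```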
%% Cell type:code id:62392999 tags:
```python
# what is the average number of calories for each beverage category?
bev_categories = df.groupby("beverage_category")
bev_categories["calories"].mean()
```
%% Output
beverage_category
classic espresso drinks             140.172414
coffee                                4.250000
frappuccino blended coffee          276.944444
frappuccino blended crme            233.076923
frappuccino light blended coffee    162.500000
shaken iced beverages               114.444444
signature espresso drinks           250.000000
smoothies                           282.222222
tazo tea drinks                     177.307692
Name: calories, dtype: float64
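%% Cell type:markdown tags:
`groupby` splits the rows by category, applies the aggregation to each group, and combines the results into a Series indexed by category. The sketch below, on made-up data, also sorts the averages so the highest-calorie category comes first, and uses `.agg` to compute several statistics per group in one call.

```python
import pandas as pd

df = pd.DataFrame({
    "beverage_category": ["coffee", "smoothies", "coffee", "smoothies", "tazo tea drinks"],
    "calories": [3, 280, 5, 300, 180],
})  # made-up values

avg = df.groupby("beverage_category")["calories"].mean()
print(avg.sort_values(ascending=False))  # highest-calorie category first

# .agg computes several statistics per group in one call
summary = df.groupby("beverage_category")["calories"].agg(["mean", "count", "max"])
print(summary)
```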
%% Cell type:code id:435e9d80 tags:
```python
# plot the distribution of the number of calories in drinks with a histogram
df["calories"].plot.hist(edgecolor="black", title="Distribution of Calories")
```
%% Output
<AxesSubplot: title={'center': 'Distribution of Calories'}, ylabel='Frequency'>
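%% Cell type:markdown tags:
A histogram splits the range of values into equal-width bins and counts how many values fall into each one. To show the counts behind the bars without drawing anything, this sketch uses `numpy.histogram` on made-up calorie values with 5 bins.

```python
import numpy as np

calories = np.array([0, 5, 130, 140, 150, 290, 300, 310, 450, 510])  # made-up values

# 5 equal-width bins spanning min..max; counts[i] is the number of values in bin i
# (each bin includes its left edge; the last bin also includes the maximum)
counts, edges = np.histogram(calories, bins=5)

print(edges)   # bin edges: 0, 102, 204, 306, 408, 510
print(counts)  # values per bin: 2, 3, 2, 1, 2
```

In pandas, the bin count can be passed straight to the plotting call, for example `df["calories"].plot.hist(bins=5, edgecolor="black")`; the default is 10 bins.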
%% Cell type:code id:ba8948eb tags:
```python
# plot calories against total fat with a scatterplot
df.plot.scatter(x="calories", y="total fat", title="Calories vs Total Fat")
```
%% Output
<AxesSubplot: title={'center': 'Calories vs Total Fat'}, xlabel='calories', ylabel='total fat'>
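%% Cell type:markdown tags:
A scatterplot shows the relationship between two columns visually; `Series.corr` (Pearson correlation by default) summarizes the linear part of that relationship as a single number between -1 and 1. The values below are made up so that the result is easy to verify by hand.

```python
import pandas as pd

df = pd.DataFrame({"calories": [0, 100, 200, 300, 400],
                   "total fat": [0.0, 2.0, 1.0, 4.0, 3.0]})  # made-up values

# Pearson correlation: close to +1 means the points lie near an upward-sloping line
r = df["calories"].corr(df["total fat"])
print(round(r, 6))  # 0.8
```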