Warm-up

Session 1


We're going to start with a review of what we've learned over the past several weeks.

Go to Conf - Warm-up


🚀 Warm-up

  • Work together with your neighbors

  • There are often several different ways of getting to the right answer.

  • After 1-2 minutes, we'll go over the answer together and then move on to the next question.


Done

Help


You'll use the sticky system to signal that you're done or that you need help.

Seattle Pet Licenses data

Look for data/seattle_pets.csv in your Files pane

Source: https://data.seattle.gov/Community/Seattle-Pet-Licenses/jguv-t9rb/about_data


This data was retrieved from Seattle's Open Data Portal. It was last updated in July 2024.

It contains a list of current Seattle pet licenses, including animal type (species), pet's name, breed and the owner's ZIP code.

Your Turn 1

Read in the data saved in data/seattle_pets.csv and explore it. Can you recreate output that looks like this?

💡 Hint: What function from dplyr gives you a quick glimpse of your data?

## Rows: 43,683
## Columns: 7
## $ license_issue_date <chr> "December 18 2015", "June 14 2016", "August 04 2016…
## $ license_number     <chr> "S107948", "S116503", "S119301", "962273", "S133113…
## $ animal_name        <chr> "Zen", "Misty", "Lyra", "Veronica", "Spider", "Maxx…
## $ species            <chr> "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "C…
## $ primary_breed      <chr> "Domestic Longhair", "Siberian", "Mix", "Domestic L…
## $ secondary_breed    <chr> "Mix", NA, NA, NA, NA, NA, "Mix", "Mix", "Mix", "Mi…
## $ zip_code           <dbl> 98117, 98117, 98121, 98107, 98115, 98125, 98103, 98…

Solution 1

library(tidyverse)
seattle_pets <- read_csv("data/seattle_pets.csv")
glimpse(seattle_pets)
## Rows: 43,683
## Columns: 7
## $ license_issue_date <chr> "December 18 2015", "June 14 2016", "August 04 2016…
## $ license_number     <chr> "S107948", "S116503", "S119301", "962273", "S133113…
## $ animal_name        <chr> "Zen", "Misty", "Lyra", "Veronica", "Spider", "Maxx…
## $ species            <chr> "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "C…
## $ primary_breed      <chr> "Domestic Longhair", "Siberian", "Mix", "Domestic L…
## $ secondary_breed    <chr> "Mix", NA, NA, NA, NA, NA, "Mix", "Mix", "Mix", "Mi…
## $ zip_code           <dbl> 98117, 98117, 98121, 98107, 98115, 98125, 98103, 98…

Your Turn 2

How many different species are represented in seattle_pets? How many pets of each species are there?

💡 Hint: What function from dplyr lets you count the unique values of one or more variables?


Solution 2

seattle_pets |>
  count(species)

or...

seattle_pets |>
  group_by(species) |>
  summarize(n = n())
## # A tibble: 4 × 2
##   species     n
##   <chr>   <int>
## 1 Cat     13935
## 2 Dog     29729
## 3 Goat       16
## 4 Pig         3

Solution 2

Because I was curious... what does one name a pet pig?

seattle_pets |>
  filter(species == "Pig") |>
  pull(animal_name)
## [1] "Millie" "Calvin" "Waffles Olivia McHart"

Your Turn 3

What is the most popular pet name in this data set?

💡 Hint: Try using slice_max() from dplyr in your solution. Look up the help docs with ?slice_max.


Solution 3

seattle_pets |>
  count(animal_name) |>
  slice_max(order_by = n)

or...

seattle_pets |>
  count(animal_name, sort = TRUE) |>
  head(1)

or...

seattle_pets |>
  count(animal_name) |>
  filter(n == max(n))
## # A tibble: 1 × 2
##   animal_name     n
##   <chr>       <int>
## 1 Luna          410
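
These three approaches give the same answer here, but they can differ when several names are tied for first place. The sketch below uses a small made-up tibble (the counts are hypothetical, not taken from the license data) to show that slice_max() and filter() keep every tied row, while head(1) always returns a single row.

```r
library(dplyr)

# Hypothetical toy counts with a tie for first place (not from the real data)
name_counts <- tibble(
  animal_name = c("Luna", "Charlie", "Bella"),
  n = c(5L, 5L, 3L)
)

# slice_max() keeps ties by default (with_ties = TRUE), so both tied names come back
top_slice <- name_counts |>
  slice_max(order_by = n)

# filter() on the maximum also returns every tied row
top_filter <- name_counts |>
  filter(n == max(n))

# head(1) after sorting returns exactly one row, whichever sorts first
top_head <- name_counts |>
  arrange(desc(n)) |>
  head(1)

# with_ties = FALSE makes slice_max() return exactly one row as well
top_one <- name_counts |>
  slice_max(order_by = n, with_ties = FALSE)
```

If exactly one row is needed no matter what, with_ties = FALSE or head(1) is probably the safer choice; if ties matter, the default behavior of slice_max() surfaces them.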

Your Turn 4

What are the top 10 most popular primary dog breeds?

💡 Hint: Try using count() and slice_max() again in your solution -- which argument to slice_max() specifies the number of rows to return?


Solution 4

seattle_pets |>
  filter(species == "Dog") |>
  count(primary_breed) |>
  slice_max(order_by = n, n = 10)
## # A tibble: 10 × 2
##    primary_breed                                      n
##    <chr>                                          <int>
##  1 Retriever, Labrador                             3025
##  2 Retriever, Golden                               1498
##  3 Chihuahua, Short Coat                           1485
##  4 German Shepherd                                  989
##  5 Poodle, Miniature                                889
##  6 Poodle, Standard                                 818
##  7 Terrier                                          814
##  8 Mixed Breed, Medium (up to 44 lbs fully grown)   787
##  9 Australian Shepherd                              726
## 10 Mixed Breed, Large (over 44 lbs fully grown)     717

Your Turn 5 (last one!)

Visualize the top 10 dog breeds, re-creating the plot below.

💡 Hint: Pay close attention to the x and y axes

💡 Hint: Start with your code from the previous exercise, and pipe this code to ggplot():

seattle_pets |>
  filter(species == "Dog") |>
  count(primary_breed) |>
  slice_max(order_by = n, n = 10) |>
  ____ # add code here


Solution 5

seattle_pets |>
  filter(species == "Dog") |>
  count(primary_breed) |>
  slice_max(order_by = n, n = 10) |>
  ggplot(aes(x = n, y = primary_breed)) +
  geom_col()

or...

seattle_pets |>
  filter(species == "Dog") |>
  count(primary_breed) |>
  slice_max(order_by = n, n = 10) |>
  ggplot(aes(x = primary_breed, y = n)) +
  geom_col() +
  coord_flip()


Nice work!


🤔 Exploring seattle_pets further...

What if we wanted to visualize popular dog breeds in descending order?

We would need to handle factors (categorical variables).
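
As a preview, one possible approach is fct_reorder() from forcats (loaded with the tidyverse). This is only a sketch: the three rows are copied from the Solution 4 output and deliberately shuffled, and top_breeds and breed_plot are illustrative names.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Three rows copied from the Solution 4 output, deliberately out of order
top_breeds <- tibble(
  primary_breed = c("Retriever, Golden", "Retriever, Labrador", "Chihuahua, Short Coat"),
  n = c(1498L, 3025L, 1485L)
)

# fct_reorder() converts the character column to a factor whose levels
# are ordered by n (ascending by default)
top_breeds <- top_breeds |>
  mutate(primary_breed = fct_reorder(primary_breed, n))

# Inspect the new level order
levels(top_breeds$primary_breed)

# ggplot2 places factor levels from the bottom of the y axis upward,
# so the breed with the largest count ends up as the top bar
breed_plot <- ggplot(top_breeds, aes(x = n, y = primary_breed)) +
  geom_col()
```

Without the mutate() step, ggplot2 would order the character values alphabetically, which is why the bars in Solution 5 are not sorted by count.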


🤔 Exploring seattle_pets further...

What if we wanted to visualize trends in the number of new pet licenses by month?

We would need to handle dates.
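
As a rough sketch, assuming lubridate is available and month names are parsed in an English locale, mdy() could convert strings like those in license_issue_date into dates, and floor_date() could group them by month. The three values are the ones visible in the glimpse() output; licenses and licenses_by_month are illustrative names.

```r
library(dplyr)
library(lubridate)

# The first three license_issue_date values shown in the glimpse() output
licenses <- tibble(
  license_issue_date = c("December 18 2015", "June 14 2016", "August 04 2016")
)

licenses_by_month <- licenses |>
  mutate(
    # Parse "Month day year" strings into Date values
    issue_date = mdy(license_issue_date),
    # Round each date down to the first day of its month
    issue_month = floor_date(issue_date, unit = "month")
  ) |>
  # One row per month with the number of licenses issued
  count(issue_month)
```

The resulting issue_month column is a real date, so it can go straight onto the x axis of a plot.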


🤔 Exploring seattle_pets further...

What if we wanted to explore pet names with a particular pattern?

e.g. Pets with "Sir" somewhere in their name.

We would need to handle strings.

## # A tibble: 22 × 2
##    animal_name                          species
##    <chr>                                <chr>
##  1 Sir Pounce                           Cat
##  2 Sir Furcifer                         Cat
##  3 Sir Thomas Sharpe                    Cat
##  4 Sir Digby Chicken Caesar             Cat
##  5 Sir Herlock Sholmes                  Cat
##  6 Sir Daniel                           Cat
##  7 Sir Dapplesox                        Cat
##  8 Sir Robin Cashmoney Pouncealot       Cat
##  9 Sir Mill                             Cat
## 10 Sir Tuna                             Cat
## 11 Sir Loafsalot                        Cat
## 12 Sir Tater Tot                        Dog
## 13 Sir CottonBall                       Dog
## 14 Sir Walter Leroy Phillips            Dog
## 15 Sir Waggleton                        Dog
## 16 Ravindale's Sir Tristan              Dog
## 17 Sir Francis                          Dog
## 18 Sir Oliver Grayson                   Dog
## 19 Sir Oliver                           Dog
## 20 Sir Sammy Haralson Lawrence III Esq. Dog
## 21 Sir Maximillion                      Dog
## 22 Sir Roman Snoopy II                  Dog
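
A table like the one above could plausibly be produced with str_detect() from stringr. The sketch below is not necessarily the code behind the slide; it runs on a handful of names taken from these slides, and pets and sir_pets are illustrative names.

```r
library(dplyr)
library(stringr)

# A few names and species that appear in these slides; the real data has many more
pets <- tibble(
  animal_name = c("Sir Pounce", "Zen", "Ravindale's Sir Tristan", "Sir Tater Tot", "Misty"),
  species = c("Cat", "Cat", "Dog", "Dog", "Cat")
)

# Keep rows where "Sir" appears anywhere in the name
sir_pets <- pets |>
  filter(str_detect(animal_name, "Sir")) |>
  select(animal_name, species)
```

Note that the plain pattern "Sir" would also match a name such as "Sirius"; a stricter pattern with word boundaries, such as "\\bSir\\b", is one way to avoid that.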

Data types

Fortunately, the tidyverse provides us with tools to work with these different types of data...


