Warm-up

Session 1

1 / 21

We're going to start off with a review of what we've learned over the past several weeks.

Go to Conf - Warm-up

rconf.posit.academy

2 / 21

🚀 Warm-up

Work together with your neighbors
There are often several different ways of getting to the right answer.
After 1-2 minutes, we'll go over the answer together and then move on to the next question.

3 / 21

Done

Help

4 / 21

You'll use the sticky system to signal that you're done or that you need help.

Seattle Pet Licenses data

Look for data/seattle_pets.csv in your Files pane

Source: https://data.seattle.gov/Community/Seattle-Pet-Licenses/jguv-t9rb/about_data

5 / 21

This data was retrieved from Seattle's Open Data Portal. It was last updated in July 2024.

It contains a list of current Seattle pet licenses, including animal type (species), pet's name, breed and the owner's ZIP code.

Your Turn 1

Read in the data saved in data/seattle_pets.csv and explore it. Can you recreate output that looks like this?

💡 Hint: What function from dplyr gives you a quick glimpse of your data?

## Rows: 43,683
## Columns: 7
## $ license_issue_date <chr> "December 18 2015", "June 14 2016", "August 04 2016…
## $ license_number     <chr> "S107948", "S116503", "S119301", "962273", "S133113…
## $ animal_name        <chr> "Zen", "Misty", "Lyra", "Veronica", "Spider", "Maxx…
## $ species            <chr> "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "C…
## $ primary_breed      <chr> "Domestic Longhair", "Siberian", "Mix", "Domestic L…
## $ secondary_breed    <chr> "Mix", NA, NA, NA, NA, NA, "Mix", "Mix", "Mix", "Mi…
## $ zip_code           <dbl> 98117, 98117, 98121, 98107, 98115, 98125, 98103, 98…

6 / 21

Solution 1

library(tidyverse)
seattle_pets <- read_csv("data/seattle_pets.csv")
glimpse(seattle_pets)

## Rows: 43,683
## Columns: 7
## $ license_issue_date <chr> "December 18 2015", "June 14 2016", "August 04 2016…
## $ license_number     <chr> "S107948", "S116503", "S119301", "962273", "S133113…
## $ animal_name        <chr> "Zen", "Misty", "Lyra", "Veronica", "Spider", "Maxx…
## $ species            <chr> "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "C…
## $ primary_breed      <chr> "Domestic Longhair", "Siberian", "Mix", "Domestic L…
## $ secondary_breed    <chr> "Mix", NA, NA, NA, NA, NA, "Mix", "Mix", "Mix", "Mi…
## $ zip_code           <dbl> 98117, 98117, 98121, 98107, 98115, 98125, 98103, 98…

7 / 21

Your Turn 2

How many different species are represented in seattle_pets? How many pets of each species are there?

💡 Hint: What function from dplyr lets you count the unique values of one or more variables?

8 / 21

Solution 2

seattle_pets |> 
  count(species)

or...

seattle_pets |> 
  group_by(species) |> 
  summarize(n = n())

## # A tibble: 4 × 2
##   species     n
##   <chr>   <int>
## 1 Cat     13935
## 2 Dog     29729
## 3 Goat       16
## 4 Pig         3

9 / 21

Solution 2

Because I was curious... what does one name a pet pig?

seattle_pets |> 
  filter(species == "Pig") |> 
  pull(animal_name)

## [1] "Millie"                "Calvin"                "Waffles Olivia McHart"

10 / 21

Your Turn 3

What is the most popular pet name in this data set?

💡 Hint: Try using slice_max() from dplyr in your solution. Look up the help docs with ?slice_max.

11 / 21

Solution 3

seattle_pets |> 
  count(animal_name) |> 
  slice_max(order_by = n)

or...

seattle_pets |> 
  count(animal_name, sort = TRUE) |> 
  head(1)

or...

seattle_pets |> 
  count(animal_name) |> 
  filter(n == max(n))

## # A tibble: 1 × 2
##   animal_name     n
##   <chr>       <int>
## 1 Luna          410
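
The three approaches above agree here because a single name has the highest count. A minimal sketch of how they could differ when two names tie, using a small made-up table of counts (the counts below are hypothetical, not taken from seattle_pets):

```r
library(dplyr)

# Hypothetical counts with a tie for first place (not real seattle_pets values)
name_counts <- tibble(
  animal_name = c("Luna", "Bella", "Max"),
  n = c(3L, 3L, 1L)
)

# slice_max() keeps every tied row by default (with_ties = TRUE)
via_slice <- name_counts |>
  slice_max(order_by = n)

# Sorting and taking the first row always returns exactly one row
via_head <- name_counts |>
  arrange(desc(n)) |>
  head(1)

# Setting with_ties = FALSE makes slice_max() return exactly one row too
via_slice_no_ties <- name_counts |>
  slice_max(order_by = n, with_ties = FALSE)

nrow(via_slice)
nrow(via_head)
nrow(via_slice_no_ties)
```

filter(n == max(n)) behaves like the default slice_max() in this respect: it keeps all tied rows.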

12 / 21

Your Turn 4

What are the top 10 most popular primary dog breeds?

💡 Hint: Try using count() and slice_max() again in your solution -- which argument to slice_max() specifies the number of rows to return?

13 / 21

Solution 4

seattle_pets |> 
  filter(species == "Dog") |> 
  count(primary_breed) |> 
  slice_max(order_by = n, n = 10)

## # A tibble: 10 × 2
##    primary_breed                                      n
##    <chr>                                          <int>
##  1 Retriever, Labrador                             3025
##  2 Retriever, Golden                               1498
##  3 Chihuahua, Short Coat                           1485
##  4 German Shepherd                                  989
##  5 Poodle, Miniature                                889
##  6 Poodle, Standard                                 818
##  7 Terrier                                          814
##  8 Mixed Breed, Medium (up to 44 lbs fully grown)   787
##  9 Australian Shepherd                              726
## 10 Mixed Breed, Large (over 44 lbs fully grown)     717

14 / 21

Your Turn 5 (last one!)

Visualize the top 10 dog breeds, re-creating the plot below.

💡 Hint: Pay close attention to the x and y axes

💡 Hint: Start with your code from the previous exercise, and pipe this code to ggplot():

seattle_pets |> 
  filter(species == "Dog") |> 
  count(primary_breed) |> 
  slice_max(order_by = n, n = 10) |> 
  ____ # add code here

15 / 21

Solution 5

seattle_pets |> 
  filter(species == "Dog") |> 
  count(primary_breed) |> 
  slice_max(order_by = n, n = 10) |> 
  ggplot(aes(x = n, y = primary_breed)) + 
  geom_col()

or...

seattle_pets |> 
  filter(species == "Dog") |> 
  count(primary_breed) |> 
  slice_max(order_by = n, n = 10) |> 
  ggplot(aes(x = primary_breed, y = n)) + 
  geom_col() + 
  coord_flip()

16 / 21

Nice work!

17 / 21

🤔 Exploring `seattle_pets` further...

What if we wanted to visualize popular dog breeds in descending order?

We would need to handle factors (categorical variables).
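
As a rough preview, one way this might look is to convert primary_breed to a factor whose levels are ordered by count, using fct_reorder() from forcats. This is only a sketch: the small top_breeds table is a stand-in built from three rows of the Solution 4 output, and the object names are made up for illustration.

```r
library(dplyr)
library(forcats)

# Stand-in for the counted breeds (three rows copied from the Solution 4 output,
# deliberately listed out of order)
top_breeds <- tibble(
  primary_breed = c("Retriever, Golden", "Retriever, Labrador", "Chihuahua, Short Coat"),
  n = c(1498L, 3025L, 1485L)
)

# Turn the character column into a factor whose levels are sorted by n (ascending)
ordered_breeds <- top_breeds |>
  mutate(primary_breed = fct_reorder(primary_breed, n))

levels(ordered_breeds$primary_breed)
```

Because ggplot2 draws the first factor level at the bottom of a discrete y axis, piping ordered_breeds into ggplot(aes(x = n, y = primary_breed)) + geom_col() should put the most common breed at the top.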

18 / 21

🤔 Exploring `seattle_pets` further...

What if we wanted to visualize trends in the number of new pet licenses by month?

We would need to handle dates.
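
As a rough preview, license_issue_date is stored as text such as "December 18 2015", so it would first have to be parsed into a real date. A minimal sketch using lubridate, assuming the month-day-year format seen in the glimpse() output holds for every row; the licenses table is a stand-in, and its third value is made up for illustration:

```r
library(dplyr)
library(lubridate)

# Stand-in data: the first two values appear in the glimpse() output,
# the third is hypothetical
licenses <- tibble(
  license_issue_date = c("December 18 2015", "June 14 2016", "June 30 2016")
)

licenses_by_month <- licenses |>
  mutate(
    # Parse "Month day year" text into a Date
    issue_date = mdy(license_issue_date),
    # Round each date down to the first day of its month
    issue_month = floor_date(issue_date, unit = "month")
  ) |>
  count(issue_month)

licenses_by_month
```

The resulting table has one row per month, which could then be plotted with issue_month on the x axis and n on the y axis.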

19 / 21

🤔 Exploring `seattle_pets` further...

What if we wanted to explore pet names with a particular pattern?

e.g. Pets with "Sir" somewhere in their name.

We would need to handle strings.

## # A tibble: 22 × 2
##    animal_name                          species
##    <chr>                                <chr>  
##  1 Sir Pounce                           Cat    
##  2 Sir Furcifer                         Cat    
##  3 Sir Thomas Sharpe                    Cat    
##  4 Sir Digby Chicken Caesar             Cat    
##  5 Sir Herlock Sholmes                  Cat    
##  6 Sir Daniel                           Cat    
##  7 Sir Dapplesox                        Cat    
##  8 Sir Robin Cashmoney Pouncealot       Cat    
##  9 Sir Mill                             Cat    
## 10 Sir Tuna                             Cat    
## 11 Sir Loafsalot                        Cat    
## 12 Sir Tater Tot                        Dog    
## 13 Sir CottonBall                       Dog    
## 14 Sir Walter Leroy Phillips            Dog    
## 15 Sir Waggleton                        Dog    
## 16 Ravindale's Sir Tristan              Dog    
## 17 Sir Francis                          Dog    
## 18 Sir Oliver Grayson                   Dog    
## 19 Sir Oliver                           Dog    
## 20 Sir Sammy Haralson Lawrence III Esq. Dog    
## 21 Sir Maximillion                      Dog    
## 22 Sir Roman Snoopy II                  Dog
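
A table like the one above could be produced with str_detect() from stringr. This is only a sketch and not necessarily the code behind the slide: the pets table is a stand-in (the name "Siri" is made up), and the word-boundary pattern is an assumption chosen so that names merely starting with the letters "Sir" are not matched.

```r
library(dplyr)
library(stringr)

# Stand-in data: "Siri" is hypothetical; the other names appear elsewhere in these slides
pets <- tibble(
  animal_name = c("Sir Pounce", "Luna", "Ravindale's Sir Tristan", NA, "Siri"),
  species = c("Cat", "Dog", "Dog", "Cat", "Cat")
)

# \\b marks a word boundary, so "Sir" must appear as a whole word;
# rows where animal_name is NA are dropped by filter()
sir_pets <- pets |>
  filter(str_detect(animal_name, "\\bSir\\b")) |>
  select(animal_name, species)

sir_pets
```

Using the plain pattern "Sir" instead would also keep "Siri" in this example.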

20 / 21

Data types

Fortunately, the tidyverse provides us with tools to work with these different types of data...

21 / 21


