class: title-slide

# Warm-up

### Session 1

<div class="title-footer">
<img src="images/academy-logo.png">
<div> August 12th, 2024</div>
</div>

???

We're going to start off with some review of what we've learned over the past several weeks.

---
class: inverse

# Go to **Conf - Warm-up**

.center[
<img src="images/welcome/campsite_warmup.png" width="60%" style="display: block; margin: auto;" />

[rconf.posit.academy](https://rconf.posit.academy/)
]

---

## 🚀 Warm-up

.pull-left[
![](images/welcome/your-turn-example.png)<!-- -->
]

.pull-right[

* __Work together__ with your neighbors

* There are often several different ways of getting to the right answer.

* After 1-2 minutes, we'll go over the answer together and then move on to the next question.

]

---
class: inverse, center, middle

.pull-left[
## Done

![](images/welcome/green-square.png)<!-- -->
]

.pull-right[
## Help

![](images/welcome/pink-square.png)<!-- -->
]

???

You'll use the sticky system to signal that you're done or that you need help.

---
class: inverse
background-position: center
background-size: cover

# Seattle Pet Licenses data

Look for `data/seattle_pets.csv` in your Files pane

Source: https://data.seattle.gov/Community/Seattle-Pet-Licenses/jguv-t9rb/about_data

???

This data was retrieved from Seattle's Open Data Portal. It was last updated in July 2024. It contains a list of current Seattle pet licenses, including animal type (species), pet's name, breed and the owner's ZIP code.

---
class: your-turn

# Your Turn 1

**Read in the data saved in `data/seattle_pets.csv` and explore it. Can you recreate output that looks like this?**

💡 Hint: What function from dplyr gives you a quick glimpse of your data?

```
## Rows: 43,683
## Columns: 7
## $ license_issue_date <chr> "December 18 2015", "June 14 2016", "August 04 2016…
## $ license_number     <chr> "S107948", "S116503", "S119301", "962273", "S133113…
## $ animal_name        <chr> "Zen", "Misty", "Lyra", "Veronica", "Spider", "Maxx…
## $ species            <chr> "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "C…
## $ primary_breed      <chr> "Domestic Longhair", "Siberian", "Mix", "Domestic L…
## $ secondary_breed    <chr> "Mix", NA, NA, NA, NA, NA, "Mix", "Mix", "Mix", "Mi…
## $ zip_code           <dbl> 98117, 98117, 98121, 98107, 98115, 98125, 98103, 98…
```

---

# Solution 1

``` r
library(tidyverse)

seattle_pets <- read_csv("data/seattle_pets.csv")

glimpse(seattle_pets)
```

```
## Rows: 43,683
## Columns: 7
## $ license_issue_date <chr> "December 18 2015", "June 14 2016", "August 04 2016…
## $ license_number     <chr> "S107948", "S116503", "S119301", "962273", "S133113…
## $ animal_name        <chr> "Zen", "Misty", "Lyra", "Veronica", "Spider", "Maxx…
## $ species            <chr> "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "Cat", "C…
## $ primary_breed      <chr> "Domestic Longhair", "Siberian", "Mix", "Domestic L…
## $ secondary_breed    <chr> "Mix", NA, NA, NA, NA, NA, "Mix", "Mix", "Mix", "Mi…
## $ zip_code           <dbl> 98117, 98117, 98121, 98107, 98115, 98125, 98103, 98…
```

---
class: your-turn

# Your Turn 2

**How many different species are represented in `seattle_pets`? How many pets of each species are there?**

💡 Hint: What function from dplyr lets you count the unique values of one or more variables?

---

# Solution 2

.pull-left[

``` r
seattle_pets |>
  count(species)
```

or...

``` r
seattle_pets |>
  group_by(species) |>
  summarize(n = n())
```
]

.pull-right[

```
## # A tibble: 4 × 2
##   species     n
##   <chr>   <int>
## 1 Cat     13935
## 2 Dog     29729
## 3 Goat       16
## 4 Pig         3
```
]

---

# Solution 2

Because I was curious... what does one name a pet pig?

``` r
seattle_pets |>
  filter(species == "Pig") |>
  pull(animal_name)
```

```
## [1] "Millie"                "Calvin"                "Waffles Olivia McHart"
```

---
class: your-turn

# Your Turn 3

**What is the most popular pet name in this data set?**

💡 Hint: Try using `slice_max()` from dplyr in your solution. Look up the help docs with `?slice_max`.

---

# Solution 3

.pull-left[

``` r
seattle_pets |>
  count(animal_name) |>
  slice_max(order_by = n)
```

or...

``` r
seattle_pets |>
  count(animal_name, sort = TRUE) |>
  head(1)
```

or...

``` r
seattle_pets |>
  count(animal_name) |>
  filter(n == max(n))
```
]

.pull-right[

```
## # A tibble: 1 × 2
##   animal_name     n
##   <chr>       <int>
## 1 Luna          410
```
]

---
class: your-turn

# Your Turn 4

**What are the top 10 most popular primary dog breeds?**

💡 Hint: Try using `count()` and `slice_max()` again in your solution -- which argument to `slice_max()` specifies the number of rows to return?

---

# Solution 4

``` r
seattle_pets |>
  filter(species == "Dog") |>
  count(primary_breed) |>
  slice_max(order_by = n, n = 10)
```

```
## # A tibble: 10 × 2
##    primary_breed                                      n
##    <chr>                                          <int>
##  1 Retriever, Labrador                             3025
##  2 Retriever, Golden                               1498
##  3 Chihuahua, Short Coat                           1485
##  4 German Shepherd                                  989
##  5 Poodle, Miniature                                889
##  6 Poodle, Standard                                 818
##  7 Terrier                                          814
##  8 Mixed Breed, Medium (up to 44 lbs fully grown)   787
##  9 Australian Shepherd                              726
## 10 Mixed Breed, Large (over 44 lbs fully grown)     717
```

---
class: your-turn

# Your Turn 5 (last one!)

**Visualize the top 10 dog breeds, re-creating the plot below.**

.pull-left[

💡 Hint: Pay close attention to the x and y axes

💡 Hint: Start with your code from the previous exercise, and pipe this code to `ggplot()`:

``` r
seattle_pets |>
  filter(species == "Dog") |>
  count(primary_breed) |>
  slice_max(order_by = n, n = 10) |>
  ____ # add code here
```
]

.pull-right[
<img src="warm-up_files/figure-html/unnamed-chunk-16-1.png" width="80%" />
]

---

# Solution 5

.pull-left[

``` r
seattle_pets |>
  filter(species == "Dog") |>
  count(primary_breed) |>
  slice_max(order_by = n, n = 10) |>
  ggplot(aes(x = n, y = primary_breed)) +
  geom_col()
```

or...

``` r
seattle_pets |>
  filter(species == "Dog") |>
  count(primary_breed) |>
  slice_max(order_by = n, n = 10) |>
  ggplot(aes(x = primary_breed, y = n)) +
  geom_col() +
  coord_flip()
```
]

.pull-right[
<img src="warm-up_files/figure-html/unnamed-chunk-19-1.png" width="80%" />
]

---
class: inverse, middle, center

# Nice work!

---

# 🤔 Exploring `seattle_pets` further...

.pull-left[
What if we wanted to visualize popular dog breeds in descending order?

We would need to handle **factors** (categorical variables).
]

.pull-right[
<img src="warm-up_files/figure-html/unnamed-chunk-20-1.png" width="90%" />
]

---

# 🤔 Exploring `seattle_pets` further...

.pull-left[
What if we wanted to visualize trends in the number of new pet licenses by month?

We would need to handle **dates**.
]

.pull-right[
<img src="warm-up_files/figure-html/unnamed-chunk-21-1.png" width="90%" />
]

---

# 🤔 Exploring `seattle_pets` further...

.pull-left[
What if we wanted to explore pet names with a particular pattern? e.g. Pets with "Sir" somewhere in their name.

We would need to handle **strings**.
]

.pull-right[

```
## # A tibble: 22 × 2
##    animal_name                          species
##    <chr>                                <chr>
##  1 Sir Pounce                           Cat
##  2 Sir Furcifer                         Cat
##  3 Sir Thomas Sharpe                    Cat
##  4 Sir Digby Chicken Caesar             Cat
##  5 Sir Herlock Sholmes                  Cat
##  6 Sir Daniel                           Cat
##  7 Sir Dapplesox                        Cat
##  8 Sir Robin Cashmoney Pouncealot       Cat
##  9 Sir Mill                             Cat
## 10 Sir Tuna                             Cat
## 11 Sir Loafsalot                        Cat
## 12 Sir Tater Tot                        Dog
## 13 Sir CottonBall                       Dog
## 14 Sir Walter Leroy Phillips            Dog
## 15 Sir Waggleton                        Dog
## 16 Ravindale's Sir Tristan              Dog
## 17 Sir Francis                          Dog
## 18 Sir Oliver Grayson                   Dog
## 19 Sir Oliver                           Dog
## 20 Sir Sammy Haralson Lawrence III Esq. Dog
## 21 Sir Maximillion                      Dog
## 22 Sir Roman Snoopy II                  Dog
```
]

---

# Data types

Fortunately, the tidyverse provides us with tools to work with these different types of data...

<br>

.center[
<img src="https://github.com/rstudio/hex-stickers/blob/main/PNG/stringr.png?raw=true" width="25%" /><img src="https://github.com/rstudio/hex-stickers/blob/main/PNG/forcats.png?raw=true" width="25%" /><img src="https://github.com/rstudio/hex-stickers/blob/main/PNG/lubridate.png?raw=true" width="25%" />
]
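
---

# A quick preview (sketch)

As a rough preview of what stringr, forcats, and lubridate can do for the three questions raised earlier, here is a minimal sketch. It is not the code behind the plots and table shown in these slides. It uses a small made-up tibble, `toy_pets` (hypothetical, not rows from `data/seattle_pets.csv`), so that it runs on its own; the names `sir_pets`, `breed_levels`, and `licenses_by_month` are also just illustrative.

``` r
library(tidyverse)
# lubridate is attached by tidyverse 2.0.0 and later;
# loaded explicitly here in case an older version is installed.
library(lubridate)

# Hypothetical toy data, shaped like a few columns of seattle_pets
toy_pets <- tibble(
  license_issue_date = c("December 18 2015", "June 14 2016",
                         "June 20 2016", "August 04 2016"),
  animal_name        = c("Sir Pounce", "Luna", "Sir Tater Tot", "Luna"),
  primary_breed      = c("Siberian", "Terrier", "Terrier", "Terrier")
)

# Strings (stringr): keep pets with "Sir" somewhere in their name
sir_pets <- toy_pets |>
  filter(str_detect(animal_name, "Sir"))

# Factors (forcats): order breed levels from most to least frequent
breed_levels <- toy_pets |>
  mutate(primary_breed = fct_infreq(primary_breed)) |>
  pull(primary_breed) |>
  levels()

# Dates (lubridate): parse "Month day year" text, then count by month
licenses_by_month <- toy_pets |>
  mutate(
    issue_date  = mdy(license_issue_date),
    issue_month = floor_date(issue_date, "month")
  ) |>
  count(issue_month)
```

Note that `str_detect()` matches the pattern anywhere in the string, so a name such as "Sirius" would also be kept; a stricter pattern would be needed to match "Sir" only as a whole word.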