library(tidyverse)Download dominick_oj_q1a.csv from the Midterm Exam in
the Assignments or the Files sections in our Canvas.
Then import the dominick_oj_q1a.csv using the following
lines:
oj_q1a <- read_csv('ABSOLUTE_PATH_NAME_FOR_THE_FILE_dominick_oj_q1a.csv')
table(oj_q1a$brand)You need to provide the absolute path name for the file,
dominick_oj_q1a.csv to the above read_csv()
function to read the file.
sales: the number of orange juice (OJ) cartons sold
in a week
price: price of OJ carton
brand: OJ brand
feat: Advertisement status— 1 if advertised; 0 if
not advertised.
Report (1) minimum, (2) median, (3) maximum, (4) mean, and (5)
standard deviation of variable price for the
brand, Dominick’s OJ.
library(tidyverse)
oj_q1a <- read_csv('/Users/byeong-hakchoe/Google Drive/suny-geneseo/teaching-materials/lecture-data/dominick_oj_q1a.csv')
table(oj_q1a$brand)##
## dominicks minute.maid tropicana
## 9602 9649 9649
library(skimr)
oj_q1a_d <- filter(oj_q1a, brand == 'dominicks')
skim(oj_q1a_d$price)| Name | oj_q1a_d$price |
| Number of rows | 9602 |
| Number of columns | 1 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| data | 0 | 1 | 1.74 | 0.39 | 0.52 | 1.58 | 1.59 | 1.99 | 2.69 | ▁▂▇▃▂ |
price for the brand,
dominicks are ……For Question 1b, run the following function to read the
dominick_oj.csv file:
oj_q1b <- read_csv(
'https://bcdanl.github.io/data/dominick_oj.csv'
)The description of variables in oj_q1b is the same as
oj_q1a.
price and
the log of sales by brand using ggplot. Make a
simple comment on your ggplot figure.summary(oj_q1b)## sales price brand feat
## Min. : 64 Min. :0.520 Length:28947 Min. :0.0000
## 1st Qu.: 4864 1st Qu.:1.790 Class :character 1st Qu.:0.0000
## Median : 8384 Median :2.170 Mode :character Median :0.0000
## Mean : 17312 Mean :2.282 Mean :0.2373
## 3rd Qu.: 17408 3rd Qu.:2.730 3rd Qu.:0.0000
## Max. :716416 Max. :3.870 Max. :1.0000
ggplot(data = oj_q1b,
mapping = aes(x = log(price), y = log(sales),
color = brand)) +
geom_point(alpha = .05) +
geom_smooth(method = lm, se = F)
Here we observe the downward sloping demand curve for each OJ brand.
For the same level of price, the demand for tropicana is
the highest, followed by minute.maid and
dominicks.
tropicana is the luxurious OJ;dominicks OJ seems to be a budget option.minute.maid is somewhat between the two.For Question 2, run the following R command to read the
nyc_dogs.csv file.
nyc_dogs <- read_csv('https://bcdanl.github.io/data/nyc_dogs.csv')Describe the distribution of animal_gender using ggplot.
Make a simple comment on your ggplot figure.
ggplot(nyc_dogs) +
geom_bar(aes(x = animal_gender))
ggplot(nyc_dogs) +
geom_density(aes(x = animal_gender))
ggplot(nyc_dogs) +
geom_boxplot(aes(x = animal_gender))
ggplot(nyc_dogs) +
geom_bar(aes(x = animal_gender)) +
facet_wrap(~borough)
There are more male dogs than female dogs.
There are more male dogs than female dogs in each borough.
Find the five most popular breeds in NYC.
# 0.
table(nyc_dogs$breed_rc)##
## Affenpinscher Afghan Hound
## 136 89
## Afghan Hound Crossbreed Airedale Terrier
## 19 227
## Akita Akita Crossbreed
## 491 151
## Alaskan Klee Kai Alaskan Malamute
## 113 287
## American Bully American English Coonhound
## 1100 103
## American Eskimo Dog American Foxhound
## 1145 265
## American Hairless Terrier American Leopard Hound
## 15 11
## American Staffordshire Terrier American Water Spaniel
## 3052 15
## Anatolian Shepherd Dog Australian Cattle Dog
## 305 730
## Australian Cattledog Australian Kelpie
## 1324 126
## Australian Shepherd Australian Silky Terrier
## 4212 149
## Australian Terrier Azawakh
## 107 3
## Baladi Barbet
## 8 15
## Basenji Basset Hound
## 595 526
## Basset Hound Crossbreed Bassett Hound
## 102 571
## Beagle Beagle Crossbreed
## 7822 4224
## Bearded Collie Beauceron
## 80 11
## Bedlington Terrier Belgian Griffon
## 92 24
## Belgian Laekenois Belgian Malinois
## 4 407
## Belgian Sheepdog Belgian Tervuren
## 78 17
## Berger Picard Bernese Mountain Dog
## 23 923
## Bichon Frise Bichon Frise Crossbreed
## 4310 1620
## Biewer Terrier Black And Tan Coonhound
## 50 162
## Black Russian Terrier Bloodhound
## 33 108
## Bloodhound Crossbreed Bluetick Coonhound
## 58 134
## Boerboel Bolognese
## 56 43
## Border Collie Border Collie Crossbreed
## 1485 725
## Border Terrier Borzoi
## 800 64
## Boston Terrier Bouvier Des Flandres
## 4670 71
## Boxer Boxer Crossbreed
## 3713 776
## Boykin Spaniel Bracco Italiano
## 29 7
## Briard Brittany
## 61 93
## Brittany Spaniel Brussels Griffon
## 358 1137
## Brussels Griffon Crossbreed Bull Dog, American
## 35 327
## Bull Dog, English Bull Dog, French
## 3294 2538
## Bull Terrier Bulldog
## 817 973
## Bullmastiff Cairn Terrier
## 100 1764
## Canaan Dog Cane Corso
## 83 854
## Cardigan Welsh Corgi Catahoula Leopard Dog
## 213 542
## Cavalier King Charles Spaniel Cesky Terrier
## 5747 4
## Chesapeake Bay Retriever Chihuahua
## 100 21211
## Chihuahua Crossbreed Chinese Crested
## 7149 188
## Chinese Shar-Pei Chinook
## 187 20
## Chow Chow Cirneco Dell
## 891 12
## Clumber Spaniel Cockapoo
## 46 778
## Cocker Spaniel Cocker Spaniel Crossbreed
## 5676 1046
## Collie Collie Crossbreed
## 277 869
## Collie, Bearded Collie, Border
## 86 841
## Collie, Rough Coat Collie, Smooth Coat
## 146 141
## Coonhound, Black And Tan Coonhound, Blue Tick
## 389 107
## Coonhound, Redbone Coonhound, Treeing Walker
## 18 290
## Coton De Tulear Cotton De Tulear
## 537 706
## Curly-Coated Retriever Dachshund
## 13 3104
## Dachshund Crossbreed Dachshund Smooth Coat
## 1345 2496
## Dachshund Smooth Coat Miniature Dachshund, Long Haired
## 1888 714
## Dachshund, Long Haired Miniature Dachshund, Wirehaired
## 1131 249
## Dachshund, Wirehaired, Miniature Dalmatian
## 267 369
## Dalmatian Mix Dandie Dinmont Terrier
## 15 20
## Doberman Pinscher Dogo Argentino
## 1021 111
## Dogue De Bordeaux Dutch Shepherd
## 56 164
## English Cocker Spaniel English Foxhound
## 456 68
## English Setter English Setter Crossbreed
## 91 9
## English Springer Spaniel English Toy Spaniel
## 317 33
## Entlebucher Mountain Dog Estrela Mountain Dog
## 34 4
## Eurasier Field Spaniel
## 11 41
## Fila Brasileiro Finnish Lapphund
## 11 3
## Finnish Spitz Flat-Coated Retriever
## 44 144
## French Bulldog French Spaniel
## 6364 17
## German Longhaired Pointer German Pinscher
## 21 49
## German Shepherd Crossbreed German Shepherd Dog
## 5683 7129
## German Shorthaired Pointer German Spitz
## 364 51
## German Wirehaired Pointer Giant Schnauzer
## 41 58
## Glen Of Imaal Terrier Golden Retriever
## 47 7325
## Goldendoodle Gordon Setter
## 3849 32
## Grand Basset Griffon Vendeen Great Dane
## 4 546
## Great Pyrenees Greater Swiss Mountain Dog
## 496 49
## Greek Shephard Greyhound
## 2 860
## Hamiltonstovare Harrier
## 3 44
## Havanese Hovawart
## 8606 15
## Ibizan Hound Icelandic Sheepdog
## 47 11
## Irish Red And White Setter Irish Setter
## 14 150
## Irish Terrier Irish Water Spaniel
## 200 2
## Irish Wolfhound Italian Greyhound
## 60 1096
## Jack Russell Terrier Jack Russell Terrier Crossbreed
## 6880 697
## Jagdterrier Japanese Chin
## 3 203
## Japanese Chin/Spaniel Japanese Spitz
## 248 83
## Jindo Jindo Dog, Korea
## 299 937
## Kai Ken Karelian Bear Dog
## 10 2
## Keeshond Kerry Blue Terrier
## 123 65
## Kishu Ken Kooikerhondje
## 1 25
## Kromfohrlander Kuvasz
## 8 18
## Labradoodle Labrador (or Crossbreed)
## 3685 28399
## Lagotto Romagnolo Lakeland Terrier
## 194 38
## Lancashire Heeler Leonberger
## 10 26
## Lhasa Apso Lowchen
## 2486 31
## Maltese Maltese Crossbreed
## 15701 4498
## Maltipoo Manchester Terrier
## 2358 158
## Mastiff Mastiff, Bull
## 294 192
## Mastiff, French (Dogue De Bordeaux) Mastiff, Neapolitan
## 51 40
## Mastiff, Old English Mastiff, Tibetan
## 125 22
## Miniature American Shepherd Miniature Australian Shepherd
## 449 829
## Miniature Bull Terrier Miniature Fox Terrier
## 53 15
## Miniature Pinscher Miniature Schnauzer
## 3242 2298
## Morkie Mountain Feist
## 3049 13
## Mudi Neapolitan Mastiff
## 11 12
## Newfoundland Norfolk Terrier
## 263 328
## Norwegian Buhund Norwegian Elkhound
## 7 110
## Norwich Terrier Nova Scotia Duck Tolling Retriever
## 399 101
## Old English Sheepdog Olde English Bulldogge
## 294 11
## Otterhound Papillon
## 4 1430
## Parson Russell Terrier Patterdale Terrier
## 208 8
## Pekingese Pembroke Welsh Corgi
## 2043 1113
## Pembroke Welsh Corgi Crossbreed Perro De Presa Canario
## 130 24
## Peruvian Inca Orchid Petit Basset Griffon Vendeen
## 5 99
## Pharaoh Hound Pharoh Hound
## 102 51
## Pit Bull (or Mix) Plott
## 24393 491
## Pointer Pointer, German Shorthaired
## 861 227
## Pointer, German Wirehaired Polish Hound
## 59 19
## Polish Lowland Sheepdog Pomeranian
## 17 9287
## Pomeranian Crossbreed Pomsky
## 1355 381
## Poodle Poodle Crossbreed
## 4723 6178
## Poodle, Miniature Poodle, Standard
## 4567 3105
## Poodle, Toy Portuguese Podengo Pequeno
## 4923 58
## Portuguese Sheepdog Portuguese Water Dog
## 1 557
## Pug Pug Crossbreed
## 5203 534
## Puggle Puli
## 2889 72
## Pumi Pyrenean Shepherd
## 10 13
## Rat Terrier Rat Terrier Crossbreed
## 1181 240
## Redbone Coonhound Rhodesian Ridgeback
## 175 935
## Rottweiler Rottweiler Crossbreed
## 2126 339
## Russell Terrier Russian Toy
## 89 26
## Russian Wolfhound Saint Bernard
## 2 118
## Saluki Samoyed
## 48 437
## Schipperke Schipperkee
## 83 135
## Schnauzer Crossbreed Schnauzer, Giant
## 624 107
## Schnauzer, Miniature Schnauzer, Miniature Crossbreed
## 3398 310
## Schnauzer, Standard Scottish Deerhound
## 659 12
## Scottish Terrier Sealyham Terrier
## 561 31
## Shar-Pei, Chinese Shepard Crossbreed
## 704 771
## Shetland Sheepdog Shiba Inu
## 1070 6109
## Shiba Inu Crossbreed Shih Tzu
## 288 27407
## Shih Tzu Crossbreed Siberian Husky
## 8098 4999
## Siberian Husky Crossbreed Silky Terrier
## 770 943
## Skye Terrier Small Munsterlander Pointer
## 31 5
## Smooth Fox Terrier Soft Coated Wheaten Terrier
## 71 363
## Spanish Water Dog Spinone Italiano
## 27 66
## St. Bernard Stabyhoun
## 74 1
## Staffordshire Bull Terrier Standard Schnauzer
## 1230 31
## Sussex Spaniel Swedish Lapphund
## 20 1
## Swedish Vallhund Terrier Crossbreed
## 11 2369
## Terrier Mix Thai Ridgeback
## 5293 45
## Tibetan Mastiff Tibetan Spaniel
## 26 253
## Tibetan Terrier Tornjak
## 781 1
## Tosa Toy Fox Terrier
## 3 194
## Treeing Tennessee Brindle Treeing Walker Coonhound
## 34 200
## Unknown Vizsla
## 54586 877
## Weimaraner Welsh Corgi, Cardigan
## 559 310
## Welsh Corgi, Pembroke Welsh Springer Spaniel
## 1123 75
## Welsh Terrier West High White Terrier
## 118 1619
## West Highland White Terrier Wheaton Terrier
## 835 1539
## Whippet Wire Fox Terrier
## 600 583
## Wirehaired Pointing Griffon Wirehaired Vizsla
## 78 8
## Xoloitzcuintli Yorkshire Terrier
## 44 30379
## Yorkshire Terrier Crossbreed
## 4554
# 1.
q2b <- group_by(nyc_dogs,
breed_rc)
q2b <- summarise(q2b,
n = n())
q2b <- arrange(q2b,
-n) # or desc(n)
q2b# 2.
q2b <- group_by(nyc_dogs,
breed_rc)
q2b <- mutate(q2b,
n = n())
q2b <- arrange(q2b,
desc(n))
q2b <- select(q2b,
breed_rc, n)
q2b <- distinct(q2b)
q2b1 Unknown 54586 2 Yorkshire Terrier 30379 3 Labrador (or Crossbreed) 28399 4 Shih Tzu 27407 5 Pit Bull (or Mix) 24393 6 Chihuahua 21211
Describe the relationship between the five popular breeds and borough using ggplot. Make a simple comment on your ggplot figure.
q2c <- filter(nyc_dogs,
breed_rc %in% c("Yorkshire Terrier",
"Labrador (or Crossbreed)",
"Shih Tzu",
"Pit Bull (or Mix)",
"Chihuahua"),
!is.na(borough)
)
q2c <- filter(nyc_dogs,
breed_rc == "Yorkshire Terrier" |
breed_rc == "Labrador (or Crossbreed)" |
breed_rc == "Shih Tzu" |
breed_rc == "Pit Bull (or Mix)" |
breed_rc == "Chihuahua",
!is.na(borough)
)
# 1. stacked bar charts
# distribution of `breed_rc` by `borough`
ggplot(q2c) +
geom_bar(aes(x = breed_rc, fill = borough)) +
coord_flip()
# distribution of `borough` by `breed_rc`
ggplot(q2c) +
geom_bar(aes(fill = breed_rc, x = borough)) +
coord_flip()
# 2. stacked bar charts of relative frequencies
# distribution of `breed_rc` by `borough`
ggplot(q2c) +
geom_bar(aes(x = breed_rc, fill = borough),
position = "fill") +
coord_flip()
# distribution of `borough` by `breed_rc`
ggplot(q2c) +
geom_bar(aes(fill = breed_rc, x = borough),
position = "fill") +
coord_flip()
# 3. side-by-side bar charts
# distribution of `breed_rc` by `borough`
ggplot(q2c) +
geom_bar(aes(x = breed_rc, fill = borough),
position = "dodge2") +
coord_flip()
# distribution of `borough` by `breed_rc`
ggplot(q2c) +
geom_bar(aes(fill = breed_rc, x = borough),
position = "dodge2") +
coord_flip()
# 4. side-by-side bar charts of relative frequencies
# distribution of `breed_rc` by `borough`
ggplot(q2c) +
geom_bar(aes(x = breed_rc, fill = borough,
y = ..prop.., group = borough),
position = "dodge2") +
coord_flip()
# distribution of `borough` by `breed_rc`
ggplot(q2c) +
geom_bar(aes(x = borough, fill = breed_rc,
y = ..prop.., group = breed_rc),
position = "dodge2") +
coord_flip()
# What if the filtered data could not be obtained ...
ggplot(data = nyc_dogs)+
geom_bar(mapping = aes(x=borough))+
facet_wrap(~breed_rc, ncol = 4) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) # optional
Find the five most popular breeds for each borough in NYC.
# 1.
q2d <- group_by(nyc_dogs,
breed_rc, borough)
q2d <- summarize(q2d,
n = n())
q2d <- arrange(q2d,
borough, -n)
# 2.
q2d <- group_by(nyc_dogs,
breed_rc, borough)
q2d <- mutate(q2d,
n = n())
q2d <- arrange(q2d,
-n)
q2d <- select(q2d,
breed_rc, n, borough)
q2d <- distinct(q2d)
# filter by borough
q2d_bronx <- filter(q2d, borough == "Bronx")
q2d_bronxq2d_brooklyn <- filter(q2d, borough == "Brooklyn")
q2d_brooklynq2d_manhattan <- filter(q2d, borough == "Manhattan")
q2d_manhattanq2d_queens <- filter(q2d, borough == "Queens")
q2d_queensq2d_staten_island <- filter(q2d, borough == "Staten Island")
q2d_staten_island# 3.
q2d <- group_by(nyc_dogs,
breed_rc, borough)
q2d <- mutate(q2d,
n = n())
q2d <- group_by(q2d,
borough)
q2d <- mutate(q2d,
rank = dense_rank(desc(n)) )
q2d <- filter(q2d, rank <= 10 )
q2d <- select(q2d,
breed_rc, n, rank, borough)
q2d <- distinct(q2d)
q2d <- arrange(q2d,
borough,-n)
print(q2d, n = Inf)## # A tibble: 60 × 4
## # Groups: borough [6]
## breed_rc n rank borough
## <chr> <int> <int> <chr>
## 1 Unknown 5071 1 Bronx
## 2 Yorkshire Terrier 4956 2 Bronx
## 3 Pit Bull (or Mix) 4612 3 Bronx
## 4 Shih Tzu 4153 4 Bronx
## 5 Chihuahua 2934 5 Bronx
## 6 Labrador (or Crossbreed) 1913 6 Bronx
## 7 Maltese 1889 7 Bronx
## 8 Shih Tzu Crossbreed 1301 8 Bronx
## 9 Pomeranian 1031 9 Bronx
## 10 Chihuahua Crossbreed 958 10 Bronx
## 11 Unknown 14007 1 Brooklyn
## 12 Yorkshire Terrier 7928 2 Brooklyn
## 13 Labrador (or Crossbreed) 7660 3 Brooklyn
## 14 Shih Tzu 7463 4 Brooklyn
## 15 Pit Bull (or Mix) 7305 5 Brooklyn
## 16 Chihuahua 5439 6 Brooklyn
## 17 Maltese 3550 7 Brooklyn
## 18 Pomeranian 2440 8 Brooklyn
## 19 Shih Tzu Crossbreed 2129 9 Brooklyn
## 20 Chihuahua Crossbreed 2108 10 Brooklyn
## 21 Unknown 17421 1 Manhattan
## 22 Labrador (or Crossbreed) 10387 2 Manhattan
## 23 Yorkshire Terrier 7517 3 Manhattan
## 24 Chihuahua 7118 4 Manhattan
## 25 Shih Tzu 6482 5 Manhattan
## 26 Pit Bull (or Mix) 4919 6 Manhattan
## 27 Havanese 4727 7 Manhattan
## 28 Maltese 4269 8 Manhattan
## 29 French Bulldog 3565 9 Manhattan
## 30 Golden Retriever 3280 10 Manhattan
## 31 Unknown 11807 1 Queens
## 32 Yorkshire Terrier 6713 2 Queens
## 33 Shih Tzu 6039 3 Queens
## 34 Pit Bull (or Mix) 5108 4 Queens
## 35 Labrador (or Crossbreed) 4974 5 Queens
## 36 Maltese 4297 6 Queens
## 37 Chihuahua 4183 7 Queens
## 38 Pomeranian 2255 8 Queens
## 39 German Shepherd Dog 2111 9 Queens
## 40 Beagle 2080 10 Queens
## 41 Unknown 5894 1 Staten Island
## 42 Labrador (or Crossbreed) 3202 2 Staten Island
## 43 Shih Tzu 3058 3 Staten Island
## 44 Yorkshire Terrier 2987 4 Staten Island
## 45 Pit Bull (or Mix) 2289 5 Staten Island
## 46 Maltese 1588 6 Staten Island
## 47 Chihuahua 1375 7 Staten Island
## 48 German Shepherd Dog 1127 8 Staten Island
## 49 Beagle 799 9 Staten Island
## 50 Golden Retriever 739 10 Staten Island
## 51 Unknown 386 1 <NA>
## 52 Yorkshire Terrier 278 2 <NA>
## 53 Labrador (or Crossbreed) 263 3 <NA>
## 54 Shih Tzu 212 4 <NA>
## 55 Chihuahua 162 5 <NA>
## 56 Pit Bull (or Mix) 160 6 <NA>
## 57 French Bulldog 114 7 <NA>
## 58 Maltese 108 8 <NA>
## 59 Pomeranian 106 9 <NA>
## 60 Chihuahua Crossbreed 98 10 <NA>
Find the five most popular dog names for each gender in NYC.
# 1.
q2e <- group_by(nyc_dogs,
animal_name, animal_gender)
q2e <- summarise(q2e,
n = n())
q2e <- arrange(q2e,
animal_gender, -n)
# 2.
q2e <- group_by(nyc_dogs,
animal_name, animal_gender)
q2e <- mutate(q2e,
n = n())
q2e <- arrange(q2e,
-n)
q2e <- select(q2e,
animal_name, animal_gender, n)
q2e <- distinct(q2e)
# filter by animal_gender
q2e_F <- filter(q2e, animal_gender == "F")
q2e_Fq2e_M <- filter(q2e, animal_gender == "M")
q2e_MFind the five most popular dog names for each gender for each borough in NYC.
popular_name <- nyc_dogs %>%
group_by(animal_name, animal_gender, borough) %>%
mutate(n = n()) %>%
arrange(borough, animal_gender, -n) %>%
select(animal_gender, animal_name, n) %>%
distinct()
# 1.
q2f <- group_by(nyc_dogs,
animal_name, animal_gender, borough)
q2f <- summarise(q2f,
n = n())
q2f <- arrange(q2f,
animal_gender, -n)
# 2.
q2f <- group_by(nyc_dogs,
animal_name, animal_gender, borough)
q2f <- mutate(q2f,
n = n())
q2f <- arrange(q2f,
-n)
q2f <- select(q2f,
animal_name, animal_gender, borough, n)
q2f <- distinct(q2f)
# filter by animal_gender
q2f_F <- filter(q2f, animal_gender == "F")
q2f_M <- filter(q2f, animal_gender == "M")
# filter `q2f_F` by borough
q2f_F_bronx <- filter(q2f_F, borough == "Bronx")
q2f_F_bronxq2f_F_brooklyn <- filter(q2f_F, borough == "Brooklyn")
q2f_F_brooklynq2f_F_manhattan <- filter(q2f_F, borough == "Manhattan")
q2f_F_manhattanq2f_F_queens <- filter(q2f_F, borough == "Queens")
q2f_F_queensq2f_F_staten_island <- filter(q2f_F, borough == "Staten Island")
q2f_F_staten_island# filter `q2f_M` by borough
q2f_M_bronx <- filter(q2f_M, borough == "Bronx")
q2f_M_bronxq2f_M_brooklyn <- filter(q2f_M, borough == "Brooklyn")
q2f_M_brooklynq2f_M_manhattan <- filter(q2f_M, borough == "Manhattan")
q2f_M_manhattanq2f_M_queens <- filter(q2f_M, borough == "Queens")
q2f_M_queensq2f_M_staten_island <- filter(q2f_M, borough == "Staten Island")
q2f_M_staten_island# 3.
q2f <- group_by(nyc_dogs,
animal_name, animal_gender, borough)
q2f <- mutate(q2f,
n = n())
q2f <- group_by(q2f,
animal_gender, borough)
q2f <- mutate(q2f,
rank = dense_rank(desc(n)) )
q2f <- filter(q2f,
rank <= 10 )
q2f <- select(q2f,
animal_name, animal_gender, n, rank, borough)
q2f <- distinct(q2f)
q2f <- arrange(q2f,
borough, animal_gender, -n)
print(q2f, n = Inf)## # A tibble: 132 × 5
## # Groups: animal_gender, borough [15]
## animal_name animal_gender n rank borough
## <chr> <chr> <int> <int> <chr>
## 1 Bella F 774 1 Bronx
## 2 Princess F 490 2 Bronx
## 3 Luna F 436 3 Bronx
## 4 Lola F 375 4 Bronx
## 5 Coco F 322 5 Bronx
## 6 Mia F 295 6 Bronx
## 7 Daisy F 248 7 Bronx
## 8 Chloe F 228 8 Bronx
## 9 Lady F 201 9 Bronx
## 10 Name Not Provided F 179 10 Bronx
## 11 Max M 683 1 Bronx
## 12 Rocky M 495 2 Bronx
## 13 Lucky M 372 3 Bronx
## 14 Toby M 316 4 Bronx
## 15 Teddy M 312 5 Bronx
## 16 Prince M 295 6 Bronx
## 17 Name Not Provided M 289 7 Bronx
## 18 Charlie M 288 8 Bronx
## 19 Buddy M 282 9 Bronx
## 20 Milo M 249 10 Bronx
## 21 Unknown F 1480 1 Brooklyn
## 22 Bella F 1331 2 Brooklyn
## 23 Lola F 836 3 Brooklyn
## 24 Luna F 792 4 Brooklyn
## 25 Lucy F 733 5 Brooklyn
## 26 Daisy F 636 6 Brooklyn
## 27 Name F 635 7 Brooklyn
## 28 Princess F 567 8 Brooklyn
## 29 Chloe F 551 9 Brooklyn
## 30 Coco F 531 10 Brooklyn
## 31 Unknown M 1937 1 Brooklyn
## 32 Max M 1179 2 Brooklyn
## 33 Rocky M 881 3 Brooklyn
## 34 Name M 859 4 Brooklyn
## 35 Charlie M 835 5 Brooklyn
## 36 Name Not Provided M 668 6 Brooklyn
## 37 Lucky M 660 7 Brooklyn
## 38 Buddy M 627 8 Brooklyn
## 39 Teddy M 533 9 Brooklyn
## 40 Milo M 533 9 Brooklyn
## 41 Prince M 489 10 Brooklyn
## 42 Cherry <NA> 5 1 Brooklyn
## 43 Shebah <NA> 4 2 Brooklyn
## 44 Sparkey <NA> 3 3 Brooklyn
## 45 Unknown F 1555 1 Manhattan
## 46 Lucy F 1358 2 Manhattan
## 47 Lola F 1317 3 Manhattan
## 48 Bella F 1262 4 Manhattan
## 49 Luna F 1025 5 Manhattan
## 50 Daisy F 884 6 Manhattan
## 51 Coco F 783 7 Manhattan
## 52 Stella F 698 8 Manhattan
## 53 Penny F 685 9 Manhattan
## 54 Sophie F 678 10 Manhattan
## 55 Unknown M 1888 1 Manhattan
## 56 Charlie M 1290 2 Manhattan
## 57 Max M 1186 3 Manhattan
## 58 Oliver M 836 4 Manhattan
## 59 Teddy M 798 5 Manhattan
## 60 Cooper M 768 6 Manhattan
## 61 Buddy M 729 7 Manhattan
## 62 Leo M 665 8 Manhattan
## 63 Rocky M 650 9 Manhattan
## 64 Jack M 629 10 Manhattan
## 65 Milo M 629 10 Manhattan
## 66 Bella F 1338 1 Queens
## 67 Name Not Provided F 780 2 Queens
## 68 Luna F 678 3 Queens
## 69 Coco F 605 4 Queens
## 70 Lola F 580 5 Queens
## 71 Daisy F 569 6 Queens
## 72 Princess F 549 7 Queens
## 73 Molly F 465 8 Queens
## 74 Lucy F 455 9 Queens
## 75 Unknown F 429 10 Queens
## 76 Name Not Provided M 1595 1 Queens
## 77 Max M 1258 2 Queens
## 78 Unknown M 1012 3 Queens
## 79 Rocky M 875 4 Queens
## 80 Charlie M 733 5 Queens
## 81 Buddy M 668 6 Queens
## 82 Lucky M 649 7 Queens
## 83 Toby M 562 8 Queens
## 84 Milo M 541 9 Queens
## 85 Teddy M 536 10 Queens
## 86 Nanuk <NA> 3 1 Queens
## 87 Sidney <NA> 3 1 Queens
## 88 Bella F 742 1 Staten Island
## 89 Molly F 352 2 Staten Island
## 90 Coco F 328 3 Staten Island
## 91 Daisy F 328 3 Staten Island
## 92 Lola F 315 4 Staten Island
## 93 Lucy F 247 5 Staten Island
## 94 Luna F 238 6 Staten Island
## 95 Chloe F 234 7 Staten Island
## 96 Mia F 221 8 Staten Island
## 97 Unknown F 207 9 Staten Island
## 98 Zoey F 206 10 Staten Island
## 99 Max M 523 1 Staten Island
## 100 Rocky M 451 2 Staten Island
## 101 Buddy M 421 3 Staten Island
## 102 Charlie M 311 4 Staten Island
## 103 Unknown M 262 5 Staten Island
## 104 Teddy M 256 6 Staten Island
## 105 Name Not Provided M 230 7 Staten Island
## 106 Bailey M 205 8 Staten Island
## 107 Cody M 188 9 Staten Island
## 108 Toby M 185 10 Staten Island
## 109 Buddy <NA> 4 1 Staten Island
## 110 Unknown F 80 1 <NA>
## 111 Lola F 55 2 <NA>
## 112 Bella F 46 3 <NA>
## 113 Luna F 35 4 <NA>
## 114 Lucy F 33 5 <NA>
## 115 Penny F 32 6 <NA>
## 116 Coco F 27 7 <NA>
## 117 Name Not Provided F 20 8 <NA>
## 118 Lady F 19 9 <NA>
## 119 Mia F 17 10 <NA>
## 120 Unknown M 113 1 <NA>
## 121 Name Not Provided M 33 2 <NA>
## 122 Max M 31 3 <NA>
## 123 Charlie M 28 4 <NA>
## 124 Name M 24 5 <NA>
## 125 Henry M 21 6 <NA>
## 126 Toby M 20 7 <NA>
## 127 Teddy M 18 8 <NA>
## 128 Oliver M 17 9 <NA>
## 129 Cooper M 15 10 <NA>
## 130 Rocky M 15 10 <NA>
## 131 Dexter M 15 10 <NA>
## 132 Milo M 15 10 <NA>
Assume that all dogs in the nyc_dogs data frame are
alive as of today.
Describe the distribution of age for each borough using ggplot. Make a simple comment on your ggplot.
nyc_dogs_age <- nyc_dogs %>%
mutate( age = 2022 - animal_birth_year )
nyc_dogs_age <- mutate(nyc_dogs,
age = 2022 - animal_birth_year )
ggplot( nyc_dogs_age ) +
geom_density(aes(x = age)) +
facet_wrap(~borough)
ggplot( filter(nyc_dogs_age,
age <= 25, !is.na(borough)) ) +
geom_density(aes(x = age)) +
facet_wrap(~borough)
ggplot( filter(nyc_dogs_age, age <= 25, !is.na(borough)) ) +
geom_histogram(aes(x = age),
binwidth = 1) +
facet_wrap(~borough)
Add some comments here ….
For Question 3, run the following function to read the NYC’s Citywide Payroll Data.
nyc_payroll <- read_csv(
'https://bcdanl.github.io/data/nyc_payroll.csv'
)Description of variables in the nyc_payroll dataset is provided at the end of the R script.
Create a variable, payroll, which is defined as:
\[ \texttt{payroll} =
\texttt{regular_gross_paid} + \texttt{total_ot_paid}\] where
regular_gross_paid and total_ot_paid are
variables in the nyc_payroll data frame.
q3a <- mutate(nyc_payroll,
payroll = regular_gross_paid + total_ot_paid)Calculate the mean of payroll by
title_description.
q3b <- group_by(q3a,
title_description)
q3b <- summarise(q3b,
avg_payroll = mean(payroll, na.rm = T))
q3b <- arrange(q3b,
-avg_payroll)Calculate the mean of payroll by
work_location_borough.
q3c <- group_by(q3a,
work_location_borough)
q3c <- summarise(q3c,
avg_payroll = mean(payroll, na.rm = T))
q3c <- arrange(q3c,
-avg_payroll)Fiscal Year: Fiscal Year
Payroll Number: Payroll Number
Agency Name: The Payroll agency that the employee works for
Last Name: Last name of employee
First Name: First name of employee
Mid Init: Middle initial of employee
Agency Start Date: Date which employee began working for their current agency Date & Time
Work Location Borough: Borough of employee’s primary work location
Title Description: Civil service title description of the employee
Leave Status as of June 30: Status of employee as of the close of the relevant fiscal year: Active, Ceased, or On Leave
Base Salary: Base Salary assigned to the employee
Pay Basis: Lists whether the employee is paid on an hourly, per diem or annual basis
Regular Hours: Number of regular hours employee worked in the fiscal year
Regular Gross Paid: The amount paid to the employee for base salary during the fiscal year
OT Hours: Overtime Hours worked by employee in the fiscal year
Total OT Paid: Total overtime pay paid to the employee in the fiscal year
Total Other Pay: Includes any compensation in addition to gross salary and overtime pay, ie Differentials, lump sums, uniform allowance, meal allowance, retroactive pay increases, settlement amounts, and bonus pay, if applicable.