library(tidyverse)

Download dominick_oj_q1a.csv from the Midterm Exam in the Assignments or Files section of our Canvas. Then import dominick_oj_q1a.csv using the following lines:

oj_q1a <- read_csv('ABSOLUTE_PATH_NAME_FOR_THE_FILE_dominick_oj_q1a.csv')
table(oj_q1a$brand)

You need to provide the absolute path name for the file dominick_oj_q1a.csv to the read_csv() function above to read the file.
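To see what "absolute path name" means in practice, here is a minimal sketch that assumes only that readr (part of the tidyverse) is installed. It writes a small hypothetical CSV, toy_oj.csv (not the exam file, and with made-up values), into R's temporary directory, turns its location into an absolute path with base R's normalizePath(), and reads it back with read_csv() before tabulating brand.

```r
library(readr)

# Hypothetical toy data standing in for dominick_oj_q1a.csv (values are made up)
toy <- data.frame(
  sales = c(100, 200, 150, 300),
  price = c(3.50, 3.00, 2.00, 1.50),
  brand = c("tropicana", "minute.maid", "dominicks", "dominicks"),
  feat  = c(0, 0, 0, 1)
)

# Write the toy file into R's temporary directory
tmp_path <- file.path(tempdir(), "toy_oj.csv")
write_csv(toy, tmp_path)

# normalizePath() expands the location into an absolute path name
abs_path <- normalizePath(tmp_path)
toy_read <- read_csv(abs_path, show_col_types = FALSE)

# Count the number of rows for each brand, as table(oj_q1a$brand) does
brand_counts <- table(toy_read$brand)
print(brand_counts)
```

On your own machine, the string you pass to read_csv() plays the role of abs_path here.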
Variables in oj_q1a:
sales: the number of orange juice (OJ) cartons sold in a week
price: price of an OJ carton
brand: OJ brand
feat: advertisement status (1 if advertised; 0 if not advertised)
Report the (1) minimum, (2) median, (3) maximum, (4) mean, and (5) standard deviation of the variable price for the brand Dominick's OJ.
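One way to get all five statistics at once is filter() followed by summarise() from dplyr. This sketch runs on a small hypothetical data frame with made-up prices; the brand label "dominicks" is an assumption, so check the actual label with table(oj_q1a$brand) before filtering.

```r
library(dplyr)

# Hypothetical toy data with two of the oj_q1a columns (values are made up)
toy_oj <- tibble(
  brand = c("dominicks", "dominicks", "dominicks", "tropicana"),
  price = c(1.00, 2.00, 3.00, 4.00)
)

# Keep one brand, then compute the five summary statistics of price
price_stats <- toy_oj %>%
  filter(brand == "dominicks") %>%
  summarise(
    min_price    = min(price),
    median_price = median(price),
    max_price    = max(price),
    mean_price   = mean(price),
    sd_price     = sd(price)   # sample standard deviation (n - 1 denominator)
  )

print(price_stats)
```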
For Question 1b, run the following code to read the dominick_oj.csv file:

oj_q1b <- read_csv(
  'https://bcdanl.github.io/data/dominick_oj.csv'
)

The variables in oj_q1b are the same as those in oj_q1a.
price and the log of sales by brand using ggplot. Make a simple comment on your ggplot figure.

For Question 2, run the following R command to read the nyc_dogs.csv file:

nyc_dogs <- read_csv('https://bcdanl.github.io/data/nyc_dogs.csv')

Describe the distribution of animal_gender using ggplot. Make a simple comment on your ggplot figure.
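A bar chart is a natural fit for a categorical variable such as animal_gender, because geom_bar() counts the rows in each category for you. The sketch below uses a hypothetical toy data frame; it assumes only that nyc_dogs has a column named animal_gender, as the question states, and the codes "F" and "M" are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data; the real nyc_dogs has many more rows and columns
toy_dogs <- data.frame(animal_gender = c("F", "M", "M", "F", "M"))

# geom_bar() counts rows per category and draws one bar per category
p_gender <- ggplot(toy_dogs, aes(x = animal_gender)) +
  geom_bar() +
  labs(x = "Animal gender", y = "Number of dogs")

print(p_gender)
```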
Find the five most popular breeds in NYC.
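A common dplyr pattern for "the five most popular" is count() followed by slice_max(). The sketch uses made-up toy data, and the column name breed is a hypothetical stand-in; use whichever breed column actually appears in nyc_dogs.

```r
library(dplyr)

# Hypothetical toy data; `breed` is an assumed column name
toy_dogs <- tibble(
  breed = c(rep("Yorkshire Terrier", 5), rep("Shih Tzu", 4),
            rep("Chihuahua", 3), rep("Beagle", 3),
            rep("Boxer", 2), "Poodle")
)

# count() adds a column n with the number of rows per breed;
# slice_max() keeps the rows with the five largest n, sorted in descending order
top5_breeds <- toy_dogs %>%
  count(breed) %>%
  slice_max(n, n = 5)

print(top5_breeds)
```

slice_max() keeps ties by default (with_ties = TRUE), so more than five rows can come back when counts tie at the cutoff; pass with_ties = FALSE if you want exactly five.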
Describe the relationship between the five most popular breeds and borough using ggplot. Make a simple comment on your ggplot figure.
Find the five most popular breeds for each borough in NYC.
Find the five most popular dog names for each gender in NYC.
Find the five most popular dog names for each gender for each borough in NYC.
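The "for each" versions follow the same count() and slice_max() pattern, with a group_by() in between so the ranking happens within each group. The sketch uses made-up toy data, and the column names animal_name, animal_gender, and borough are assumptions to check against the real nyc_dogs columns. Swapping the name column for the breed column and grouping by borough alone gives the five most popular breeds for each borough.

```r
library(dplyr)

# Hypothetical toy data; column names are assumptions
toy_dogs <- tibble(
  animal_name   = c("Bella", "Max", "Bella", "Charlie", "Max", "Luna", "Bella", "Rocky"),
  animal_gender = c("F", "M", "F", "M", "M", "F", "F", "M"),
  borough       = c("Brooklyn", "Brooklyn", "Queens", "Queens",
                    "Brooklyn", "Queens", "Brooklyn", "Queens")
)

# One row per (gender, name) combination, then the top five names within each gender
top_names_by_gender <- toy_dogs %>%
  count(animal_gender, animal_name) %>%
  group_by(animal_gender) %>%
  slice_max(n, n = 5) %>%
  ungroup()

# Same idea with two grouping variables: top five names per gender-borough pair
top_names_by_gender_borough <- toy_dogs %>%
  count(animal_gender, borough, animal_name) %>%
  group_by(animal_gender, borough) %>%
  slice_max(n, n = 5) %>%
  ungroup()

print(top_names_by_gender)
print(top_names_by_gender_borough)
```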
Assume that all dogs in the nyc_dogs data frame are alive as of today.
Describe the distribution of age for each borough using ggplot. Make a simple comment on your ggplot figure.
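Under the "alive as of today" assumption, one simple approach is to compute age as the current year minus the birth year and then draw one boxplot per borough. This is a sketch on made-up toy data: the column name animal_birth_year and the helper add_age() are hypothetical, and subtracting years ignores birth months, so the result only approximates age in whole years.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; `animal_birth_year` is an assumed column name
toy_dogs <- tibble(
  borough           = c("Bronx", "Bronx", "Manhattan", "Manhattan", "Queens"),
  animal_birth_year = c(2010, 2015, 2012, 2018, 2016)
)

# Hypothetical helper: age = current_year - birth year,
# where current_year defaults to the year of today's date
add_age <- function(df, current_year = as.integer(format(Sys.Date(), "%Y"))) {
  df %>% mutate(age = current_year - animal_birth_year)
}

toy_dogs_age <- add_age(toy_dogs)

# One boxplot of age for each borough
p_age <- ggplot(toy_dogs_age, aes(x = borough, y = age)) +
  geom_boxplot() +
  labs(x = "Borough", y = "Age (years)")

print(p_age)
```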
For Question 3, run the following code to read NYC's Citywide Payroll Data:

nyc_payroll <- read_csv(
  'https://bcdanl.github.io/data/nyc_payroll.csv'
)

A description of the variables in the nyc_payroll data frame is provided at the end of the R script.
Create a variable, payroll, defined as:
\[ \mathtt{payroll} = \mathtt{regular\_gross\_paid} + \mathtt{total\_ot\_paid} \]
where regular_gross_paid and total_ot_paid are variables in the nyc_payroll data frame.
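The formula for payroll maps directly onto dplyr's mutate(), which adds a new column computed row by row from existing ones. The sketch runs on a few hypothetical rows with made-up amounts, using the column names given in the question.

```r
library(dplyr)

# Hypothetical toy rows; column names follow the question, values are made up
toy_payroll <- tibble(
  title_description     = c("TEACHER", "TEACHER", "POLICE OFFICER"),
  work_location_borough = c("MANHATTAN", "QUEENS", "QUEENS"),
  regular_gross_paid    = c(80000, 90000, 70000),
  total_ot_paid         = c(0, 5000, 20000)
)

# payroll = regular_gross_paid + total_ot_paid, computed for every row
toy_payroll <- toy_payroll %>%
  mutate(payroll = regular_gross_paid + total_ot_paid)

print(toy_payroll)
```

In R, adding NA to a number gives NA, so any row with a missing component will also have a missing payroll.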
Calculate the mean of payroll by
title_description.
Calculate the mean of payroll by
work_location_borough.
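Both group means use the same group_by() plus summarise() pattern; only the grouping column changes. This sketch rebuilds the hypothetical toy rows so it runs on its own, and the choice of na.rm = TRUE (dropping missing payroll values before averaging) is an assumption about how you want to treat NAs.

```r
library(dplyr)

# Hypothetical toy rows (made-up values), with payroll already added
toy_payroll <- tibble(
  title_description     = c("TEACHER", "TEACHER", "POLICE OFFICER"),
  work_location_borough = c("MANHATTAN", "QUEENS", "QUEENS"),
  regular_gross_paid    = c(80000, 90000, 70000),
  total_ot_paid         = c(0, 5000, 20000)
) %>%
  mutate(payroll = regular_gross_paid + total_ot_paid)

# Mean payroll within each title; na.rm = TRUE skips missing payroll values
mean_by_title <- toy_payroll %>%
  group_by(title_description) %>%
  summarise(mean_payroll = mean(payroll, na.rm = TRUE))

# Mean payroll within each work-location borough
mean_by_borough <- toy_payroll %>%
  group_by(work_location_borough) %>%
  summarise(mean_payroll = mean(payroll, na.rm = TRUE))

print(mean_by_title)
print(mean_by_borough)
```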
Fiscal Year: Fiscal Year
Payroll Number: Payroll Number
Agency Name: The payroll agency that the employee works for
Last Name: Last name of the employee
First Name: First name of the employee
Mid Init: Middle initial of the employee
Agency Start Date: Date and time at which the employee began working for their current agency
Work Location Borough: Borough of the employee's primary work location
Title Description: Civil service title description of the employee
Leave Status as of June 30: Status of the employee as of the close of the relevant fiscal year: Active, Ceased, or On Leave
Base Salary: Base salary assigned to the employee
Pay Basis: Whether the employee is paid on an hourly, per diem, or annual basis
Regular Hours: Number of regular hours the employee worked in the fiscal year
Regular Gross Paid: The amount paid to the employee for base salary during the fiscal year
OT Hours: Overtime hours worked by the employee in the fiscal year
Total OT Paid: Total overtime pay paid to the employee in the fiscal year
Total Other Pay: Any compensation in addition to gross salary and overtime pay, i.e., differentials, lump sums, uniform allowance, meal allowance, retroactive pay increases, settlement amounts, and bonus pay, if applicable.