Loading R packages for Homework Assignment 2

library(tidyverse)



Question 1

Read the data file, NY_school_enrollment_socioecon.csv, as the data.frame object with the name, NY_school_enrollment_socioecon, using (1) the read_csv() function and (2) its URL, https://bcdanl.github.io/data/NY_school_enrollment_socioecon.csv.

url <- 'https://bcdanl.github.io/data/NY_school_enrollment_socioecon.csv'
NY_school_enrollment_socioecon <- read_csv(url)



For description of variables in NY_school_enrollment_socioecon, refer to the file, ny_school_enrollment_socioecon_description.zip, which is in the Files section in our Canvas web-page. (I recommend you to extract the zip file, and then read the file, ny_school_enrollment_socioecon_description.csv, using Excel or Numbers.)



Q1a

Look up the meaning of variables d01_018, d01_021, d01_024, and d01_027, from the file, ny_school_enrollment_socioecon_description.csv.

  • Find the top 5 counties in terms of the value of variable d01_018 for each year.

  • Find the top 5 counties in terms of the value of variable d01_021 for each year.

  • Find the top 5 counties in terms of the value of variable d01_024 for each year.

  • Find the top 5 counties in terms of the value of variable d01_027 for each year.

  • Make a comment on the result.



Q1b

Look up the meaning of variables c02_010, c04_010, and c06_010 from the file, ny_school_enrollment_socioecon_description.csv.

  • Find the top 5 counties in terms of the value of variable c02_010 for each year.

  • Find the top 5 counties in terms of the value of variable c04_010 for each year.

  • Find the top 5 counties in terms of the value of variable c06_010 for each year.

  • Make a comment on the result.



Q1c

Look up the meaning of variable pincp from the file, ny_school_enrollment_socioecon_description.csv.

Provide both (1) one ggplot code with geom_line() and (2) a couple of sentences to describe the yearly trend of pincp for Livingston county and top 4 counties in terms of the mean value of pincp over the years, 2015-2020.



Q1d

Create the variable of the growth rate of pincp, where the growth rate of pincp is defined as:

\[ (\text{Growth rate of } \texttt{pincp} \text{ in year } \texttt{Y}) = \frac{( \texttt{pincp} \text{ in year } \texttt{Y} ) - ( \texttt{pincp} \text{ in year } \texttt{Y-1} )}{(\texttt{pincp} \text{ in year } \texttt{Y-1} )} \] Provide both (1) one ggplot code with geom_line() and (2) a couple of sentences to describe the yearly trend of the growth rate of pincp for Livingston county and top 5 counties in terms of the mean value of the growth rate of pincp over the years, 2016-2020.



Q1e

Look up the meaning of variable poverty_hs and poverty_bachelor_or_higher from the file, ny_school_enrollment_socioecon_description.csv.

  • Summarize the mean value of poverty_hs and poverty_bachelor_or_higher from years 2015 to 2020 for each county.

  • Provide both (1) a ggplot code with geom_point() and (2) a couple of sentences to describe the mean value of poverty_hs.

  • Provide both (1) a ggplot code with geom_point() and (2) a couple of sentences to describe the mean value of poverty_bachelor_or_higher.




Question 2

Read the data file, beer_markets.csv, as the data.frame object with the name, beer_markets, using (1) the read_csv() function and (2) its URL, https://bcdanl.github.io/data/beer_markets.csv.

beer_markets <- read_csv(
  'https://bcdanl.github.io/data/beer_markets.csv'
)



Description of variables in the data file, beer_markets.csv

Each observation in beer_markets.csv is a household-level record for one transaction of beer.


  • hh: an identifier of the household;
  • X_purchase_desc: details on the purchased item;
  • quantity: the number of items purchased;
  • brand: Bud Light, Busch Light, Coors Light, Miller Lite, or Natural Light;
  • spent: total dollar value of purchase;
  • beer_floz: total volume of beer, in fluid ounces;
  • price_per_floz: price per fl.oz. (i.e., beer spent/beer floz);
  • container: the type of container;
  • promo: Whether the item was promoted (coupon or otherwise);
  • market: Scan-track market (or state if rural);
  • demographic data, including gender, marital status, household income, class of work, race, education, age, the size of household, and whether or not the household has a microwave or a dishwasher.



Q2a

  • Find the top 5 markets in terms of the total volume of beer.
  • Find the top 5 markets in terms of the total volume of BUD LIGHT.
  • Find the top 5 markets in terms of the total volume of BUSCH LIGHT.
  • Find the top 5 markets in terms of the total volume of COORS LIGHT.
  • Find the top 5 markets in terms of the total volume of MILLER LITE.
  • Find the top 5 markets in terms of the total volume of NATURAL LIGHT.



Q2b

  • For households that purchased BUD LIGHT, what fraction of households did purchase only BUD LIGHT?

  • For households that purchased BUSCH LIGHT, what fraction of households did purchase only BUSCH LIGHT?

  • For households that purchased COORS LIGHT, what fraction of households did purchase only COORS LIGHT?

  • For households that purchased MILLER LITE, what fraction of households did purchase only MILLER LITE?

  • For households that purchased NATURAL LIGHT, what fraction of households did purchase only NATURAL LIGHT?

  • Which beer brand does have the largest base of loyal consumers?



Q2c

  • Calculate the number of beer transactions for each household.
  • Calculate the fraction of each beer brand for each household.