Loading R packages for Class Exercises

library(tidyverse)



Tweets with #climatechange

The following data is for Class Exercises:

NY_CC_tweets <- read_csv(
  'https://bcdanl.github.io/data/NY_tweets_cc_no_contents.csv')

Variable Description

  • id_user: a unique identification number for a Twitter user whom retweeted to a tweet with #climatechange or #globalwarming.
  • id_city: a unique identification number for a city.
  • FIPS: a unique identification number for a county.

Each row represents an observation of a retweet to a tweet with #climatechange or #globalwarming.

Each row includes a Twitter user’s geographic information at city or county levels (variables FIPS, county, city) as well as information about timing when a Twitter user retweeted (variables year, month, day, hour, minute, second).

Q1a

How many Twitter users retweeted on the date, January 1, 2017?



Q1b

Which city is with the third highest number of retweets on the date, December 1, 2017?



Q1c

For each year, find the top 5 Twitter users in NY state in terms of the number of retweets they made in NY state. In which city and county do these users live in?



Q1d

Summarize the data set into the data frame with county and month levels of retweets with the following variables:

  • FIPS, county, year, month;
  • n_retweets: the number of retweets in year YYYY and month MM from county C.

The unique() or distinct() functions can be used to keep only unique/distinct rows from a data frame.



Q1e

Describe the relationship between the number of retweets and county using ggplot. Make a simple comment on your plot.