library(tidyverse)

The following data is for Class Exercises:

NY_CC_tweets <- read_csv(
  'https://bcdanl.github.io/data/NY_tweets_cc_no_contents.csv')

id_user: a unique identification number for a Twitter user who retweeted a tweet with #climatechange or #globalwarming.
id_city: a unique identification number for a city.
FIPS: a unique identification number for a county.

Each row represents one retweet of a tweet with #climatechange or #globalwarming.
Each row includes the Twitter user's geographic information at the city or county level (variables FIPS, county, city) as well as the time at which the user retweeted (variables year, month, day, hour, minute, second).
How many Twitter users retweeted on January 1, 2017?
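One possible approach to this question is to filter to the date and then count distinct id_user values, since a single user may retweet more than once on the same day. The sketch below is only an illustration: toy_tweets is a hypothetical tibble with made-up values standing in for NY_CC_tweets.

```r
library(tidyverse)

# Hypothetical toy data (made-up values), standing in for NY_CC_tweets.
toy_tweets <- tibble(
  id_user = c(1, 1, 2, 3, 3),
  year    = c(2017, 2017, 2017, 2017, 2017),
  month   = c(1, 1, 1, 1, 2),
  day     = c(1, 1, 1, 2, 1)
)

# Keep only retweets made on January 1, 2017,
# then count how many different users appear.
n_users_jan1 <- toy_tweets %>%
  filter(year == 2017, month == 1, day == 1) %>%
  summarise(n_users = n_distinct(id_user))

n_users_jan1
```

Using n_distinct() rather than n() matters here: n() would count retweets, not users.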
Which city had the third-highest number of retweets on December 1, 2017?
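A ranking question like this one can be sketched by counting rows per city, sorting in descending order, and taking the third row. The example below assumes a hypothetical toy tibble (toy_tweets, with made-up city labels) rather than the real NY_CC_tweets data.

```r
library(tidyverse)

# Hypothetical toy data (made-up values), standing in for NY_CC_tweets.
# The last row falls on December 2 and should be excluded by the filter.
toy_tweets <- tibble(
  city  = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D", "C"),
  year  = 2017,
  month = 12,
  day   = c(rep(1, 10), 2)
)

# Count retweets per city on December 1, 2017, largest first.
city_rank <- toy_tweets %>%
  filter(year == 2017, month == 12, day == 1) %>%
  count(city, name = "n_retweets") %>%
  arrange(desc(n_retweets))

# Take the third row of the sorted table.
third_city <- city_rank %>% slice(3)

third_city
```

Note that slice(3) ignores ties. If several cities could share the same count, adding a rank column with min_rank(desc(n_retweets)) and filtering on it is one alternative.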
For each year, find the top 5 Twitter users in NY state in terms of the number of retweets they made in NY state. In which city and county do these users live?
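A "top k per group" task of this kind can be sketched with count(), group_by(), and slice_max(), followed by a join to recover each user's city and county. The example below is a sketch under assumptions: toy_tweets and its values are made up, and k is set to 2 (the exercise asks for 5) only to keep the toy output short.

```r
library(tidyverse)

# Hypothetical toy data (made-up values), standing in for NY_CC_tweets.
toy_tweets <- tibble(
  id_user = c(1, 1, 1, 2, 2, 3, 1, 2, 2, 3, 3, 3),
  year    = c(rep(2016, 6), rep(2017, 6)),
  city    = c("City A", "City A", "City A", "City B", "City B", "City C",
              "City A", "City B", "City B", "City C", "City C", "City C"),
  county  = c("County X", "County X", "County X", "County Y", "County Y",
              "County Z", "County X", "County Y", "County Y", "County Z",
              "County Z", "County Z")
)

k <- 2  # the exercise asks for 5; 2 keeps the toy output short

# Retweets per user per year, then the k most active users in each year.
top_users <- toy_tweets %>%
  count(year, id_user, name = "n_retweets") %>%
  group_by(year) %>%
  slice_max(n_retweets, n = k) %>%
  ungroup()

# One row per user-location pair, used to look up where each user lives.
user_places <- toy_tweets %>%
  distinct(id_user, city, county)

top_users_places <- top_users %>%
  left_join(user_places, by = "id_user")

top_users_places
```

By default slice_max() keeps ties, so a year can return more than k users. A user who appears with more than one city in the data would also produce more than one row after the join.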
Summarize the data set into a data frame at the county-month level with the following variables:
FIPS, county, year, month;
n_retweets: the number of retweets in year YYYY and month MM from county C.
The unique() or distinct() functions can be used to keep only unique/distinct rows from a data frame.
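The county-month summary described above can be sketched in at least two ways: directly with count(), or by adding a group-wise n() column and then keeping distinct rows, which is where distinct() comes in. Both are shown below on a hypothetical toy tibble (toy_tweets, with made-up FIPS codes and county labels), not on the real data.

```r
library(tidyverse)

# Hypothetical toy data (made-up values), standing in for NY_CC_tweets.
toy_tweets <- tibble(
  FIPS   = c(1, 1, 1, 2, 2),
  county = c("County X", "County X", "County X", "County Y", "County Y"),
  year   = c(2017, 2017, 2017, 2017, 2017),
  month  = c(1, 1, 2, 1, 1)
)

# Option 1: count() collapses to one row per county-year-month.
county_month <- toy_tweets %>%
  count(FIPS, county, year, month, name = "n_retweets")

# Option 2: attach the group size to every row, then drop duplicate rows.
county_month_alt <- toy_tweets %>%
  group_by(FIPS, county, year, month) %>%
  mutate(n_retweets = n()) %>%
  ungroup() %>%
  distinct(FIPS, county, year, month, n_retweets)

county_month
county_month_alt
```

The two options produce the same rows here. count() is the shorter route, while the mutate() plus distinct() route can be handy when other columns need to be carried along.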
Describe the relationship between the number of retweets and county using ggplot. Briefly comment on your plot.
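One simple way to visualize retweets by county is a horizontal bar chart of total retweets per county, ordered by size. The sketch below assumes a hypothetical county-month summary (toy_county_month, with made-up values); other plot types, such as boxplots of monthly counts by county, would be equally reasonable.

```r
library(tidyverse)

# Hypothetical county-month summary (made-up values).
toy_county_month <- tibble(
  county     = c("County X", "County X", "County Y", "County Y", "County Z"),
  month      = c(1, 2, 1, 2, 1),
  n_retweets = c(10, 15, 3, 4, 40)
)

# Total retweets per county across all months.
county_totals <- toy_county_month %>%
  group_by(county) %>%
  summarise(n_retweets = sum(n_retweets))

# Horizontal bars, with counties ordered by their total retweets.
p <- ggplot(county_totals,
            aes(x = n_retweets, y = fct_reorder(county, n_retweets))) +
  geom_col() +
  labs(x = "Number of retweets", y = "County")

p
```

Ordering the bars with fct_reorder() makes it easier to see which counties account for most of the retweets.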