Student Course Experience (SCE) Survey
Effective Fall 2022, the Student Course Experience (SCE) survey replaces the Student Observation of Faculty Instruction (SOFI) survey.
In a web browser, students should visit their myGeneseo portal, then select KnightWeb, Surveys, then SCE (formerly SOFI) Surveys.
Final Exam
The Final Exam is scheduled on Wednesday, December 14, noon - 2 P.M.
The Final Exam covers:
def
read_csv()
seaborn and R ggplot2
R programming
The R language is available as a free download from the R Project website at:
RStudio
RStudio offers a graphical interface to assist in creating R code:
RStudio Environment
R Packages
    pkgs <- c("ggplot2", "readr", "dplyr")
    install.packages(pkgs)
Mac: "Do you want to install from sources the packages which need compilation?" from Console Pane.
Windows: "Would you like to use a personal library instead?" from Pop-up message.
Check that ggplot2 is installed correctly:
    library(ggplot2)  # load the ggplot2 package (it is also included in tidyverse)
    mpg               # a data.frame provided by ggplot2
Shortcuts for RStudio and RScript
Mac: Option + - inserts the assignment operator <-.
Windows: Alt + - inserts the assignment operator <-.
Ctrl (Command for Mac users) + F is useful for finding a phrase (and replacing it) in the RScript.
Auto-completion of commands is useful. Type libr in the RScript in RStudio and wait for a second.
    libr
To install a package named PACKAGE, use install.packages("PACKAGE").
    install.packages("ggplot2")  # installing package "ggplot2"
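Because install.packages() downloads a package again every time it runs, it can help to check first whether the package is already available. The following is a minimal sketch of that idea, not a required workflow; the vector `needed` and the helper `is_installed()` are hypothetical names made up for this illustration, and the two packages listed ship with R so nothing is downloaded here.

```r
# Hypothetical list of packages to check; both ship with R itself
needed <- c("stats", "utils")

# requireNamespace() returns TRUE when the package can be loaded
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Keep only the names of packages that are not yet installed
missing_pkgs <- needed[!vapply(needed, is_installed, logical(1))]

# Install only what is missing (skipped when nothing is missing)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
print(missing_pkgs)
```

Replacing the contents of `needed` with c("ggplot2", "readr", "dplyr") would apply the same check to the packages used in this course.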
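Quotation marks, parentheses, and +
Quotation marks and parentheses must come in pairs. If one is left open, the Console shows the continuation character +:
    > x <- "hello
    +
The + tells you that R is waiting for more input; it doesn't think you're done yet.
Assignment
R has several assignment operators (<-, =, ->). The conventional choice is <-.
    x <- 2
    x < - 3    # not an assignment: the space makes this the comparison x < -3
    print(x)
    x <- 5
    x = 5
    5 -> x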
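The effect of the stray space in `x < - 3` is easier to see when the result is stored. This short sketch uses only base R; the variable names `result`, `y`, and `z` are introduced here for illustration.

```r
x <- 2

# With a space between < and -, R parses this as x < (-3): a comparison
result <- x < - 3
print(result)   # FALSE, because 2 is not less than -3
print(x)        # x is still 2; nothing was assigned

x <- 5          # leftward assignment, the usual form
y = 5           # also assigns at the top level
5 -> z          # rightward assignment; legal but rarely used
print(c(x, y, z))
```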
R variables and data types
A variable can be thought of as a labelled container used to store information.
Variables allow us to recall saved information for later use in calculations.
Variables can store many different things in R, from single values to data frames to graphs.
    myname <- "my_name"
    class(myname)
The class() function returns the data type of an object.
    favourite.integer <- as.integer(2)
    print(favourite.integer)
    class(favourite.integer)
    favourite.numeric <- as.numeric(8.8)
    print(favourite.numeric)
    class(favourite.numeric)
    pvalue.threshold <- 0.05
Use == to test for equality in R.
    class(TRUE)
    favourite.numeric == 8.8
    favourite.numeric == 9.9
    1:10
    2*(1:10)
    seq(0, 10, 2)
    myvector <- 1:10
    myvector
    b <- c(3,4,5)
    b^2
    beers <- c("BUD LIGHT", "BUSCH LIGHT", "COORS LIGHT", "MILLER LITE", "NATURAL LIGHT")
    beers
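Two points from the code above can be sketched in a few lines: arithmetic on a vector is applied element by element, and == on numeric values compares them exactly, which can surprise with decimal fractions. The names `squares`, `evens`, `exact`, and `close_enough` are made up for this illustration.

```r
b <- c(3, 4, 5)
squares <- b^2            # element-wise: each entry is squared
evens <- seq(0, 10, 2)    # from 0 to 10 in steps of 2

# Exact comparison of decimal fractions can fail because of rounding
exact <- (0.1 + 0.2) == 0.3

# all.equal() compares within a small tolerance instead
close_enough <- isTRUE(all.equal(0.1 + 0.2, 0.3))

print(squares)
print(evens)
print(c(exact = exact, close_enough = close_enough))
```

For whole numbers and for values typed in directly, such as 8.8 == 8.8, the == test behaves as expected; the tolerance-based check matters mainly after arithmetic.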
Factors store categorical data.
Under the hood, factors are actually integers that have a string label attached to each unique integer.
    beers <- as.factor(beers)
    class(beers)
    levels(beers)
    nlevels(beers)
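To see the integers behind a factor, as.integer() can be applied to it. The sketch below uses a shortened, made-up vector with a repeated value so that the codes are easier to read; `beers_small` and `beers_f` are names introduced only for this example.

```r
# A short made-up vector with one repeated value
beers_small <- c("BUD LIGHT", "COORS LIGHT", "BUD LIGHT", "MILLER LITE")
beers_f <- as.factor(beers_small)

levels(beers_f)              # the unique labels, sorted alphabetically
codes <- as.integer(beers_f) # the integer code stored for each element
print(codes)

# The labels can be recovered by indexing the levels with the codes
levels(beers_f)[codes]

# Counting by category is a common use of factors
table(beers_f)
```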
NULL and NA values
NULL is just an alias for c(), the empty vector.
NA indicates missing or unavailable data.
    c(c(), 1, NULL)
    c("a", NA, "c")
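The practical difference is that NULL contributes nothing to a vector, while NA occupies a position and propagates through calculations. A minimal base R sketch, with `v` and `w` as illustrative names:

```r
# NULL and c() vanish when combined with other values
v <- c(c(), 1, NULL)
length(v)        # only the 1 remains
length(NULL)     # the empty vector has length 0

# NA keeps its place in the vector
w <- c(1, NA, 3)
length(w)
is.na(w)         # flags which entries are missing

# Calculations return NA unless missing values are dropped explicitly
mean(w)
mean(w, na.rm = TRUE)
```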
Step 0. Download the zip file 'car_data.zip' from the Files section in our Canvas.
Step 1. Find the path name for the file car.data.csv.
Step 2. In the code below, replace 'PATH_NAME_FOR_THE_FILE_car.data.csv' with the path name for the file car.data.csv.
Step 3. Run the following R code:
    # install.packages("readr")
    library(readr)
    uciCar <- read_csv('PATH_NAME_FOR_THE_FILE_car.data.csv')
    View(uciCar)
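A wrong path name is the most common reason Step 3 fails, so it can be worth checking the path before reading the file. The sketch below is only an illustration under stated assumptions: a temporary folder stands in for the download folder, the two-column file contents are made up, and base R's read.csv() is used so the example runs without readr.

```r
# Hypothetical location: a temporary folder stands in for your download folder
csv_path <- file.path(tempdir(), "car.data.csv")

# Write a tiny made-up file so that the example runs anywhere
writeLines(c("buying,doors", "vhigh,2", "low,4"), csv_path)

# Confirm that R can see the file before trying to read it
file.exists(csv_path)

# Base R reader, used here only to keep the sketch self-contained
toy <- read.csv(csv_path)
dim(toy)
```

With the real file, file.exists() applied to the path from Step 2 should return TRUE before read_csv() is run.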
Examining data frame
class() tells you what kind of R object you have.
dim() shows how many rows and columns are in a data.frame.
head() shows the top few rows of the data.
help() provides the documentation for a class; try help(class(uciCar)).
str() gives us the structure of an object.
summary() provides a summary of almost any R object.
skimr::skim() provides a more detailed summary; skimr is the package that provides the function skim().
print() prints all the data.
View() displays the data in a simple spreadsheet-like grid viewer.
dplyr::glimpse() displays brief information about the data.
    print(uciCar)
    class(uciCar)
    dim(uciCar)
    head(uciCar)
    help(class(uciCar))
    str(uciCar)
    summary(uciCar)
    library(skimr)
    skim(uciCar)
    library(tidyverse)
    glimpse(uciCar)
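The same functions can be tried without downloading anything by using mtcars, a data.frame that ships with R. This is a sketch for practice only; `df` is an illustrative name, and mtcars is a stand-in for uciCar, not the course data.

```r
# mtcars is a built-in data.frame, used here in place of uciCar
df <- mtcars

class(df)          # what kind of object this is
dim(df)            # number of rows, then number of columns
first3 <- head(df, 3)
print(first3)      # the top three rows
str(df)            # the type and first values of every column
summary(df$mpg)    # a numeric summary of one column
```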
Reading data from a URL
    # install.packages("readr")
    # library(readr)
    tvshows <- read_csv('https://bcdanl.github.io/data/tvshows.csv')
Accessing Subsets
head() returns the first N rows of our data frame.
tail() returns the last N rows of our data frame.
    head(tvshows, n = 3)
    head(tvshows, 3)
    tail(tvshows, 3)
    tvshows[ 1:3, ]               # rows 1 to 3, all columns
    tvshows[ c(1, 2, 3), ]        # the same rows, given as a vector
    tvshows[ c(1, 2, 3), 1]       # rows 1 to 3, first column only
    tvshows$Network               # one column by name
    tvshows[, 2]                  # one column by position
    tvshows[, "Network"]
    tvshows[ , c("Show", "GRP")]  # several columns by name
    tvshows[1:3, c(2,5)]          # rows 1 to 3, columns 2 and 5
Return those shows whose Genre is Reality.
    tvshows[ tvshows$Genre == "Reality", ]
We can also use the which() function, which returns the TRUE indices of a logical object.
    reality <- which(tvshows$Genre == "Reality")
    reality
    tvshows[ reality, ]
    tvshows[tvshows$PE > 80, ]
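Indexing with a logical condition and indexing with which() give the same rows unless the condition contains NA. The sketch below illustrates this on a small made-up base R data.frame; `shows` and its values are hypothetical and only borrow the column names Show, Genre, and PE. It also shows the pattern of comparing a column against its own mean.

```r
# A small made-up data.frame with one missing Genre
shows <- data.frame(
  Show  = c("A", "B", "C", "D"),
  Genre = c("Reality", "Drama", NA, "Reality"),
  PE    = c(85, 70, 90, 60)
)

# The condition is TRUE, FALSE, NA, TRUE
is_reality <- shows$Genre == "Reality"

# Logical indexing keeps an extra all-NA row for the NA entry
by_logical <- shows[is_reality, ]

# which() returns only the positions that are TRUE, so the NA is dropped
by_which <- shows[which(is_reality), ]

nrow(by_logical)
nrow(by_which)

# Rows with PE above the mean of PE, keeping two columns
above_mean <- shows[shows$PE > mean(shows$PE), c("Show", "PE")]
print(above_mean)
```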
Class Exercises 2
Return those shows whose Duration values are 30.
Return those shows whose GRP values are greater than the mean value of GRP.
Return the data.frame with only three variables (Show, PE, and GRP) for which PE values are greater than the mean value of PE.
ggplot2
ggplot2 is an R data visualization package based on The Grammar of Graphics.
ggplot2 is one of the most elegant and versatile visualization tools in R.
You provide the data, tell ggplot2 how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
    library(ggplot2)
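The three ingredients named above (data, aesthetic mapping, graphical primitive) appear as three separate pieces of a ggplot2 call. The following sketch assumes ggplot2 is installed and uses mpg, the data.frame that comes with the package; the object name `p` is made up for the example.

```r
library(ggplot2)

# data: the mpg data.frame provided by ggplot2
# aes(): maps the displ column to the x-axis and hwy to the y-axis
# geom_point(): the graphical primitive, one point per row
p <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy))

# Printing the object draws the plot
print(p)
```

Swapping geom_point() for another geom changes the primitive while the data and the mapping stay the same.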
Types of plots
We will consider the following types of visualization:
Bar chart
Histogram
Scatter plot
Scatter plot with Fitted line
Line chart
Getting started with ggplot2
We use the titanic and tips data.frames:
    df_titanic <- read_csv('https://bcdanl.github.io/data/titanic_cleaned.csv')
    df_tips <- read_csv('https://bcdanl.github.io/data/tips_seaborn.csv')
Bar Chart
We use the geom_bar() function to plot a bar chart:
    ggplot(data = df_titanic) +
      geom_bar(aes(x = sex))
data: data.frame
x: Name of a categorical variable (column) in data.frame
We can further break up the bars in the bar chart based on another categorical variable.
    ggplot(data = df_titanic) +
      geom_bar(aes(x = sex, fill = survived))
fill: Name of a categorical variable
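By default, geom_bar() draws one bar per category with a height equal to the number of rows in that category. The same counts can be checked in base R with table(). This sketch uses the built-in mtcars data as a stand-in for the titanic data; `counts` and `two_way` are illustrative names.

```r
# Number of rows per category: the bar heights for x = cyl
counts <- table(mtcars$cyl)
print(counts)

# Counts split by a second categorical variable:
# the segments each bar would be broken into with fill = am
two_way <- table(cyl = mtcars$cyl, am = mtcars$am)
print(two_way)
```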
Histogram
We use the geom_histogram() function to plot a histogram:
    ggplot(data = df_titanic) +
      geom_histogram(aes(x = age), bins = 5)
bins: Number of bins
Scatter plot
A scatter plot is used to display the relationship between two continuous variables.
We use the geom_point() function to plot a scatter plot:
    ggplot(data = df_tips) +
      geom_point(aes(x = total_bill, y = tip))
x: Name of a continuous variable on the horizontal axis
y: Name of a continuous variable on the vertical axis
To the scatter plot, we can add a color-VARIABLE mapping to display how the relationship between two continuous variables varies by VARIABLE.
Suppose we are interested in the following question:
    ggplot(data = df_tips) +
      geom_point(aes(x = total_bill, y = tip, color = smoker))
Fitted line
geom_smooth( method = lm ) adds a straight line fitted to the scattered points.
    ggplot(data = df_tips) +
      geom_point(aes(x = total_bill, y = tip, color = smoker)) +
      geom_smooth(aes(x = total_bill, y = tip, color = smoker), method = lm)
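The line drawn with method = lm comes from a linear model of y on x, and mapping color to a categorical variable produces one fitted line per group. The intercept and slope of such lines can be inspected with lm() in base R. This sketch uses the built-in mtcars data rather than the tips data; `fit_all` and `fits_by_group` are names made up for the example.

```r
# One line for all points: mpg explained by wt
fit_all <- lm(mpg ~ wt, data = mtcars)
coef(fit_all)   # intercept and slope of the fitted line

# One line per group, analogous to mapping color to a grouping variable
fits_by_group <- lapply(
  split(mtcars, mtcars$am),                 # one data.frame per value of am
  function(d) coef(lm(mpg ~ wt, data = d))  # fit each group separately
)
print(fits_by_group)
```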
Line chart
A line chart is used to display the trend in a continuous variable or the change in a continuous variable over another variable.
It draws a line by connecting the points in order of the variable on the x-axis, so that it highlights exactly when changes occur.
We use the geom_line() function to plot a line chart:
    path_csv <- 'THE_PATHNAME_FOR_THE_FILE__dji.csv'
    dow <- read_csv(path_csv)
    ggplot(data = dow) +
      geom_line(aes(x = Date, y = Close))
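For the line to follow time, the x variable should be stored as a date rather than as text. The sketch below illustrates this on a tiny made-up data.frame; `dow_toy` and its values are hypothetical and are not taken from the dji.csv file. It assumes ggplot2 is installed.

```r
library(ggplot2)

# Made-up values, deliberately listed out of date order
dow_toy <- data.frame(
  Date  = c("2022-03-01", "2022-01-03", "2022-02-01"),
  Close = c(102, 100, 101)
)

# Convert the text column to Date so the axis is treated as time
dow_toy$Date <- as.Date(dow_toy$Date)

# geom_line() connects the points in order of the x variable,
# so the row order of the data.frame does not matter
p_line <- ggplot(data = dow_toy) +
  geom_line(aes(x = Date, y = Close))
print(p_line)
```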